dc.contributor.advisor: Lawless, Seamus [en]
dc.contributor.author: MUNNELLY, GARY [en]
dc.date.accessioned: 2020-04-06T09:35:28Z
dc.date.available: 2020-04-06T09:35:28Z
dc.date.issued: 2020 [en]
dc.date.submitted: 2020 [en]
dc.identifier.citation: MUNNELLY, GARY, Entity linking for text based cultural heritage collections, Trinity College Dublin. School of Computer Science & Statistics, 2020 [en]
dc.identifier.other: Y [en]
dc.identifier.uri: http://hdl.handle.net/2262/92189
dc.description: APPROVED [en]
dc.description.abstract: The ongoing digitisation of cultural heritage data and subsequent publication of that data in digital format has completely changed the manner in which people investigate and engage with cultural treasures. This change has propagated from the interested user browsing the web to the expert scholar in a research setting, applying digital tools in their pursuit of answers to questions. As the rate at which content can be digitised increases and the scale of collections grows, there is an implicit need to provide accurate, automated methods of organising and structuring this content in meaningful ways. The research presented in this thesis represents an investigation into the applications of Entity Linking techniques to the content of cultural heritage collections. A specific challenge faced in the context of this research is the specialised domain knowledge required on the part of the reader who must interpret these collections. It is here that contemporary sources of knowledge for the Entity Linking process are found to be lacking. Indeed, finding any individual source of information that can be used to adequately annotate this type of content is difficult. The challenge of identifying references to obscure entities is compounded by the extremely noisy nature of the content of these texts. An investigation is performed into the state of existing Entity Linking solutions in order to identify approaches which may be robust to the challenges presented by this content type. Evaluations are run to test the efficacy of off-the-shelf Entity Linking solutions. This investigation demonstrates the severe difficulty faced by typical Entity Linking tools when dealing with this content type. An interesting approach is identified which leverages multiple knowledge bases in order to annotate literary content. However, this multi-knowledge base approach is limited in the context of the challenges faced by this thesis due to the manner in which it uses these multiple sources. In order to remedy problems with available knowledge that can inform an Entity Linking system, efforts are made to identify sources of knowledge which are not yet amenable to Entity Linking, but may prove to be helpful if they can be structured appropriately. Sources are identified in the form of both primary source and secondary source content. The secondary source content is structured into two ontologies which are subsequently linked back to DBpedia, both to leverage its information in a multiple knowledge base Entity Linking solution and to facilitate integration between collections annotated with the new ontology and those annotated with DBpedia. This linking process is performed automatically using a novel linking method. Finally, an approach to performing Entity Linking which combines multiple knowledge base sources is presented. A novel approach to constructing the knowledge base is presented. This approach facilitates both the use of and control over the multiple knowledge bases that inform the entity linker. It is demonstrated that this new system performs better than other tested systems when applied to various Entity Linking problems. [en]
dc.publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science [en]
dc.rights: Y [en]
dc.title: Entity linking for text based cultural heritage collections [en]
dc.type: Thesis [en]
dc.type.supercollection: thesis_dissertations [en]
dc.type.supercollection: refereed_publications [en]
dc.type.qualificationlevel: Doctoral [en]
dc.identifier.peoplefinderurl: https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:MUNNELLG [en]
dc.identifier.rssinternalid: 215324 [en]
dc.rights.ecaccessrights: openAccess
dc.contributor.sponsor: Trinity College Dublin (TCD) [en]

