Detecting restriction class correspondences in Linked Open Data
Citation:
Brian Walshe, 'Detecting restriction class correspondences in Linked Open Data', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2014, pp 154
Abstract:
The Linked Open Data (LOD) project has made a broad range of knowledge available on the World Wide Web as open datasets. These datasets use a common format and are linked so that exploring and integrating data from different sources is a simple task. The links between the datasets are one-to-one relationships between named URIs, but such simple links are not always sufficient to describe the relationships between data in the sets. Sometimes it is more appropriate to describe a relationship with a complex correspondence, that is, a correspondence which involves one or more entities in a logical formulation. This thesis discusses an approach to detecting complex correspondences between ontologies associated with LOD.
There are several challenges associated with this task. First, the datasets can be very large, and the number of potential complex correspondences grows combinatorially with their size. Second, many of the datasets in the LOD cloud may contain large amounts of missing information or invalid entries. Our approach focuses on detecting correspondences between named classes and logically constructed restriction classes. An extensional approach (one which uses instance data shared by the datasets) is employed to discover appropriate restriction classes for the correspondences. Unlike other extensional approaches, ours focuses on methods which perform well with only a small number of example instances, since finding matched instances is not usually a trivial task.
To evaluate the approach, we first demonstrate that it can detect complex correspondences between the DBpedia and YAGO2 datasets. We then show that a small sample of 15 matched instances can be as effective as a sample on the order of 10^4 instances. Furthermore, we show that as the level of missing data increases, a selection metric which takes the open world assumption into account can consistently outperform the Information Gain metric popular in machine learning. Finally, we compare our approach to a leading extensional approach to detecting complex correspondences, and show that our approach can produce more accurate correspondences while using significantly less input data.
The primary contribution of this thesis is a robust, scalable approach to detecting complex correspondences between classes in ontologies. LOD ontologies were the primary motivation behind this work, but the approach could be applied to any ontologies containing individuals that can be mapped directly to one another with equivalence relationships.
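To make the extensional idea and the role of missing data concrete, here is a minimal sketch that scores candidate restriction classes for one named class using only a handful of matched instances. It is an illustration under stated assumptions, not the thesis's method. The toy facts, the instance identifiers, and the helper names (`split_counts`, `information_gain`, `rank_candidates`) are all hypothetical. Candidates are limited to single property-value (hasValue-style) restrictions. The "open world" mode simply skips instances that assert no value for a property instead of counting them as non-members. That mode is a hypothetical stand-in for the thesis's own open-world-aware selection metric, which the abstract does not define.

```python
from math import log2

# Hypothetical toy data. Each target-dataset instance maps a property to the set of
# values asserted for it; a missing property means "unknown" (open world), not "no".
FACTS = {
    "p1": {"type": {"Film"}, "country": {"France"}},
    "p2": {"type": {"Film"}, "country": {"France"}},
    "p3": {"type": {"Film"}},  # country missing
    "p4": {"type": {"Film"}},  # country missing
    "n1": {"type": {"Film"}, "country": {"Germany"}},
    "n2": {"type": {"Book"}, "country": {"Italy"}},
    "n3": {"type": {"Book"}, "country": {"Spain"}},
    "n4": {"type": {"Book"}, "country": {"Belgium"}},
}
# Hypothetical matched instances that are (positives) / are not (negatives) members
# of a named class in the other dataset, e.g. a class of French films.
POSITIVES = ["p1", "p2", "p3", "p4"]
NEGATIVES = ["n1", "n2", "n3", "n4"]


def entropy(pos, neg):
    """Binary entropy (in bits) of a positive/negative split."""
    if pos == 0 or neg == 0:
        return 0.0
    p = pos / (pos + neg)
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def split_counts(facts, positives, negatives, prop, value, open_world):
    """Return [pos, neg] counts inside and outside the restriction prop = value."""
    inside, outside = [0, 0], [0, 0]
    for label, instances in ((0, positives), (1, negatives)):
        for inst in instances:
            values = facts.get(inst, {}).get(prop)
            if values is None and open_world:
                continue  # unknown under the open world assumption: skip it
            if values and value in values:
                inside[label] += 1
            else:
                # value not asserted; in closed-world mode this includes unknowns
                outside[label] += 1
    return inside, outside


def information_gain(inside, outside):
    """Entropy reduction from splitting the counted instances on the restriction."""
    pos, neg = inside[0] + outside[0], inside[1] + outside[1]
    total = pos + neg
    if total == 0:
        return 0.0
    remainder = sum((p + n) / total * entropy(p, n) for p, n in (inside, outside))
    return entropy(pos, neg) - remainder


def rank_candidates(facts, positives, negatives, open_world):
    """Score every (property, value) seen on a positive example, best first."""
    candidates = {
        (prop, value)
        for inst in positives
        for prop, values in facts.get(inst, {}).items()
        for value in values
    }
    scored = []
    for prop, value in candidates:
        inside, outside = split_counts(facts, positives, negatives, prop, value, open_world)
        scored.append((information_gain(inside, outside), prop, value))
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


if __name__ == "__main__":
    for open_world in (False, True):
        best = rank_candidates(FACTS, POSITIVES, NEGATIVES, open_world)[0]
        print(f"open_world={open_world}: {best[1]} = {best[2]} (score {best[0]:.3f})")
```

Run as a script, this prints `type = Film (score 0.549)` for the closed-world count and `country = France (score 0.918)` when unknowns are skipped. Treating the two films with no recorded country as non-French makes the broader `type = Film` restriction look better. Skipping them lets `country = France` rank first. This is a small-scale picture of why missing data can mislead a metric that implicitly assumes a closed world.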
Author: Walshe, Brian
Advisor: O'Sullivan, Declan
Qualification name: Doctor of Philosophy (Ph.D.)
Publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Note: TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
Type of material: thesis
Availability: Full text available