Quality Improvement in the Mapping Process required for Linked Data publication
Citation:
Randles, Alex, Quality Improvement in the Mapping Process required for Linked Data publication, Trinity College Dublin, School of Computer Science & Statistics, Computer Science, 2023
Download Item:
Alex_Randles-Thesis_Final.pdf (PDF) 9.576 MB
Abstract:
This thesis presents a quality improvement approach named the Mapping Quality Improvement (MQI) Framework designed to improve and maintain quality in the publication process involved in the creation of linked data.
Linked data is described as a set of best practices for publishing and interlinking data on the web. A linked data dataset is structured information encoded using the Resource Description Framework (RDF), providing information that is interoperable, extensible, and machine-readable. Resources are identified by HTTP URIs and linked with other datasets through them, which makes resources accessible over the HTTP protocol. Statements within RDF are referred to as triples (subject-predicate-object), which represent the nodes and edges within a data graph. Linked data is regarded as one of the most efficient and effective approaches to knowledge integration and discovery. The Semantic Web represents the web of data, an extension of the current web in which data is stored in machine-readable, standardized formats such as the World Wide Web Consortium's linked data recommendations. Transforming heterogeneous data into a linked data representation is an essential prerequisite for evolving the Semantic Web and requires the definition of declarative uplift mappings. "Uplift" mappings define rules used during the publication process to transform data from a non-RDF format into RDF. However, defining these mappings is a complex and often error-prone task, which frequently propagates quality issues into the resulting linked data dataset. In addition, linked data is highly dynamic, with frequent changes often causing alignment issues between the linked data, the mappings, and the underlying data sources. Oftentimes, the burden of quality assessment falls on third parties after a linked data dataset has been published, greatly decreasing the trustworthiness of data on the Semantic Web.
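As a rough illustration of what a declarative uplift mapping does (this is not code from the MQI Framework, and the base URI, predicate, and sample data are all hypothetical), the sketch below applies a simple mapping rule to CSV rows, emitting RDF triples in N-Triples form:

```python
import csv
import io

# Hypothetical non-RDF source data (CSV), standing in for a real data source.
SOURCE = "id,name\n1,Alice\n2,Bob\n"

# Illustrative mapping assumptions: a made-up base URI for subjects and the
# FOAF "name" property as the predicate.
BASE = "http://example.org/person/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

def uplift(csv_text):
    """Apply a minimal uplift rule: one (subject, predicate, object) triple
    per CSV row, serialized as N-Triples strings."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        triples.append(f'{subject} <{FOAF_NAME}> "{row["name"]}" .')
    return triples

for triple in uplift(SOURCE):
    print(triple)
```

A quality issue in such a mapping (e.g. a malformed URI or a misused predicate) would propagate into every triple it produces, which is why the thesis argues for assessing quality at the mapping stage rather than in the published dataset. Real-world uplift mappings are typically written declaratively, for instance in the W3C R2RML language, rather than as ad hoc scripts like this sketch.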
A literature review was conducted in order to define requirements for the MQI framework, which was designed to resolve limitations discovered in the state of the art. The literature review consisted of two phases focused on approaches to support the creation and maintenance of declarative mappings used during the publication process of linked data. The review indicated a lack of approaches to support the relevant processes. Therefore, the MQI framework was designed to support users by providing a suite of quality metrics that identify diverse issues early in the linked data publication process, specifically at the mapping stage. The framework guides the process of removing detected quality issues in the mappings by providing semi-automatic refinement suggestions. In addition, the framework supports preserving the alignment of the uplift mappings with the underlying data sources by detecting source data changes after publication. Importantly, the quality process information captured by the framework is itself represented as semantic data, which enables its association with the resulting linked data datasets in order to improve downstream maintenance and reuse.
The proposed approach was evaluated through five experiments and one application study, categorized by five aspects: accuracy, usability, understanding, effectiveness and validation. The application study was designed to evaluate the approach when applied in real-world settings. The experiments involved over one hundred participants with varying background knowledge, including knowledge engineering students, uplift mapping specialists and ontology design specialists. The variety of backgrounds helped elicit diverse, insightful feedback. The accuracy of the framework was tested by detecting quality issues in real-world mappings supplied by others from their projects. The first usability experiment tested the effectiveness of the MQI framework in supporting quality assessment and refinement of uplift mappings by users. The second usability experiment tested users' understanding of the changes the framework detected in the source data of mappings. Finally, ontology design specialists validated the ontologies that underpin the MQI framework's processing. Overall, the evaluations indicated that the MQI framework provides effective and understandable information to users and facilitates the creation and maintenance of high-quality uplift mappings, with a downstream impact on the quality of the resulting linked data dataset. The validation of both ontologies indicated that they are of sufficient design quality.
The research described in this thesis resulted in one major contribution and three minor contributions. The major contribution is the design and development of the MQI Framework. The first minor contribution is the Mapping Quality Improvement Ontology (MQIO), designed to represent mapping quality assessment, refinement and validation information. The second minor contribution is the Ontology for Source Change Detection (OSCD), designed to represent information about changes in the source data of mappings and their alignment with the associated mappings. Finally, the third minor contribution is the evaluation results from the five experiments and one application study conducted to evaluate the proposed approach.
Sponsor:
Science Foundation Ireland (SFI)
Author's Homepage:
https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:RANDLESA
Description:
APPROVED
Author: Randles, Alex
Advisor:
O'Sullivan, Declan
Publisher:
Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material:
Thesis
Collections:
Availability:
Full text available
Licences: