Publishing Social Sciences Datasets as Linked Data: a Political Violence Case Study
Citation: Rob Brennan, Kevin Feeney, Odhran Gavin, Publishing Social Sciences Datasets as Linked Data: a Political Violence Case Study, ENRICH 2013 at SIGIR 2013, Dublin, 1 August 2013, Seamus Lawless, Maristella Agosti, Paul Clough, Owen Conlon, 2013
sigir-enrich-2013-dacura-04.pdf (Accepted for publication (author's copy) - Peer Reviewed) 262.6Kb
This paper discusses the design, application and generalisation of a Linked Data vocabulary to describe historical events of political violence. The vocabulary was designed to capture the United States political violence 1795-2010 dataset created by Prof. Peter Turchin in the course of his social science research into Cliodynamics. The vocabulary has been generalised to support a semi-automated data collection process suitable for the creation of a complementary dataset of political violence events in the UK and Ireland. Both datasets will be published as managed linked data that is interconnected with other web-based datasets such as DBpedia, a computer-readable version of Wikipedia. The lifecycle of the datasets will be actively managed with tool support for further harvesting, evolution and consistency checking. The creation of the political violence vocabulary required the evaluation of pre-existing vocabularies for potential reuse and compatibility. The original US political violence dataset was stored in a spreadsheet and an initial vocabulary was extracted from that. A process was elaborated for the semi-automated harvesting of political violence data from online corpora of historical documents such as a newspaper archive. The vocabulary was refined to support dynamic interface generation by a vocabulary-neutral data harvesting tool prototype. The harvesting tool, data harvesting process, political violence vocabulary and US political violence dataset were connected to our existing linked data management platform, DaCura. This work has produced a general political violence vocabulary that has been validated by application to a real-world dataset and publication use-cases. Our data harvesting process is potentially applicable to a wide range of social science or historical research activities that focus on generating structured datasets or annotations of human-readable corpora.
The publication of the US political violence dataset as linked data is a contribution towards the emerging fields of Digital Humanities and Science. The main practical outcome of this work to date is a prototype political violence data harvesting tool-chain that will enable us to quickly collect the UK and Ireland political violence dataset. It will also allow us to perform experimental evaluations of this collection process, both to gather evidence about the effectiveness of our approach and to refine it towards increased productivity and user satisfaction for the social science researchers engaged in data collection. The key benefits of reading this paper are a description of a new linked data vocabulary for political violence events, insights into the process of creating a new vocabulary for social science datasets, and an illustration of the potential benefits of publishing social science or other cultural heritage datasets as linked data.
Other Titles: ENRICH 2013 at SIGIR 2013
Type of material: Poster
Availability: Full text available