Statistically-constrained shallow text marking: techniques, evaluation paradigm and results
Citation:
Brian Murphy and Carl Vogel, Statistically-constrained shallow text marking: techniques, evaluation paradigm and results, Proceedings of SPIE - The International Society for Optical Engineering, Security, Steganography, and Watermarking of Multimedia Contents IX, San Jose, California, February 2007, Edward J. Delp III, Ping Wah Wong, 6505, International Society for Optical Engineering, 2007, 65050Z
Download Item:
Statistically-constrained.pdf (published (author copy) peer-reviewed) 65.32Kb
Abstract:
We present three natural language marking strategies based on fast and reliable shallow parsing techniques, and on widely
available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these
techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of
structural and semantic fit, using both lexical resources, and the web as a corpus. A representative sample of marks is given
to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between
corpus-based felicity measures and perceived quality, and makes qualified predictions. Grammatical acceptability correlates with
our automatic measure strongly (Pearson's r = 0.795, p = 0.001), allowing us to account for about two thirds of variability
in human judgements. A moderate but statistically insignificant (Pearson's r = 0.422, p = 0.356) correlation is found with
judgements of meaning preservation, indicating that the contextual window of five content words used for our automatic
measure may need to be extended.
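The step from a correlation coefficient to "variability accounted for" can be checked by squaring r. The short sketch below does only that arithmetic on the two values reported in the abstract. The dictionary keys and the helper name `variance_explained` are illustrative assumptions, not anything taken from the paper, and the paper's underlying judgement data is not reproduced here.

```python
# Pearson correlations as reported in the abstract (copied from the text above).
reported_r = {
    "grammatical acceptability": 0.795,
    "meaning preservation": 0.422,
}


def variance_explained(r: float) -> float:
    """Return r squared, the share of variance a linear relationship accounts for."""
    return r * r


# Hypothetical helper applied to each reported coefficient.
shares = {name: variance_explained(r) for name, r in reported_r.items()}

for name, share in shares.items():
    print(f"{name}: r^2 = {share:.3f}")
```

Under this reading, r = 0.795 corresponds to roughly 63% of the variance, which the abstract describes as about two thirds, while r = 0.422 corresponds to under a fifth.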
Author's Homepage:
http://people.tcd.ie/vogel
Description:
PUBLISHED
San Jose, California
Author: VOGEL, CARL
Other Titles:
Proceedings of SPIE - The International Society for Optical Engineering
Security, Steganography, and Watermarking of Multimedia Contents IX
Publisher:
International Society for Optical Engineering
Type of material:
Conference Paper
Collections:
Series/Report no:
6505
Availability:
Full text available
Subject (TCD):
Ageing, Cancer, Creative Arts Practice, Creative Technologies, Digital Humanities, Inclusive Society, Intelligent Content & Communications, International Integration, Smart & Sustainable Planet, Telecommunications, Computational linguistics
DOI:
http://dx.doi.org/10.1117/12.713355
Licences: