Using Character N-Grams to Explore Diachronic Change in Medieval English
Citation:
Buckley, Kevin and Carl Vogel, Using Character N-Grams to Explore Diachronic Change in Medieval English, Folia Linguistica Historica, 40, 2, 2019, 249-300

Abstract:
This paper applies character N-grams to the study of diachronic linguistic variation in a historical language. The period selected for this initial exploratory study is medieval English, a well-studied period of great linguistic variation and language contact, whereby the efficacy of computational techniques can be examined through comparison to the wealth of thorough scholarship on medieval linguistic variation. Frequency profiles of character N-gram features were generated for several epochs in the history of English, and a measure of language distance was employed to quantify the similarity between English at different stages in its history. Through this, a quantification of internal change in English was achieved. Furthermore, similarity between English and other medieval languages across time was measured, allowing for a measurement of the well-known period of contact between English and Anglo-Norman French. This methodology is compared to traditional lexicostatistical methods and shown to be able to derive the same patterns as those derived from expert-created feature lists (i.e. Swadesh lists). The use of character N-gram profiles proved to be a flexible and useful method to study diachronic variation, allowing for the highlighting of relevant features of change. This method may be a complement to traditional qualitative examinations.
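As a rough illustration of the idea in the abstract, the following Python sketch builds relative-frequency profiles of character N-grams and compares two profiles with a distance score. It is not the authors' implementation: the abstract does not name the distance measure, so cosine distance is used here purely as a stand-in, and the function names `ngram_profile` and `cosine_distance`, the choice of trigrams, and the two one-line text samples are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text, n=3):
    """Return relative frequencies of character n-grams in `text`."""
    # Lowercase and collapse whitespace so layout differences
    # do not create spurious n-grams.
    cleaned = " ".join(text.lower().split())
    counts = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def cosine_distance(p, q):
    """Cosine distance between two sparse profiles (stand-in measure)."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_p * norm_q)


# Toy one-line samples (assumption: illustrative only, far too short
# for any real comparison and not the corpus used in the paper).
samples = {
    "old_english": "hwæt we gardena in geardagum",
    "middle_english": "whan that aprille with his shoures soote",
}

profiles = {name: ngram_profile(text) for name, text in samples.items()}
distance = cosine_distance(profiles["old_english"], profiles["middle_english"])
print(f"distance(old_english, middle_english) = {distance:.3f}")
```

With real data, each profile would be built from a sizeable corpus per epoch or language, and the pairwise distances could then be tabulated across epochs to trace internal change or contact effects.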
Sponsor and Grant Number:
Science Foundation Ireland (SFI), 13/RC/2106
Science Foundation Ireland (SFI), 12/CE/I2267
Author's Homepage:
http://people.tcd.ie/vogel
Description:
PUBLISHED
Author: Vogel, Carl
Note:
The final publication is available at www.de-gruyter.com
Type of material:
Journal Article
Collections:
Series/Report no:
Folia Linguistica Historica; 40; 2
Availability:
Full text available
Keywords:
Computational linguistics, History of English, Language contact, Diachronic linguistics, Character N-grams
Subject (TCD):
Digital Engagement, Digital Humanities, Computational linguistics, Corpus Linguistics, computational stylistics, stylistics
DOI:
https://doi.org/10.1515/flih-2019-0012
ISSN:
0165-4004
Licences: