The Creation and Complexity Analysis of a Corpus of Educational Materials in Irish (EduGA)
Citation: Ó MEACHAIR, MÍCHEÁL JOHN, The Creation and Complexity Analysis of a Corpus of Educational Materials in Irish (EduGA), n/a, Trinity College Dublin. School of Linguistic Speech & Comm Sci, 2020
Abstract: This research presents the construction of EduGA, a 7.5-million-word corpus of educational materials for teaching Irish and for teaching other subjects through the medium of Irish. The corpus was compiled with a view to developing three complexity metrics for Irish, each of which is based on objective analyses conducted on EduGA. The first analysis focuses on seven lexico-grammatical language features that have been prescribed for multiple Irish courses. The statistical significance of each lexico-grammatical feature was analysed at the level for which it was first prescribed as well as at other levels. It was concluded that only some language features occur at a statistically significant frequency at the level for which they were first prescribed, and that fewer than half of the language features analysed could be reliably used to analyse lexical complexity. The second analysis tested the applicability of length-of-word and length-of-sentence analyses to Irish complexity studies. These metrics were chosen because they are commonly used for other languages. Length of word was not found to be a suitable metric for Irish because of the language's morphology. Length of sentence was found to be a suitable metric for analysing syntactic complexity in Irish texts. In the final analysis, a term-based metric was developed in order to analyse semantic complexity. This term-based metric combines two dimensions in order to contextualise term usage. The first dimension draws on the topicality of terms in documents, and the second draws on the frequency of terms in a corpus of general Irish. A by-product of the present research was the development of resources, namely a corpus of Irish-language educational materials. Wordlists for each subject-specific sub-corpus are included in the appendix to the thesis in order to provide for further research in this area.
An Chomhairle um Oideachas Gaeltachta agus Gaelscolaíochta
Author: Ó MEACHAIR, MÍCHEÁL JOHN
Advisor: Ui Dhonnchadha, Elaine
Publisher: Trinity College Dublin. School of Linguistic Speech & Comm Sci. C.L.C.S.
Type of material: Thesis
Availability: Full text available