Linear transformations of semantic spaces for word-sense discrimination and collocation compositionality grading
Citation: Alfredo Maldonado Guerra, 'Linear transformations of semantic spaces for word-sense discrimination and collocation compositionality grading', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2015, pp 184
Maldonado Guerra, Alfredo_TCD-SCSS-PHD-2015-02.pdf (PDF) 967.3Kb
Maldonado Guerra TCD THESIS 10507 Linear transformations.pdf (Scan of TCD Library print copy) 138.4Mb
Latent Semantic Analysis (LSA) and Word Space are two semantic models derived from the vector space model of distributional semantics that have been used successfully in word-sense disambiguation and discrimination. LSA can represent word types and word tokens in context by means of a single matrix factorised by Singular Value Decomposition (SVD). Word Space is able to represent types via word vectors and tokens through two separate kinds of context vectors: direct vectors that count first-order word co-occurrence and indirect vectors that capture second-order co-occurrence. Word Space objects are optionally reduced by SVD. Although the two models are regarded as related, little has been discussed about the specific relationship between Word Space and LSA or the benefits of one model over the other, especially with regard to their capability of representing word tokens. This thesis aims to address this gap both theoretically and empirically.

Within the theoretical focus, the definitions of Word Space and LSA as presented in the literature are studied. A formalisation of these two semantic models is presented and their theoretical properties and relationships are discussed. A fundamental insight from this theoretical analysis is that indirect (second-order) vectors can be computed from direct (first-order) vectors through a linear transformation involving a matrix of word vectors (a word matrix), an operation that can itself be seen as a method of dimensionality reduction alternative to SVD. Another finding is that in their unreduced form, LSA vectors and the Word Space direct (first-order) context vectors define approximately the same objects and their difference can be exactly calculated. It is also found that the SVD spaces produced by LSA and the Word Space word vectors are similar and that their difference, which can also be precisely calculated, ultimately stems from the original difference between unreduced LSA vectors and Word Space direct vectors.
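The linear transformation from direct to indirect vectors described above can be illustrated with a small sketch. It assumes the common formulation in which a second-order context vector is the unweighted sum of the word vectors of the words in the context; the thesis may use a different weighting or normalisation. The vocabulary, the word matrix `W` and the function names are hypothetical and serve only as illustration.

```python
import numpy as np

# Hypothetical toy vocabulary (an assumption for illustration only).
vocab = ["bank", "money", "river", "loan", "water"]
index = {w: i for i, w in enumerate(vocab)}

# Hypothetical word matrix W: row i is the word vector of vocab[i].
# The values are made-up co-occurrence counts over 3 dimensions.
W = np.array([
    [2.0, 1.0, 0.0],
    [3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0],
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
])

def direct_vector(context_tokens):
    """First-order vector: counts of vocabulary words in the context."""
    d = np.zeros(len(vocab))
    for tok in context_tokens:
        if tok in index:
            d[index[tok]] += 1.0
    return d

def indirect_vector(d, word_matrix):
    """Second-order vector: the direct vector mapped linearly by W."""
    return d @ word_matrix

context = ["money", "loan", "money"]
d = direct_vector(context)   # 5 dimensions, one per vocabulary word
v = indirect_vector(d, W)    # 3 dimensions, one per column of W

# Equivalent view: summing the word vectors of the context tokens.
summed = sum(W[index[t]] for t in context)
```

In this toy setting the direct vector has one dimension per vocabulary word while the indirect vector has one per column of `W`, which is why the word matrix can be read as a dimensionality reduction operator.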
It is also observed that the indirect “second-order” method of token representation from Word Space is also available to LSA, in a version of the representation that has remained largely unexplored. Given the analysis of the SVD spaces produced by both models, it is hypothesised that, when exploited in comparable ways, Word Space and LSA should perform similarly in actual word-sense disambiguation and discrimination experiments.

In the empirical focus, performance comparisons between different configurations of LSA and Word Space are conducted in actual word-sense disambiguation and discrimination experiments. It is found that some indirect configurations of LSA and Word Space do indeed perform similarly, but other LSA and Word Space indirect configurations, as well as their direct representations, differ more markedly in performance. So, whilst the two models define approximately the same spaces, their differences are large enough to impact performance. Word Space’s simpler, unreduced direct (first-order) context vectors are found to offer the best overall trade-off between accuracy and computational expense. Another empirical exercise involves comparisons of geometric properties of Word Space’s two token vector representations aimed at testing their similarity and predicting their performance in means-based word-sense disambiguation and discrimination experiments. It is found that they are not geometrically similar and that sense vectors computed from direct vectors are more spread out than those computed from indirect vectors. Word-sense disambiguation and discrimination experiments performed on these vectors largely reflect the geometric comparisons, as the more spread-out direct vectors perform better than indirect vectors in supervised disambiguation experiments, although in unsupervised discrimination experiments no clear winner emerges. The role of the Word Space word matrix as a dimensionality reduction operator is also explored.
Instead of simply truncating the word matrix, a method in which dimensions representing statistically associated word pairs are summed and merged, called word matrix consolidation, is proposed. The method achieves modest but promising results comparable to SVD. Finally, the word vectors from Word Space are tested empirically in a task designed to grade (measure) the compositionality (or degree of “literalness”) of multi-word expressions (MWEs). Cosine similarity measures are taken between a word vector representing the full MWE and word vectors representing each of its individual member words in order to measure the deviation in co-occurrence distribution between the MWE and its individual members. It is found that this deviation in co-occurrence distributions does correlate with human compositionality judgements of MWEs.
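The cosine-based compositionality measure described above can be sketched as follows. The abstract does not say how the per-member cosines are combined into a single grade, so averaging them is an assumption made here; the function names and all vector values are hypothetical and made up for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v / (nu * nv))

def compositionality_score(mwe_vec, member_vecs):
    """Mean cosine between the MWE vector and each member word vector.

    Averaging the per-member cosines is an assumption of this sketch.
    """
    return float(np.mean([cosine(mwe_vec, m) for m in member_vecs]))

# Made-up co-occurrence vectors over three hypothetical dimensions.
olive_oil = np.array([4.0, 1.0, 0.0])
olive = np.array([3.0, 1.0, 0.0])
oil = np.array([4.0, 2.0, 0.0])

red_tape = np.array([0.0, 1.0, 5.0])
red = np.array([4.0, 1.0, 0.0])
tape = np.array([3.0, 2.0, 1.0])

literal_score = compositionality_score(olive_oil, [olive, oil])
idiom_score = compositionality_score(red_tape, [red, tape])
```

With these toy values the expression whose vector resembles those of its member words receives the higher score, matching the intuition that a small deviation in co-occurrence distribution indicates a more literal, compositional expression.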
Author: Maldonado Guerra, Alfredo.
Publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Type of material: thesis
Availability: Full text available