Example-Based Machine Translation: An Adaptation-Guided Retrieval Approach
Abstract:
Translation can be viewed as a problem-solving process in which a source language text is transformed into its target language equivalent. A machine translation system that solves the problem from first principles requires more knowledge than has ever been successfully encoded in any system. An alternative approach is to reuse past translation experience encoded in a set of exemplars, or cases. A case that is similar to the input problem is retrieved, and a solution is produced by adapting its target language component. This thesis advances the state of the art in example-based machine translation (EBMT) by proposing techniques for predicting the adaptation requirements of a retrieval episode. An Adaptation-Guided Retrieval policy increases the efficiency of the retriever, which now searches for adaptable cases, and relieves the knowledge-acquisition bottleneck of the adaptation component. A flexible case-storage scheme also allows all knowledge required for adaptation to be deduced from the case-base itself.
The first part of the thesis contrasts such a CBR-motivated approach with current EBMT systems, which are either data-intensive or knowledge-intensive. A new EBMT scheme is proposed in which the cases encode knowledge about their own reusability, determined by cross-linguistic mappings. The information allows cases to be generalised carefully, to the degree that is necessitated by the data. Linguistic and translational divergences (the obstacles to reusability) are investigated in the domain of software-manual translation, and on this basis, a suitable case representation scheme is proposed.
The second and third parts of the thesis describe the on-line and off-line processes of an EBMT system in which the case-base is the only knowledge source. Cases are deduced from texts automatically, and at run-time, the matching and retrieval tasks exploit the adaptability information in the cases in order to maximise coverage without compromising on accuracy. The multi-tiered case representation scheme allows adaptation at the sub-sentential and word levels, when necessary. The general performance of the system is shown to degrade gracefully and to improve as the case-base size increases.
Author: Collins, Brona
Advisor: Cunningham, Padraig
Type of material: Doctoral, Doctor of Philosophy (Ph.D.)
Availability: Full text available