|Author||Andrea Schuch|
|Title||EBMT Based upon Two-Dimensional Alignment|
|Supervisors||Remko Scha, Anouk Perquin|
|Faculty||Faculty of Science|
|Programme||FNWI MSc Artificial Intelligence|
|Abstract||Alignment models have been more prominent in the statistical machine translation (SMT) research tradition than in the example-based machine translation (EBMT) tradition. Word alignment is one of the oldest concepts in machine translation, and it is now readily available thanks to statistical tools. However, since real translations are rarely word-for-word, word alignment usually relies on two linguistically implausible concepts: empty cepts and distortion. In this thesis, we develop a two-phase EBMT approach based on a two-dimensional word alignment model. This approach avoids alignment to the empty cept and does not make use of distortion.|
In two-phase (or precompiled) EBMT, translation examples are converted into translation rules during the preprocessing phase. However, since the sentence to be translated is not yet known at this stage, preprocessing must be done with care, as it entails a great risk of losing valuable information. To enable more informed matching, translation rules must remain representative of the original example translations. We call such a translation rule a translation frame; its main task is to capture the structural discrepancies within the sentence pair. We propose to generate translation frames on the basis of a two-dimensional alignment model. In addition to the usual word translations, which we call inter-sentential dependencies, this alignment model contains intra-sentential dependencies, which model relations between words within the source sentence and within the target sentence. Besides the usual direct alignment between two words, this also enables indirect alignment of an untranslatable word via an intra-sentential dependency to a directly aligned word. Discontiguous phrases are thus aligned in two steps: first, we align their translatable words, and then we associate their untranslatable words with them. Our translation frame generation method includes in the translation frame all words that take part in indirect alignment.
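The two-step alignment and frame generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names (`direct_links`, `attach_untranslatable`, `translation_frame`), the rule "attach an untranslatable word to the directly aligned word it depends on most strongly", and the reading of "takes part in indirect alignment" (the attached word, its anchor, and the anchor's direct translation) are all hypothetical. The dependency strengths in the usage example are invented; the Dutch separable particle *op* in *hij belt haar op* ("he calls her (up)") serves as the untranslatable half of the discontiguous phrase *belt ... op*.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (position, position)


def dep(intra: Dict[Pair, float], i: int, j: int) -> float:
    """Symmetric lookup of an intra-sentential dependency strength (0.0 if absent)."""
    return intra.get((min(i, j), max(i, j)), 0.0)


def direct_links(src: List[str], tgt: List[str],
                 lexicon: Set[Tuple[str, str]]) -> Set[Pair]:
    """Step 1 (hypothetical): link every (source, target) position pair
    licensed by a single-word translation in the lexicon."""
    return {(i, j) for i, s in enumerate(src)
            for j, t in enumerate(tgt) if (s, t) in lexicon}


def attach_untranslatable(n: int, aligned: Set[int],
                          intra: Dict[Pair, float]) -> Dict[int, int]:
    """Step 2 (hypothetical): attach each word without a direct link to the
    directly aligned word in the same sentence it depends on most strongly.
    Words with no positive dependency on any aligned word stay unattached."""
    attachments: Dict[int, int] = {}
    for i in range(n):
        if i in aligned:
            continue
        # sorted() makes tie-breaking deterministic (lowest position wins)
        best = max(sorted(aligned), key=lambda k: dep(intra, i, k), default=None)
        if best is not None and dep(intra, i, best) > 0.0:
            attachments[i] = best
    return attachments


def translation_frame(src, tgt, lexicon, intra_src, intra_tgt):
    """Return direct links, indirect attachments and frame positions per side.
    Assumed here: a word takes part in an indirect alignment if it is an
    attached word, its anchor, or the anchor's direct translation."""
    links = direct_links(src, tgt, lexicon)
    src_att = attach_untranslatable(len(src), {i for i, _ in links}, intra_src)
    tgt_att = attach_untranslatable(len(tgt), {j for _, j in links}, intra_tgt)
    src_anchors, tgt_anchors = set(src_att.values()), set(tgt_att.values())
    src_frame = set(src_att) | src_anchors | {i for i, j in links if j in tgt_anchors}
    tgt_frame = set(tgt_att) | tgt_anchors | {j for i, j in links if i in src_anchors}
    return links, src_att, tgt_att, src_frame, tgt_frame


# Usage with invented values: "op" has no single-word translation.
src = ["he", "calls", "her"]
tgt = ["hij", "belt", "haar", "op"]
lexicon = {("he", "hij"), ("calls", "belt"), ("her", "haar")}
intra_tgt = {(1, 3): 0.9, (2, 3): 0.2}  # "op" depends mostly on "belt"
links, src_att, tgt_att, src_frame, tgt_frame = translation_frame(
    src, tgt, lexicon, {}, intra_tgt)
print([src[i] for i in sorted(src_frame)], [tgt[j] for j in sorted(tgt_frame)])
# prints ['calls'] ['belt', 'op']
```

The particle is never linked to an empty cept: it reaches *calls* only through its intra-sentential dependency on *belt*, so the resulting frame pairs *calls* with the discontiguous *belt ... op*.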
We implement a prototype system that relies only on resources that are very easy to obtain: intra-sentential dependencies are computed from correlations between words, and inter-sentential dependencies are established using single-word translations. We show in detail how the prototype deals with an example from a real corpus, and compare its performance to that of two EBMT approaches of comparable simplicity (one runtime and one compiled). While in these experiments the prototype performs at the same level as the baseline system, our approach has the following advantages. First, the two-dimensional alignment model can rely exclusively on relationships between single words, in particular single-word translations, which are easier to obtain than phrase translations. Second, it does not enforce translatability for every word in the sentence, but offers a treatment of untranslatable words that does not align them to the empty cept. Third, intra- and inter-sentential dependencies can be computed independently of one another. Finally, alignments can be computed without distortion models or restrictions on word order, because EBMT needs alignments only for analyzing existing translation examples, not for generating new translations.
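The two kinds of dependency in the prototype can be illustrated with a small sketch. The abstract says only that intra-sentential dependencies come from "correlations between words"; this sketch assumes, purely for illustration, the Dice coefficient over sentence-level co-occurrence in a monolingual corpus, which may differ from the measure used in the thesis. Inter-sentential dependencies are looked up in a single-word lexicon and never consult the correlation table, mirroring the claim that the two can be computed independently. All function names, the toy corpus and the lexicon are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def dice_table(corpus: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Assumed measure: Dice coefficient 2*c(a,b) / (c(a) + c(b)) for every word
    pair co-occurring in at least one sentence; counts are per sentence."""
    word_count: Counter = Counter()
    pair_count: Counter = Counter()
    for sent in corpus:
        types = sorted(set(sent))  # sorted, so each pair key is (a, b) with a < b
        word_count.update(types)
        pair_count.update(combinations(types, 2))
    return {(a, b): 2 * c / (word_count[a] + word_count[b])
            for (a, b), c in pair_count.items()}


def intra_dependencies(sentence: List[str],
                       table: Dict[Tuple[str, str], float]) -> Dict[Tuple[int, int], float]:
    """Map each position pair (i, j), i < j, to the correlation of its two words."""
    deps = {}
    for i, j in combinations(range(len(sentence)), 2):
        a, b = sorted((sentence[i], sentence[j]))
        score = table.get((a, b), 0.0)
        if score > 0.0:
            deps[(i, j)] = score
    return deps


def inter_dependencies(src: List[str], tgt: List[str],
                       lexicon: Set[Tuple[str, str]]) -> Set[Tuple[int, int]]:
    """Inter-sentential dependencies from a single-word lexicon only;
    the intra-sentential correlation table is never consulted."""
    return {(i, j) for i, s in enumerate(src)
            for j, t in enumerate(tgt) if (s, t) in lexicon}


# Usage on an invented three-sentence Dutch corpus.
dutch = [["hij", "belt", "haar", "op"],
         ["zij", "belt", "hem", "op"],
         ["hij", "ziet", "haar"]]
table = dice_table(dutch)
deps = intra_dependencies(dutch[0], table)
print(deps[(1, 3)], deps[(2, 3)])  # belt-op vs. haar-op
# prints 1.0 0.5
```

Because *belt* and *op* always co-occur in this toy corpus, their correlation is maximal, which is the kind of signal an untranslatable particle needs in order to be attached to its verb. A real corpus would of course also produce spurious high scores (here *hij* and *haar* score 1.0 as well), so the choice of measure and threshold matters.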
|Document type||Master's thesis|
Use this URL to link to this page: http://dare.uva.nl/en/scriptie/356448