- Typologically robust statistical machine translation
- Understanding and exploiting differences and similarities between languages in machine translation
- Award date: 20 March 2018
- Document type: PhD thesis
- Interfacultary Research Institutes
- Faculty of Science (FNWI)
- Institute for Logic, Language and Computation (ILLC)
Machine translation systems often incorporate modeling assumptions motivated by properties of the language pairs they initially target. When such systems are applied to language families with considerably different properties, translation quality can deteriorate. Phrase-based machine translation systems, for instance, are ill-equipped to handle the challenges caused by relaxed word order constraints and productive word formation processes in morphologically rich languages. In this thesis, we ask what role the properties of languages, as studied in the field of linguistic typology, play in how well machine translation systems perform. We focus in particular on word order and morphology, and show that typological differences in these areas can be bridged by making certain linguistic phenomena overt to the translation system. Understanding and exploiting typological differences between languages enables improvements to the typological robustness of translation systems without significantly changing the assumptions of the underlying translation models. In the area of word order, we examine the influence of word order freedom on preordering, a popular technique to model word order in phrase-based machine translation, and propose a method to improve its typological robustness. For morphological complexity, we show that reducing the dissimilarity between the source and target language improves phrase-based machine translation for typologically diverse language pairs. Finally, we show that besides helping to bridge the performance gaps between typologically diverse languages, linguistic typology can also serve as a source of knowledge to guide reordering models and to facilitate universal reordering models applicable to multiple target languages.
- ILLC dissertation series DS-2018-05