- What’s in a domain?
- Towards fine-grained adaptation for machine translation
- Award date: 8 December 2017
- Document type: PhD thesis
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
Machine translation (MT) uses software to translate text from one language into another. Modern MT systems are built from large collections of example translations between the two languages, known as parallel corpora.
For many translation tasks, or domains, there are no sizable high-quality parallel corpora, and the resulting mismatch between the training data and the translation task can cause large drops in translation quality. In recent years, this problem has been addressed by adapting an MT system to the domain of interest to improve translation quality.
Unfortunately, the concept of a domain is poorly defined. Typically, a domain is treated as a hard-labeled concept that is used directly to optimize MT systems. To shed light on domains and their impact on MT, the core question in this thesis is: "What's in a domain?"
Guided by this question, we distinguish various aspects that together make up a domain, namely topic, genre, register, dialogue acts, speakers, and speaker gender. We study to what extent MT output differs among these aspects, and how we can use them to perform fine-grained adaptation for MT. We are particularly interested in informal and conversational genres, which lack standardization and are notorious for poor MT output. In addition, we aim to develop methods that rely on manual domain information only partially or not at all.
By studying what's in a domain and showing how different aspects of language can be used to improve MT, we take a step towards fine-grained adaptation for machine translation.