Class-Based Language Modeling for Translating into Morphologically Rich Languages
| Authors | |
|---|---|
| Publication date | 2014 |
| Host editors |
|
| Book title | COLING 2014: the 25th International Conference on Computational Linguistics |
| Book subtitle | proceedings of COLING 2014 : technical papers: August 23-29, 2014, Dublin, Ireland |
| ISBN |
|
| Event | COLING 2014 |
| Pages (from-to) | 1918-1927 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics |
| Organisations |
|
| Abstract |
Class-based language modeling (LM) is a long-studied and effective approach to overcome data sparsity in the context of n-gram model training. In statistical machine translation (SMT), different forms of class-based LMs have been shown to improve baseline translation quality when used in combination with standard word-level LMs but no published work has systematically compared different kinds of classes, model forms and LM combination methods in a unified SMT setting. This paper aims to fill these gaps by focusing on the challenging problem of translating into Russian, a language with rich inflectional morphology and complex agreement phenomena. We conduct our evaluation in a large-data scenario and report statistically significant BLEU improvements of up to 0.6 points when using a refined variant of the class-based model originally proposed by Brown et al. (1992).
|
| Document type | Conference contribution |
| Language | English |
| Published at | http://www.aclweb.org/anthology/C14-1181 |
| Downloads |
C14-1181
(Final published version)
|
| Permalink to this page | |