- Investigating Connectivity and Consistency Criteria for Phrase Pair Extraction in Statistical Machine Translation
- Meeting on the Mathematics of Language (MoL 13)
- Book/source title: MoL 13: the 13th Meeting on the Mathematics of Language: proceedings: August 9, 2013, Sofia, Bulgaria
- Stroudsburg, PA: Association for Computational Linguistics
- Document type: Conference contribution
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
The consistency method has been established as the standard strategy for extracting high-quality translation rules in statistical machine translation (SMT). However, apart from empirical evidence, no attention has been paid to why this method is successful. Using concepts from graph theory, we identify the relation between consistency and the components of graphs that represent word-aligned sentence pairs. It can be shown that the phrase pairs of interest to SMT form a sigma-algebra generated by the components of such graphs. This construction is generalized by allowing segmented sentence pairs, which in turn gives rise to a phrase-based generative model. A by-product of this model is a derivation of probability mass functions for random partitions. These are realized as cases of constrained, biased sampling without replacement, and we provide an exact formula for the probability of a segmentation of a sentence.
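To make the link between consistency and graph components concrete, here is a minimal sketch, assuming the commonly used form of the consistency criterion: a pair of source and target spans is consistent if it contains at least one alignment link and no link connects a word inside the pair to a word outside it. The function names (`consistent_phrase_pairs`, `alignment_components`, `is_union_of_components`) are hypothetical and chosen for illustration only. The code enumerates spans by brute force and checks that every consistent pair covers a union of connected components of the bipartite alignment graph. It is not the authors' construction, and it does not model the sigma-algebra, the segmentation generalization, or the generative model described above.

```python
def consistent_phrase_pairs(src_len, tgt_len, alignment):
    """Brute-force enumeration of phrase pairs consistent with a word alignment.

    alignment: set of (i, j) links, i a source position, j a target position (0-based).
    Returns span pairs ((i1, i2), (j1, j2)) with inclusive bounds.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, src_len):
            for j1 in range(tgt_len):
                for j2 in range(j1, tgt_len):
                    has_link, ok = False, True
                    for i, j in alignment:
                        in_src, in_tgt = i1 <= i <= i2, j1 <= j <= j2
                        if in_src and in_tgt:
                            has_link = True
                        elif in_src or in_tgt:  # link crosses the pair boundary
                            ok = False
                            break
                    if ok and has_link:
                        pairs.append(((i1, i2), (j1, j2)))
    return pairs


def alignment_components(src_len, tgt_len, alignment):
    """Connected components of the bipartite graph whose nodes are word positions
    ("s", i) and ("t", j) and whose edges are alignment links (union-find).
    Unaligned words end up as singleton components."""
    nodes = [("s", i) for i in range(src_len)] + [("t", j) for j in range(tgt_len)]
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, j in alignment:
        ra, rb = find(("s", i)), find(("t", j))
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return [frozenset(g) for g in groups.values()]


def is_union_of_components(pair, comps):
    """True if the words covered by the span pair form a union of whole components."""
    (i1, i2), (j1, j2) = pair
    covered = {("s", i) for i in range(i1, i2 + 1)} | {("t", j) for j in range(j1, j2 + 1)}
    return all(c <= covered or not (c & covered) for c in comps)


if __name__ == "__main__":
    links = {(0, 0), (1, 0), (2, 1)}  # toy alignment: 3 source words, 2 target words
    pairs = consistent_phrase_pairs(3, 2, links)
    comps = alignment_components(3, 2, links)
    print(pairs)
    print(all(is_union_of_components(p, comps) for p in pairs))
```

For the toy alignment, the consistent pairs are ((0, 1), (0, 0)), ((0, 2), (0, 1)) and ((2, 2), (1, 1)), and each covers whole components ({s0, s1, t0} and {s2, t1}). The converse does not hold in general, since a union of components must also be contiguous on both sides to form a phrase pair. The quadruple loop is chosen for readability rather than speed.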