Investigating Connectivity and Consistency Criteria for Phrase Pair Extraction in Statistical Machine Translation

Authors
Publication date 2013
Host editors
  • A. Kornai
  • M. Kuhlmann
Book title MoL 13: the 13th Meeting on the Mathematics of Language: proceedings: August 9, 2013, Sofia, Bulgaria
ISBN
  • 9781937284657
Event Meeting on the Mathematics of Language (MoL 13)
Pages (from-to) 93-101
Publisher Stroudsburg, PA: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The consistency method has been established as the standard strategy for extracting high-quality translation rules in statistical machine translation (SMT). However, beyond empirical evidence, no attention has been paid to why this method is successful. Using concepts from graph theory, we identify the relation between consistency and the components of graphs that represent word-aligned sentence pairs. It can be shown that the phrase pairs of interest to SMT form a sigma-algebra generated by the components of such graphs. This construction is generalized by allowing segmented sentence pairs, which in turn gives rise to a phrase-based generative model. A by-product of this model is the derivation of probability mass functions for random partitions. These are realized as cases of constrained, biased sampling without replacement, and we provide an exact formula for the probability of a segmentation of a sentence.
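The abstract does not spell out the consistency criterion itself. As a minimal sketch, assuming the standard definition used in phrase-based SMT (a phrase pair is consistent with a word alignment if no alignment link connects a word inside the pair to a word outside it, and at least one link lies inside), the Python below extracts consistent phrase pairs by brute force. It also checks the graph-theoretic reading the abstract alludes to: the words covered by a consistent phrase pair form a union of connected components of the bipartite alignment graph, with unaligned words as singleton components. The function names (`is_consistent`, `components`, `is_union_of_components`, `extract_phrase_pairs`) are hypothetical illustrations, not the authors' code or formalization.

```python
# Hypothetical sketch of the standard consistency criterion and its
# relation to connected components of the alignment graph.
# Not the paper's formalization; names are illustrative only.

def is_consistent(alignment, s_span, t_span):
    """Standard consistency check for one phrase pair.

    alignment: set of (i, j) links, i = source index, j = target index.
    s_span, t_span: inclusive (start, end) index pairs.
    """
    (i1, i2), (j1, j2) = s_span, t_span
    has_link = False
    for i, j in alignment:
        in_s = i1 <= i <= i2
        in_t = j1 <= j <= j2
        if in_s != in_t:  # a link crosses the phrase-pair boundary
            return False
        if in_s:  # link lies fully inside the phrase pair
            has_link = True
    return has_link  # require at least one link inside


def components(alignment, n_src, n_tgt):
    """Connected components of the bipartite alignment graph (union-find).

    Vertices are ('s', i) and ('t', j); unaligned words become singletons.
    """
    parent = {('s', i): ('s', i) for i in range(n_src)}
    parent.update({('t', j): ('t', j) for j in range(n_tgt)})

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in alignment:
        parent[find(('s', i))] = find(('t', j))

    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def is_union_of_components(comps, s_span, t_span):
    """True if every component lies fully inside or fully outside the pair."""
    (i1, i2), (j1, j2) = s_span, t_span
    vertices = {('s', i) for i in range(i1, i2 + 1)}
    vertices |= {('t', j) for j in range(j1, j2 + 1)}
    return all(c <= vertices or not (c & vertices) for c in comps)


def extract_phrase_pairs(alignment, n_src, n_tgt):
    """Brute-force enumeration of all consistent (source span, target span) pairs."""
    pairs = []
    for i1 in range(n_src):
        for i2 in range(i1, n_src):
            for j1 in range(n_tgt):
                for j2 in range(j1, n_tgt):
                    if is_consistent(alignment, (i1, i2), (j1, j2)):
                        pairs.append(((i1, i2), (j1, j2)))
    return pairs


if __name__ == "__main__":
    # Toy alignment: source word 0 -> target 0, 1 -> 2, 2 -> 1.
    A = {(0, 0), (1, 2), (2, 1)}
    for pair in extract_phrase_pairs(A, 3, 3):
        print(pair)
```

The brute-force loop is quadratic in each sentence length and is chosen only for clarity; practical extractors restrict phrase length and grow spans incrementally. In the toy alignment, the source span (0, 1) yields no pair because any contiguous target span covering targets 0 and 2 must also cover target 1, whose link leaves the pair.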
Document type Conference contribution
Language English
Published at http://aclweb.org/anthology/W/W13/W13-3010.pdf