Investigating Connectivity and Consistency Criteria for Phrase Pair Extraction in Statistical Machine Translation
| Authors | |
|---|---|
| Publication date | 2013 |
| Book title | MoL 13: the 13th Meeting on the Mathematics of Language: proceedings: August 9, 2013, Sofia, Bulgaria |
| Event | Meeting on the Mathematics of Language (MoL 13) |
| Pages (from-to) | 93-101 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics |
| Abstract | The consistency method has been established as the standard strategy for extracting high-quality translation rules in statistical machine translation (SMT). However, apart from empirical evidence, no attention has been paid to why this method is successful. Using concepts from graph theory, we identify the relation between consistency and the components of graphs that represent word-aligned sentence pairs. It can be shown that the phrase pairs of interest to SMT form a sigma-algebra generated by the components of such graphs. This construction is generalized by allowing segmented sentence pairs, which in turn gives rise to a phrase-based generative model. A by-product of this model is a derivation of probability mass functions for random partitions. These are realized as cases of constrained, biased sampling without replacement, and we provide an exact formula for the probability of a segmentation of a sentence. |
| Document type | Conference contribution |
| Language | English |
| Published at | http://aclweb.org/anthology/W/W13/W13-3010.pdf |
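The consistency criterion and its link to graph components, as described in the abstract, can be illustrated with a short Python sketch. This is an illustration written for this record, not code from the paper. It assumes 0-indexed word positions, an alignment given as a list of (source, target) links, and the common convention that a consistent phrase pair must contain at least one link. The helper names (`is_consistent`, `components`, `is_union_of_components`, `extract_phrase_pairs`) are hypothetical. The sketch also lets you check the graph-theoretic view: a span pair that contains at least one link is consistent exactly when its words form a union of connected components of the bipartite alignment graph (unaligned words are singleton components).

```python
def is_consistent(alignment, src_span, tgt_span):
    """Consistency check (assumed convention): no link may connect a word inside
    the box to a word outside it, and at least one link must lie inside."""
    (i1, i2), (j1, j2) = src_span, tgt_span
    has_link = False
    for i, j in alignment:
        in_src = i1 <= i <= i2
        in_tgt = j1 <= j <= j2
        if in_src != in_tgt:  # this link crosses the box boundary
            return False
        if in_src:  # both endpoints inside the box
            has_link = True
    return has_link


def components(n_src, n_tgt, alignment):
    """Connected components of the bipartite alignment graph, via union-find.
    Nodes are ('s', i) for source words and ('t', j) for target words."""
    nodes = [('s', i) for i in range(n_src)] + [('t', j) for j in range(n_tgt)]
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in alignment:
        parent[find(('s', i))] = find(('t', j))

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def is_union_of_components(node_set, comps):
    """True if every component lies entirely inside or entirely outside node_set."""
    return all(c <= node_set or not (c & node_set) for c in comps)


def extract_phrase_pairs(n_src, n_tgt, alignment):
    """Brute-force enumeration of all consistent (source span, target span) pairs."""
    pairs = []
    for i1 in range(n_src):
        for i2 in range(i1, n_src):
            for j1 in range(n_tgt):
                for j2 in range(j1, n_tgt):
                    if is_consistent(alignment, (i1, i2), (j1, j2)):
                        pairs.append(((i1, i2), (j1, j2)))
    return pairs


if __name__ == "__main__":
    # Toy alignment with a local reordering of the last two words.
    links = [(0, 0), (1, 2), (2, 1)]
    for pair in extract_phrase_pairs(3, 3, links):
        print(pair)
```

The exhaustive enumeration is chosen for clarity rather than speed. Practical extractors restrict phrase length and grow spans from alignment points, but they apply the same consistency test.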