Maximizing Component Quality in Bilingual Word-Aligned Segmentations

Authors
Publication date 2014
Host editors
  • S. Wintner
  • S. Goldwater
  • S. Riezler
Book title EACL 2014: 14th Conference of the European Chapter of the Association for Computational Linguistics
Book subtitle proceedings of the conference: April 26-30, 2014, Gothenburg, Sweden
ISBN
  • 9781937284787
Event European Chapter of the Association for Computational Linguistics (EACL) 2014
Pages (from-to) 30-38
Publisher Stroudsburg, PA: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Given a pair of source and target language sentences which are translations of each other with known word alignments between them, we extract bilingual phrase-level segmentations of such a pair. This is done by identifying two appropriate measures that assess the quality of phrase segments, one on the monolingual level for both language sides, and one on the bilingual level. The monolingual measure is based on the notion of partition refinements and the bilingual measure is based on structural properties of the graph that represents phrase segments and word alignments.
These two measures are incorporated in a basic adaptation of the Cross-Entropy method for the purpose of extracting an N-best list of bilingual phrase-level segmentations. A straightforward application of such lists in Statistical Machine Translation (SMT) yields a conservative phrase pair extraction method that reduces phrase-table sizes by 90% with insignificant loss in translation quality.
Document type Conference contribution
Language English
Published at http://www.aclweb.org/anthology/E/E14/E14-1004.pdf