- Maximizing Component Quality in Bilingual Word-Aligned Segmentations
- European Chapter of the Association for Computational Linguistics (EACL) 2014
- Book/source title: EACL 2014: 14th Conference of the European Chapter of the Association for Computational Linguistics
- Book/source subtitle: proceedings of the conference: April 26-30, 2014, Gothenburg, Sweden
- Pages (from-to)
- Stroudsburg, PA: Association for Computational Linguistics
- Document type: Conference contribution
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
Given a pair of source and target language sentences that are translations of each other, with known word alignments between them, we extract bilingual phrase-level segmentations of the pair. To do so, we identify two measures that assess the quality of phrase segments: one at the monolingual level, applied to both language sides, and one at the bilingual level. The monolingual measure is based on the notion of partition refinements, and the bilingual measure is based on structural properties of the graph that represents phrase segments and word alignments.
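To make the notion of partition refinement concrete, the following is a minimal sketch and not the paper's monolingual measure itself. It assumes a segmentation is a non-empty list of contiguous half-open spans `(start, end)` over word positions and that both segmentations cover the same sentence; the names `Span`, `boundaries`, and `refines` are illustrative. Under those assumptions, one segmentation refines another exactly when it keeps every cut point of the coarser one.

```python
from typing import List, Set, Tuple

# Illustrative type: a half-open span [start, end) over word positions.
Span = Tuple[int, int]


def boundaries(segmentation: List[Span]) -> Set[int]:
    """Return the cut points of a non-empty, contiguous segmentation."""
    cuts = {segmentation[0][0]}
    for _start, end in segmentation:
        cuts.add(end)
    return cuts


def refines(fine: List[Span], coarse: List[Span]) -> bool:
    """True if every segment of `fine` lies inside one segment of `coarse`.

    Assumes both segmentations are contiguous and cover the same sentence,
    so refinement reduces to a subset test on cut points.
    """
    return boundaries(coarse) <= boundaries(fine)


coarse = [(0, 2), (2, 5)]
fine = [(0, 1), (1, 2), (2, 5)]
crossing = [(0, 3), (3, 5)]

print(refines(fine, coarse))    # True: fine only splits segments of coarse
print(refines(coarse, fine))    # False: coarse merges segments of fine
print(refines(fine, crossing))  # False: the cut at 3 is missing from fine
```

Comparing cut points rather than spans keeps the check linear in sentence length, at the cost of relying on the contiguity assumption stated above.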
These two measures are incorporated in a basic adaptation of the Cross-Entropy method to extract an N-best list of bilingual phrase-level segmentations. A straightforward application of such lists in Statistical Machine Translation (SMT) yields a conservative phrase pair extraction method that reduces phrase-table sizes by 90% with insignificant loss in translation quality.
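For readers unfamiliar with the Cross-Entropy method, here is a generic sketch of how it can produce an N-best list over binary vectors, for example one bit per candidate segment boundary. It is not the adaptation described in the abstract: the function name `cross_entropy_nbest`, its parameters, and the `toy_score` objective are all assumptions made for illustration, and the paper's two quality measures would take the place of `toy_score`.

```python
import random
from typing import Callable, Dict, List, Tuple

Bits = Tuple[int, ...]


def cross_entropy_nbest(
    n_bits: int,
    score: Callable[[Bits], float],
    n_best: int = 5,
    samples: int = 200,
    elite_frac: float = 0.1,
    smoothing: float = 0.7,
    iterations: int = 30,
    seed: int = 0,
) -> List[Tuple[Bits, float]]:
    """Generic Cross-Entropy search over bit vectors, returning the N best seen."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits             # independent Bernoulli parameter per bit
    seen: Dict[Bits, float] = {}       # every distinct candidate scored so far
    n_elite = max(1, int(samples * elite_frac))

    for _ in range(iterations):
        # 1. Sample candidates from the current distribution.
        population = [
            tuple(1 if rng.random() < p else 0 for p in probs)
            for _ in range(samples)
        ]
        for x in population:
            if x not in seen:
                seen[x] = score(x)
        # 2. Keep the highest-scoring fraction as the elite set.
        elite = sorted(population, key=lambda x: seen[x], reverse=True)[:n_elite]
        # 3. Move each parameter toward the elite mean, with smoothing.
        for i in range(n_bits):
            mean = sum(x[i] for x in elite) / n_elite
            probs[i] = smoothing * mean + (1 - smoothing) * probs[i]

    ranked = sorted(seen.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_best]


# Hypothetical objective: agreement with a fixed boundary pattern.
target = (1, 0, 1, 0, 0, 1)


def toy_score(x: Bits) -> float:
    return float(sum(1 for a, b in zip(x, target) if a == b))


for bits, value in cross_entropy_nbest(len(target), toy_score):
    print(bits, value)
```

Recording every scored candidate in `seen`, rather than only the final elite set, is one simple way to obtain an N-best list from a method that is normally used to find a single optimum.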