Better statistical estimation can benefit all phrases in phrase-based statistical machine translation

Authors
Publication date 2008
Book title SLT 2008: 2008 IEEE Workshop on Spoken Language Technology: Proceedings
ISBN
  • 9781424434718
Event 2008 IEEE Workshop on Spoken Language Technology (SLT 2008), Goa, India
Pages (from-to) 237-240
Publisher IEEE
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract The heuristic estimates of conditional phrase translation probabilities are based on frequency counts in a word-aligned parallel corpus. Earlier attempts at more principled estimation using Expectation-Maximization (EM) have underperformed this heuristic. This paper shows that a recently introduced estimator based on smoothing might provide a good alternative. When all phrase pairs are estimated (no length cut-off), this estimator slightly outperforms the heuristic estimator.
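The frequency-count heuristic referred to in the abstract is the relative-frequency estimate p(t | s) = count(s, t) / count(s), computed over phrase pairs extracted from the word-aligned corpus. The following is a minimal sketch under stated assumptions. It assumes the phrase pairs have already been extracted, and extraction itself is not shown. The function name `relative_frequency` and the toy phrase pairs are hypothetical. The sketch does not implement the smoothing-based estimator the paper evaluates.

```python
from collections import Counter, defaultdict


def relative_frequency(phrase_pairs):
    """Heuristic estimate p(t | s) = count(s, t) / count(s).

    `phrase_pairs` is assumed (hypothetically) to be a list of
    (source_phrase, target_phrase) tuples, one entry per extracted
    occurrence in the word-aligned parallel corpus.
    """
    pair_counts = Counter(phrase_pairs)                   # count(s, t)
    source_counts = Counter(s for s, _ in phrase_pairs)   # count(s)
    table = defaultdict(dict)
    for (s, t), c in pair_counts.items():
        # Normalise each pair count by the total count of its source phrase.
        table[s][t] = c / source_counts[s]
    return dict(table)


# Hypothetical toy data: extracted phrase pairs with repetitions.
pairs = [
    ("the house", "das haus"),
    ("the house", "das haus"),
    ("the house", "das gebäude"),
    ("house", "haus"),
]
table = relative_frequency(pairs)
print(table["the house"])
```

Because each conditional distribution is normalised per source phrase, the probabilities for "the house" sum to one. Here "das haus" receives 2/3 and "das gebäude" receives 1/3.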
Document type Conference contribution
Published at https://doi.org/10.1109/SLT.2008.4777884