Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance

Open Access
Authors
Publication date 2023
Host editors
  • H. Bouamor
  • J. Pino
  • K. Bali
Book title The 2023 Conference on Empirical Methods in Natural Language Processing
Book subtitle EMNLP 2023 : Proceedings of the Conference : December 6-10, 2023
ISBN (electronic)
  • 9798891760608
Event 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Pages (from-to) 13553–13568
Publisher Stroudsburg, PA: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Multilingual Neural Machine Translation (MNMT) facilitates knowledge sharing but often suffers from poor zero-shot (ZS) translation quality. While prior work has explored the causes of overall low zero-shot translation quality, our work introduces a fresh perspective: the presence of significant variations in zero-shot performance. This suggests that MNMT does not uniformly exhibit poor zero-shot capability; instead, certain translation directions yield reasonable results. Through systematic experimentation spanning 1,560 language directions across 40 languages, we identify three key factors contributing to high variations in ZS NMT performance: 1) target-side translation quality, 2) vocabulary overlap, and 3) linguistic properties. Our findings highlight that target-side translation quality is the most influential factor, with vocabulary overlap consistently impacting zero-shot capabilities. Additionally, linguistic properties, such as language family and writing system, play a role, particularly with smaller models. Furthermore, we suggest that the off-target issue is a symptom of inadequate performance, emphasizing that zero-shot translation challenges extend beyond addressing the off-target problem. To support future research, we release the data and models as a benchmark for the study of ZS NMT.
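The figure of 1,560 directions in the abstract is consistent with counting every ordered source-target pair among 40 languages (40 × 39). A minimal Python sketch of that arithmetic follows. The language codes are hypothetical placeholders, since the abstract does not list the languages, and the function name `translation_directions` is an assumption made for illustration only.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Hypothetical placeholder codes: the abstract does not name the 40 languages.
languages = [f"lang{i:02d}" for i in range(40)]

directions = translation_directions(languages)

# Each of the 40 languages can be paired with 39 others as a target.
print(len(directions))
```

Both orders of a pair are counted separately, because translating from A to B is a different direction from translating from B to A.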
Document type Conference contribution
Note With supplementary video
Language English
Related dataset ZS-NMT-Variations, EC40 Multilingual Machine Translation Dataset/Benchmark
Published at https://doi.org/10.18653/v1/2023.emnlp-main.836
Downloads
2023.emnlp-main.836 (Final published version)
Supplementary materials