UvA-MT’s Participation in the WMT24 General Translation Shared Task

Open Access
Authors
Publication date 2024
Host editors
  • B. Haddow
  • T. Kocmi
  • P. Koehn
  • C. Monz
Book title Ninth Conference on Machine Translation: Proceedings of the Conference
Book subtitle WMT 2024: November 15-16, 2024
ISBN (electronic)
  • 9798891761797
Event 9th Conference on Machine Translation
Pages (from-to) 176-184
Number of pages 9
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Fine-tuning Large Language Models (FT-LLMs) with parallel data has emerged as a promising paradigm in recent machine translation research. In this paper, we explore the effectiveness of FT-LLMs and compare them to traditional encoder-decoder Neural Machine Translation (NMT) systems in the WMT24 General MT shared task for the English-to-Chinese direction. We implement several techniques, including Quality Estimation (QE) data filtering, supervised fine-tuning, and post-editing that integrates NMT systems with LLMs. We demonstrate that fine-tuning LLaMA2 on a high-quality but relatively small bitext dataset (100K) yields COMET results comparable to much smaller encoder-decoder NMT systems trained on over 22 million bitexts. However, this approach largely underperforms on surface-level metrics like BLEU and ChrF. We further control data quality using a COMET-based quality estimation method. Our experiments show that 1) filtering out sentence pairs with low COMET scores largely improves encoder-decoder systems, but 2) no clear gains are observed for LLMs when further refining the fine-tuning set. Finally, we show that combining NMT systems with LLMs via post-editing generally yields the best performance on the WMT24 official test set.
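The COMET-based QE filtering step mentioned in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' released code: the reference-free QE checkpoint (Unbabel/wmt22-cometkiwi-da), the 0.8 score threshold, and the file names are assumptions chosen for illustration; it requires the unbabel-comet package and access to the (gated) model on the Hugging Face Hub.

```python
# Sketch of COMET-QE bitext filtering (assumed details: model name, threshold,
# file layout). Install with `pip install unbabel-comet`.
from comet import download_model, load_from_checkpoint


def filter_bitext(src_path, tgt_path, out_prefix, threshold=0.8, batch_size=64):
    """Keep only sentence pairs whose reference-free COMET-QE score is >= threshold."""
    with open(src_path, encoding="utf-8") as f:
        src_lines = [line.rstrip("\n") for line in f]
    with open(tgt_path, encoding="utf-8") as f:
        tgt_lines = [line.rstrip("\n") for line in f]
    assert len(src_lines) == len(tgt_lines), "source/target files must be parallel"

    # CometKiwi is reference-free: it scores (source, translation) pairs directly,
    # so it can be used to estimate the quality of existing bitext.
    model = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))
    data = [{"src": s, "mt": t} for s, t in zip(src_lines, tgt_lines)]
    scores = model.predict(data, batch_size=batch_size, gpus=1).scores

    kept = [(s, t) for s, t, sc in zip(src_lines, tgt_lines, scores) if sc >= threshold]
    with open(out_prefix + ".en", "w", encoding="utf-8") as fs, \
         open(out_prefix + ".zh", "w", encoding="utf-8") as ft:
        for s, t in kept:
            fs.write(s + "\n")
            ft.write(t + "\n")
    return len(kept), len(src_lines)


if __name__ == "__main__":
    kept, total = filter_bitext("train.en", "train.zh", "train.filtered")
    print(f"kept {kept}/{total} sentence pairs")
```

In this sketch, the retained high-scoring pairs would form the filtered training or fine-tuning set; the abstract reports that such filtering clearly helps the encoder-decoder systems but brings no clear gains when refining the LLM fine-tuning data.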
Document type Conference contribution
Language English
Related publication UvA-MT’s Participation in the WMT 2023 General Translation Shared Task
Published at https://doi.org/10.18653/v1/2024.wmt-1.11
Downloads
2024.wmt-1.11 (Final published version)