An Empirical Analysis of Machine Translation for Expanding Multilingual Benchmarks
| Authors | |
|---|---|
| Publication date | 2025 |
| Host editors | |
| Book title | Tenth Conference on Machine Translation : Proceedings of the Conference |
| Book subtitle | WMT 2025 : November 8-9, 2025 |
| ISBN (electronic) | |
| Event | 10th Conference on Machine Translation, WMT 2025 |
| Pages (from-to) | 1-30 |
| Publisher | Kerrville, TX: Association for Computational Linguistics |
| Organisations | |
| Abstract | The rapid advancement of large language models (LLMs) has introduced new challenges in their evaluation, particularly for multilingual settings. The shortage of evaluation data is especially pronounced in low-resource languages, where professional annotators are scarce, hindering fair progress across languages. In this work, we systematically investigate the viability of using machine translation (MT) as a proxy for evaluation in scenarios where human-annotated test sets are unavailable. Leveraging a state-of-the-art translation model, we translate datasets from four tasks into 198 languages and employ these translations to assess the quality and robustness of MT-based multilingual evaluation under different setups. We analyze task-specific error patterns, identifying when MT-based evaluation is reliable and when it produces misleading results. Our translated benchmark reveals that current language selections in multilingual datasets tend to overestimate LLM performance on low-resource languages. We conclude that although machine translation is not yet a fully reliable method for evaluating multilingual models, overlooking its potential means missing a valuable opportunity to track progress in non-English languages. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2025.wmt-1.1 |
| Downloads | 2025.wmt-1.1 (Final published version) |
| Permalink to this page | |
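
The abstract above describes translating existing test sets into 198 languages with a state-of-the-art MT model before evaluating LLMs on the translated benchmarks. The sketch below is a rough illustration of that translation step only, assuming the openly available `facebook/nllb-200-distilled-600M` checkpoint via Hugging Face `transformers`; the model choice, function names, and example items are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of translating benchmark items with an off-the-shelf
# many-to-many MT model (assumption: NLLB-200 distilled checkpoint; the
# record above does not name the paper's translation model).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/nllb-200-distilled-600M"  # covers ~200 FLORES-200 language codes


def translate_examples(texts, src_lang="eng_Latn", tgt_lang="swh_Latn", max_length=512):
    """Translate a list of benchmark items from src_lang into tgt_lang."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Force the decoder to start generating in the target language.
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=max_length,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    # Hypothetical benchmark item; a real expansion would iterate over each
    # task's test set and over every target language of interest.
    items = ["Which planet is known as the Red Planet?"]
    print(translate_examples(items, tgt_lang="fra_Latn"))
```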