The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation

Open Access
Authors
Publication date 2025
Host editors
  • C. Christodoulopoulos
  • T. Chakraborty
  • C. Rose
  • V. Peng
Book title The 2025 Conference on Empirical Methods in Natural Language Processing: Findings of EMNLP 2025
Book subtitle EMNLP 2025: November 4-9, 2025
ISBN (electronic)
  • 9798891763357
Event 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Pages (from-to) 4199-4211
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Prior research diverges on language diversity in LLM fine-tuning: some studies report benefits, while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically resolve these discrepancies. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and, surprisingly, supervised pairs, despite the less diverse models being fine-tuned exclusively on those supervised pairs. However, the benefits plateau or decline beyond a certain diversity threshold. We show that increased language diversity yields more language-agnostic representations, and these representational changes help explain the improved performance of models fine-tuned with greater diversity.
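To illustrate the abstract's claim about language-agnostic representations, the sketch below shows one common way such a property can be quantified: comparing a model's hidden-state representations of parallel sentences across languages, where higher average similarity between translations suggests representations that depend less on the surface language. This is an illustrative Python example with made-up data, not the paper's method or code.

import numpy as np

def mean_parallel_similarity(reps_lang_a: np.ndarray, reps_lang_b: np.ndarray) -> float:
    """Average cosine similarity between aligned sentence representations.

    reps_lang_a, reps_lang_b: arrays of shape (n_sentences, hidden_dim),
    where row i in both arrays represents the same sentence in two
    languages (e.g. mean-pooled hidden states from a fine-tuned model).
    """
    # Normalize each row to unit length, then take the row-wise dot product.
    a = reps_lang_a / np.linalg.norm(reps_lang_a, axis=1, keepdims=True)
    b = reps_lang_b / np.linalg.norm(reps_lang_b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))

# Toy usage: random vectors stand in for hidden states of parallel sentences.
rng = np.random.default_rng(0)
en = rng.normal(size=(8, 16))
de = en + 0.1 * rng.normal(size=(8, 16))  # near-parallel representations
print(f"mean cosine similarity: {mean_parallel_similarity(en, de):.3f}")

Under this (assumed) measure, a model whose representations become more language-agnostic after diverse fine-tuning would show higher similarity between translation pairs than a less diverse baseline.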
Document type Conference contribution
Note With checklist
Language English
Published at https://doi.org/10.18653/v1/2025.findings-emnlp.224
Downloads
2025.findings-emnlp.224 (Final published version)