The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation

Open Access
Authors
Publication date 2025
Host editors
  • C. Christodoulopoulos
  • T. Chakraborty
  • C. Rose
  • V. Peng
Book title The 2025 Conference on Empirical Methods in Natural Language Processing: Findings of EMNLP 2025
Book subtitle EMNLP 2025: November 4-9, 2025
ISBN (electronic)
  • 9798891763357
Event 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Pages (from-to) 4199-4211
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Prior research diverges on language diversity in LLM fine-tuning: some studies report benefits, while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically resolve these discrepancies. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and, surprisingly, supervised pairs, despite the less diverse models being fine-tuned exclusively on those supervised pairs. However, the benefits plateau or decline beyond a certain diversity threshold. We show that increased language diversity yields more language-agnostic representations, and these representational changes help explain the improved performance of models fine-tuned with greater diversity.
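To illustrate the abstract's claim about language-agnostic representations, the sketch below shows one common way such a property can be quantified: comparing a model's hidden-state representations of parallel sentences across languages, where higher average similarity between translations suggests representations that depend less on the surface language. This is an illustrative Python example with made-up data, not the paper's method or code.

import numpy as np

def mean_parallel_similarity(reps_lang_a: np.ndarray, reps_lang_b: np.ndarray) -> float:
    """Average cosine similarity between aligned sentence representations.

    reps_lang_a, reps_lang_b: arrays of shape (n_sentences, hidden_dim),
    where row i in both arrays represents the same sentence in two
    languages (e.g. mean-pooled hidden states from a fine-tuned model).
    """
    # Normalize each row to unit length, then take the row-wise dot product.
    a = reps_lang_a / np.linalg.norm(reps_lang_a, axis=1, keepdims=True)
    b = reps_lang_b / np.linalg.norm(reps_lang_b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))

# Toy usage: random vectors stand in for hidden states of parallel sentences.
rng = np.random.default_rng(0)
en = rng.normal(size=(8, 16))
de = en + 0.1 * rng.normal(size=(8, 16))  # near-parallel representations
print(f"mean cosine similarity: {mean_parallel_similarity(en, de):.3f}")

Under this (assumed) measure, a model whose representations become more language-agnostic after diverse fine-tuning would show higher similarity between translation pairs than a less diverse baseline.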
Document type Conference contribution
Note With checklist
Language English
Published at https://doi.org/10.18653/v1/2025.findings-emnlp.224
Downloads
2025.findings-emnlp.224 (Final published version)