The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation
| Authors | |
|---|---|
| Publication date | 2025 |
| Host editors | |
| Book title | The 2025 Conference on Empirical Methods in Natural Language Processing: Findings of EMNLP 2025 |
| Book subtitle | EMNLP 2025: November 4-9, 2025 |
| ISBN (electronic) | |
| Event | 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 |
| Pages (from-to) | 4199-4211 |
| Publisher | Kerrville, TX: Association for Computational Linguistics |
| Organisations | |
| Abstract | Prior research diverges on language diversity in LLM fine-tuning: Some studies report benefits while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically resolve these disparities. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and—surprisingly—supervised pairs, despite less diverse models being fine-tuned exclusively on these supervised pairs. However, benefits plateau or decrease beyond a certain diversity threshold. We show that increased language diversity creates more language-agnostic representations. These representational adaptations help explain the improved performance in models fine-tuned with greater diversity. |
| Document type | Conference contribution |
| Note | With checklist |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2025.findings-emnlp.224 |
| Downloads | 2025.findings-emnlp.224 (Final published version) |
| Supplementary materials | |