Multilingual k-Nearest-Neighbor Machine Translation
| Authors | |
|---|---|
| Publication date | 2023 |
| Host editors | |
| Book title | The 2023 Conference on Empirical Methods in Natural Language Processing |
| Book subtitle | EMNLP 2023 : Proceedings of the Conference : December 6-10, 2023 |
| ISBN (electronic) | |
| Event | 2023 Conference on Empirical Methods in Natural Language Processing |
| Pages (from-to) | 9200–9208 |
| Publisher | Stroudsburg, PA: Association for Computational Linguistics |
| Organisations | |
| Abstract | k-nearest-neighbor machine translation has demonstrated remarkable improvements in translation quality by creating a datastore of cached examples. However, these improvements have been limited to high-resource language pairs with large datastores, and low-resource languages remain a challenge. In this paper, we address this issue by combining representations from multiple languages into a single datastore. Our results consistently demonstrate substantial improvements not only in low-resource translation quality (up to +3.6 BLEU) but also in high-resource translation quality (up to +0.5 BLEU). Our experiments further show that, by using linguistic similarities for datastore creation, it is possible to create multilingual datastores that are a quarter of the size, yielding a 5.3x speed improvement. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2023.emnlp-main.571 |
| Other links | https://github.com/davidstap/multilingual-kNN-mt |
| Downloads | 2023.emnlp-main.571 (Final published version) |
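
The abstract describes k-nearest-neighbor machine translation (kNN-MT) only at a high level. Below is a minimal sketch of the commonly described kNN-MT decoding step: retrieve the neighbors of the current decoder state, turn their distances into a token distribution, and interpolate that with the translation model's own distribution. It assumes that a multilingual datastore is simply the concatenation of per-language-pair datastores whose keys are decoder hidden states and whose values are target tokens. All function names, the toy vectors, the language-pair labels, and the values of `k`, `temperature` and `lam` are hypothetical. This is not the authors' implementation (the linked GitHub repository holds that), and a real system would replace the linear scan with a nearest-neighbor index such as FAISS.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# One datastore entry: (key = decoder hidden state, value = target token).
Entry = Tuple[List[float], str]


def build_multilingual_datastore(per_pair: Dict[str, List[Entry]]) -> List[Entry]:
    """Merge per-language-pair datastores into one shared datastore (assumed: plain concatenation)."""
    merged: List[Entry] = []
    for pair in sorted(per_pair):  # deterministic order; order does not affect retrieval
        merged.extend(per_pair[pair])
    return merged


def squared_l2(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_distribution(query: List[float], datastore: List[Entry],
                     k: int = 8, temperature: float = 10.0) -> Dict[str, float]:
    """Turn the k nearest neighbors of `query` into a distribution over target tokens."""
    if not datastore:
        raise ValueError("datastore is empty")
    # Linear scan for clarity; real systems use a (approximate) nearest-neighbor index.
    scored = sorted((squared_l2(query, key), token) for key, token in datastore)[:k]
    d_min = scored[0][0]
    weights: Dict[str, float] = defaultdict(float)
    for dist, token in scored:
        # Shifting by d_min avoids underflow and cancels out after normalization.
        weights[token] += math.exp(-(dist - d_min) / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def interpolate(p_mt: Dict[str, float], p_knn: Dict[str, float],
                lam: float = 0.5) -> Dict[str, float]:
    """p(y) = lam * p_kNN(y) + (1 - lam) * p_MT(y)."""
    vocab = set(p_mt) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_mt.get(t, 0.0) for t in vocab}


if __name__ == "__main__":
    # Hypothetical toy datastores for two language pairs that share English as target.
    stores = {
        "xx-en": [([0.0, 1.0], "house"), ([1.0, 0.0], "tree")],
        "yy-en": [([0.1, 0.9], "house"), ([0.9, 0.2], "dog")],
    }
    datastore = build_multilingual_datastore(stores)
    p_knn = knn_distribution([0.0, 1.0], datastore, k=2, temperature=1.0)
    p_mt = {"house": 0.3, "home": 0.7}  # hypothetical NMT output distribution
    print(p_knn)                         # both nearest neighbors vote for "house"
    print(interpolate(p_mt, p_knn))
```

In this toy run the nearest neighbor for the query comes from a different language pair ("yy-en") than the exact match, which illustrates why pooling entries across languages can help when a single pair's datastore is small. The sketch does not model how the paper uses linguistic similarity to decide which languages to pool.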