Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation

doi:https://doi.org/10.18653/v1/2023.emnlp-main.605


Authors	D. Wu, C. Monz
Publication date	2023
Host editors	H. Bouamor, J. Pino, K. Bali
Book title	The 2023 Conference on Empirical Methods in Natural Language Processing
Book subtitle	EMNLP 2023 : Proceedings of the Conference : December 6-10, 2023
ISBN (electronic)	9798891760608
Event	2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Pages (from-to)	9749–9764
Publisher	Stroudsburg, PA: Association for Computational Linguistics
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Using a shared vocabulary is common practice in Multilingual Neural Machine Translation (MNMT). In addition to its simple design, shared tokens play an important role in positive knowledge transfer, which manifests naturally when the shared tokens refer to similar meanings across languages. However, when word overlap is small, e.g., when languages use different writing systems, transfer is inhibited. In this paper, we propose a re-parameterized method for building embeddings to alleviate this problem. More specifically, we define word-level information transfer pathways via word equivalence classes and rely on graph networks to fuse word embeddings across languages. Our experiments demonstrate the advantages of our approach: 1) the semantics of embeddings are better aligned across languages, 2) our method achieves evident BLEU improvements on high- and low-resource MNMT, and 3) less than 1.0% additional trainable parameters are required, with a limited increase in computational costs, while the inference time is identical to baselines.
Document type	Conference contribution
Note	With supplementary video
Language	English
Published at	https://doi.org/10.18653/v1/2023.emnlp-main.605
Downloads	2023.emnlp-main.605 (Final published version)
Supplementary materials	2023.emnlp-main.605
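
Illustrative sketch (not part of the publication): the abstract describes fusing word embeddings across languages along pathways defined by word equivalence classes. The Python below is a minimal, generic illustration of that idea, assuming a single parameter-free mean-aggregation step over each equivalence class. It is not the authors' graph network. The names `build_neighbors`, `fuse_embeddings`, and `mix`, as well as the toy vocabulary and vectors, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def build_neighbors(vocab: Sequence[str],
                    equivalence_classes: Sequence[Set[str]]) -> Dict[int, Set[int]]:
    """Map each token index to the indices it shares an equivalence class with.

    Every token is its own neighbor (self-loop), so tokens outside any
    class are left unchanged by the fusion step.
    """
    index = {word: i for i, word in enumerate(vocab)}
    neighbors: Dict[int, Set[int]] = {i: {i} for i in range(len(vocab))}
    for word_class in equivalence_classes:
        ids = [index[w] for w in word_class if w in index]
        for i in ids:
            neighbors[i].update(ids)
    return neighbors


def fuse_embeddings(embeddings: Sequence[Sequence[float]],
                    neighbors: Dict[int, Set[int]],
                    mix: float = 0.5) -> List[List[float]]:
    """Blend each embedding with the mean embedding of its neighbors.

    `mix` is an assumed interpolation weight: 0 keeps the original
    embedding, 1 replaces it with the neighborhood mean.
    """
    fused = []
    for i, own in enumerate(embeddings):
        ids = sorted(neighbors[i])
        mean = [sum(embeddings[j][k] for j in ids) / len(ids)
                for k in range(len(own))]
        fused.append([(1.0 - mix) * o + mix * m for o, m in zip(own, mean)])
    return fused


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy data (hypothetical): three translations of "dog" in different
# scripts form one equivalence class; "cat" belongs to none.
vocab = ["dog", "hund", "собака", "cat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, -1.0]]
classes = [{"dog", "hund", "собака"}]

neighbors = build_neighbors(vocab, classes)
fused = fuse_embeddings(embeddings, neighbors, mix=0.5)

print("dog/hund similarity before:", cosine(embeddings[0], embeddings[1]))
print("dog/hund similarity after: ", cosine(fused[0], fused[1]))
```

In this toy setting the fused vectors of equivalent words move toward each other even though the words share no surface tokens, while the word outside every class keeps its original vector.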