Analyzing the source and target contributions to predictions in neural machine translation

E. Voita; R. Sennrich; I. Titov

doi:10.18653/v1/2021.acl-long.91

Authors	E. Voita; R. Sennrich; I. Titov
Publication date	2021
Host editors	C. Zong; F. Xia; W. Li; R. Navigli
Book title	The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
Book subtitle	ACL-IJCNLP 2021 : proceedings of the conference : August 1-6, 2021
ISBN (electronic)	9781954085527
Event	The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
Volume | Issue number	1
Pages (from-to)	1126-1140
Number of pages	15
Publisher	Stroudsburg, PA: The Association for Computational Linguistics
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argue that this relative contribution can be evaluated by adopting a variant of Layerwise Relevance Propagation (LRP). Its underlying 'conservation principle' makes relevance propagation unique: differently from other methods, it evaluates not an abstract quantity reflecting token importance, but the proportion of each token's influence. We extend LRP to the Transformer and conduct an analysis of NMT models which explicitly evaluates the source and target relative contributions to the generation process. We analyze changes in these contributions when conditioning on different types of prefixes, when varying the training objective or the amount of training data, and during the training process. We find that models trained with more data tend to rely on source information more and to have more sharp token contributions; the training process is non-monotonic with several stages of different nature.
Document type	Conference contribution
Note	With supplementary video.
Language	English
Published at	https://doi.org/10.18653/v1/2021.acl-long.91
Other links	https://github.com/lena-voita/the-story-of-heads; https://www.scopus.com/pages/publications/85115874968
Downloads	2021.acl-long.91 (Final published version)
Supplementary materials	2021.acl-long.91