DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Open Access
Authors
Publication date 2024
Host editors
  • K. Duh
  • H. Gomez
  • S. Bethard
Book title Findings of the Association for Computational Linguistics: NAACL 2024
Book subtitle Findings 2024: June 16-21, 2024
ISBN (electronic)
  • 9798891761193
Event 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Findings
Pages (from-to) 4764-4780
Number of pages 17
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

In recent years, several interpretability methods have been proposed to interpret the inner workings of Transformer models at different levels of precision and complexity. In this work, we propose a simple but effective technique to analyze encoder-decoder Transformers. Our method, which we name DecoderLens, allows the decoder to cross-attend representations of intermediate encoder activations instead of using the default final encoder output. The method thus maps uninterpretable intermediate vector representations to human-interpretable sequences of words or symbols, shedding new light on the information flow in this popular but understudied class of models. We apply DecoderLens to question answering, logical reasoning, speech recognition and machine translation models, finding that simpler subtasks are solved with high precision by low and intermediate encoder layers.
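The core idea described in the abstract, routing the decoder's cross-attention to an intermediate encoder layer instead of the final one, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy encoder, the fixed per-layer transform, and all function names here (`decoder_lens`, `cross_attend`, etc.) are hypothetical stand-ins chosen only to show where the layer selection happens.

```python
import math

def toy_encoder_layer(states, shift):
    """Stand-in for one Transformer encoder layer: a fixed, simple transform."""
    return [[x + shift for x in vec] for vec in states]

def encode_all_layers(inputs, num_layers=4):
    """Run the toy encoder, keeping the hidden states after every layer."""
    states, per_layer = inputs, []
    for i in range(num_layers):
        states = toy_encoder_layer(states, shift=0.1 * (i + 1))
        per_layer.append(states)
    return per_layer

def cross_attend(query, keys_values):
    """Single-head dot-product cross-attention of one query over encoder states."""
    scores = [sum(q * k for q, k in zip(query, kv)) for kv in keys_values]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(keys_values[0])
    return [sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)]

def decoder_lens(query, encoder_layers, layer_idx):
    """DecoderLens-style routing: cross-attend to encoder layer `layer_idx`
    instead of the default final layer (layer_idx = -1)."""
    return cross_attend(query, encoder_layers[layer_idx])

inputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
layers = encode_all_layers(inputs)
query = [1.0, 1.0]
default_out = decoder_lens(query, layers, -1)  # standard decoding path
lens_out = decoder_lens(query, layers, 1)      # inspect layer 2 of 4
```

In a real encoder-decoder model the same switch would mean handing the decoder the hidden states of layer `l` as its cross-attention memory, then decoding normally so that the intermediate representation is verbalized as an output sequence.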

Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2024.findings-naacl.296
Other links https://www.scopus.com/pages/publications/85197946623
Downloads
2024.findings-naacl.296 (Final published version)