Transformer-specific Interpretability
| Authors | |
|---|---|
| Publication date | 2024 |
| Host editors | |
| Book title | The 18th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Tutorial Abstracts |
| Book subtitle | EACL: March 21, 2024 |
| ISBN (electronic) | |
| Event | 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 |
| Pages (from-to) | 21-26 |
| Number of pages | 6 |
| Publisher | Kerrville, TX: Association for Computational Linguistics |
| Organisations | |
| Abstract | Transformers have emerged as dominant players in various scientific fields, especially NLP. However, their inner workings, like those of many other neural networks, remain opaque. Despite the widespread use of model-agnostic interpretability techniques, including gradient-based and occlusion-based methods, their shortcomings are becoming increasingly apparent for Transformer interpretation, making the field of interpretability more demanding today. In this tutorial, we present Transformer-specific interpretability methods, an emerging approach that makes use of specific features of the Transformer architecture and is deemed more promising for understanding Transformer-based models. We start by discussing the potential pitfalls and misleading results that model-agnostic approaches may produce when interpreting Transformers. Next, we discuss Transformer-specific methods, including those designed to quantify context-mixing interactions among all input pairs (the fundamental property of the Transformer architecture) and those that combine causal methods with low-level Transformer analysis to identify particular subnetworks within a model that are responsible for specific tasks. By the end of the tutorial, we hope participants will understand the advantages (as well as the current limitations) of Transformer-specific interpretability methods, and how these can be applied to their own research. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2024.eacl-tutorials.4 |
| Other links | https://www.scopus.com/pages/publications/85188837107 |
| Downloads | Transformer-specific Interpretability_tutorial (final published version) |
| Permalink to this page | |