Transformer-specific Interpretability

Open Access
Authors
Publication date 2024
Host editors
  • M. Mesgar
  • S. Loáiciga
Book title The 18th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Tutorial Abstracts
Book subtitle EACL: March 21, 2024
ISBN (electronic)
  • 9798891760929
Event 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024
Pages (from-to) 21-26
Number of pages 6
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

Transformers have emerged as dominant players in various scientific fields, especially NLP. However, their inner workings, like those of many other neural networks, remain opaque. Despite the widespread use of model-agnostic interpretability techniques, including gradient-based and occlusion-based methods, their shortcomings for Transformer interpretation are becoming increasingly apparent, making the field of interpretability more demanding than ever. In this tutorial, we present Transformer-specific interpretability methods, a newly trending family of approaches that exploit specific features of the Transformer architecture and are deemed more promising for understanding Transformer-based models. We start by discussing the potential pitfalls and misleading results that model-agnostic approaches may produce when interpreting Transformers. Next, we discuss Transformer-specific methods, including those designed to quantify context-mixing interactions among all input pairs (the fundamental property of the Transformer architecture) and those that combine causal intervention methods with low-level Transformer analysis to identify the particular subnetworks within a model that are responsible for specific tasks. By the end of the tutorial, we hope participants will understand the advantages, as well as the current limitations, of Transformer-specific interpretability methods and how these can be applied in their own research.
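One illustrative example of the context-mixing methods the abstract alludes to is attention rollout (Abnar and Zuidema, 2020), which aggregates per-layer attention maps into overall token-to-token contribution scores. The NumPy sketch below is a minimal, hedged illustration of that idea, not necessarily the exact formulation covered in the tutorial; the toy attention tensors and shapes are assumptions for demonstration.

```python
import numpy as np

def attention_rollout(attentions):
    """Combine per-layer attention maps into token-to-token contribution
    scores via attention rollout, which accounts for residual connections
    by mixing each attention map with the identity matrix.

    attentions: list of arrays of shape (num_heads, seq_len, seq_len),
    one per layer, ordered from the first layer to the last.
    """
    seq_len = attentions[0].shape[-1]
    rollout = np.eye(seq_len)
    for attn in attentions:
        a = attn.mean(axis=0)                     # average over heads -> (seq, seq)
        a = 0.5 * a + 0.5 * np.eye(seq_len)       # add identity for the residual stream
        a = a / a.sum(axis=-1, keepdims=True)     # keep rows normalized to 1
        rollout = a @ rollout                     # propagate mixing through the layer
    return rollout

# Toy usage with random row-stochastic attention maps (2 heads, 4 tokens, 3 layers).
rng = np.random.default_rng(0)
attns = [rng.random((2, 4, 4)) for _ in range(3)]
attns = [a / a.sum(axis=-1, keepdims=True) for a in attns]
scores = attention_rollout(attns)   # scores[i, j]: contribution of token j to token i
```

Because each mixed map is row-stochastic, the rolled-out matrix is too, so each row can be read as a distribution over input tokens contributing to that position.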

Document type Conference contribution
Language English
DOI https://doi.org/10.18653/v1/2024.eacl-tutorials.4
Other links https://www.scopus.com/pages/publications/85188837107