Transformer-specific Interpretability
| Authors | |
|---|---|
| Publication date | 2024 |
| Host editors | |
| Book title | The 18th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Tutorial Abstracts |
| Book subtitle | EACL: March 21, 2024 |
| ISBN (electronic) | |
| Event | 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 |
| Pages (from-to) | 21-26 |
| Number of pages | 6 |
| Publisher | Kerrville, TX: Association for Computational Linguistics |
| Organisations | |
| Abstract | Transformers have emerged as dominant players in various scientific fields, especially NLP. However, their inner workings, like those of many other neural networks, remain opaque. Despite the widespread use of model-agnostic interpretability techniques, including gradient-based and occlusion-based methods, their shortcomings are becoming increasingly apparent for Transformer interpretation, making the field of interpretability more demanding today. In this tutorial, we present Transformer-specific interpretability methods, an emerging approach that makes use of specific features of the Transformer architecture and is deemed more promising for understanding Transformer-based models. We start by discussing the potential pitfalls and misleading results that model-agnostic approaches may produce when interpreting Transformers. Next, we discuss Transformer-specific methods, including those designed to quantify context-mixing interactions among all input pairs (the fundamental property of the Transformer architecture) and those that combine causal methods with low-level Transformer analysis to identify particular subnetworks within a model that are responsible for specific tasks. By the end of the tutorial, we hope participants will understand the advantages (as well as the current limitations) of Transformer-specific interpretability methods, and how these can be applied to their own research. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2024.eacl-tutorials.4 |
| Other links | https://www.scopus.com/pages/publications/85188837107 |
| Downloads | Transformer-specific Interpretability_tutorial (final published version) |
| Permalink to this page | |