Interpretability of Language Models via Task Spaces

L. Weber; J. Jumelet; E. Bruni; D. Hupkes

doi:https://doi.org/10.18653/v1/2024.acl-long.248

Authors	L. Weber, J. Jumelet, E. Bruni, D. Hupkes
Publication date	2024
Host editors	L.-W. Ku, A. Martins, V. Srikumar
Book title	The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) : proceedings of the conference
Book subtitle	ACL 2024 : August 11-16, 2024
ISBN (electronic)	9798891760943
Event	62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Volume | Issue number	1
Pages (from-to)	4522-4538
Publisher	Kerrville, TX: Association for Computational Linguistics
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	The usual way to interpret language models (LMs) is to test their performance on different benchmarks and subsequently infer their internal processes. In this paper, we present an alternative approach, concentrating on the _quality_ of LM processing, with a focus on their language abilities. To this end, we construct ‘linguistic task spaces’ – representations of an LM’s language conceptualisation – that shed light on the connections LMs draw between language phenomena. Task spaces are based on the interactions of the learning signals from different linguistic phenomena, which we assess via a method we call ‘similarity probing’. To disentangle the learning signals of linguistic phenomena, we further introduce a method called ‘fine-tuning via gradient differentials’ (FTGD). We apply our methods to language models of three different scales and find that larger models generalise better to overarching general concepts for linguistic tasks, making better use of their shared structure. Further, the distributedness of linguistic processing increases with pre-training through increased parameter sharing between related linguistic tasks. The overall generalisation patterns are mostly stable throughout training and not marked by incisive stages, potentially explaining the lack of successful curriculum strategies for LMs.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.18653/v1/2024.acl-long.248 (Final published version)
Downloads	2024.acl-long.248 (Final published version)