How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
| Authors | |
|---|---|
| Publication date | 2023 |
| Host editors | |
| Book title | 37th Conference on Neural Information Processing Systems (NeurIPS 2023) |
| Book subtitle | 10-16 December 2023, New Orleans, Louisiana, USA |
| ISBN (electronic) | |
| Series | Advances in Neural Information Processing Systems |
| Event | 37th Conference on Neural Information Processing Systems (NeurIPS 2023) |
| Number of pages | 28 |
| Publisher | Neural Information Processing Systems Foundation |
| Organisations | |
| Abstract | Pre-trained language models can be surprisingly adept at tasks they were not explicitly trained on, but how they implement these capabilities is poorly understood. In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small. As a case study, we examine its ability to take in sentences such as "The war lasted from the year 1732 to the year 17", and predict valid two-digit end years (years > 32). We first identify a circuit, a small subset of GPT-2 small's computational graph that computes this task's output. Then, we explain the role of each circuit component, showing that GPT-2 small's final multi-layer perceptrons boost the probability of end years greater than the start year. Finally, we find related tasks that activate our circuit. Our results suggest that GPT-2 small computes greater-than using a complex but general mechanism that activates across diverse contexts. |
| Document type | Conference contribution |
| Note | With supplementary ZIP-file |
| Language | English |
| Published at | https://papers.nips.cc/paper_files/paper/2023/hash/efbba7719cc5172d175240f24be11280-Abstract-Conference.html |
| Other links | https://doi.org/10.52202/075280 |
| Downloads | |
| Supplementary materials | |
| Permalink to this page | |
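As a quick illustration of the greater-than task described in the abstract, the sketch below (not the paper's own code) loads GPT-2 small via the Hugging Face `transformers` library and measures how much probability the model assigns to valid versus invalid two-digit end years for one prompt. The prompt string comes from the abstract; the variable names and the assumption that each of "00"–"99" is a single GPT-2 token are ours.

```python
# A minimal sketch of the abstract's greater-than task: given
# "The war lasted from the year 1732 to the year 17", compare GPT-2 small's
# probability mass on valid two-digit end years (> 32) vs. invalid ones.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The war lasted from the year 1732 to the year 17"
start_yy = 32  # two-digit start year taken from the prompt

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits
probs = torch.softmax(logits, dim=-1)

# Probability of each two-digit completion "00".."99".
# Assumption: each two-digit string (without a leading space) maps to a
# single token in GPT-2's BPE vocabulary.
year_probs = {
    yy: probs[tokenizer.encode(f"{yy:02d}")[0]].item() for yy in range(100)
}

p_valid = sum(p for yy, p in year_probs.items() if yy > start_yy)
p_invalid = sum(p for yy, p in year_probs.items() if yy <= start_yy)
print(f"P(valid end year)   = {p_valid:.3f}")
print(f"P(invalid end year) = {p_invalid:.3f}")
```

This single-prompt version only illustrates the task format; the paper evaluates the model over many such prompts, and its circuit analysis rests on interventions (e.g., patching component activations) rather than on output probabilities alone.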