Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

Open Access
Authors
  • M. Müller-Eberstein
  • R. van der Goot
  • B. Plank
  • I. Titov
Publication date 2023
Host editors
  • H. Bouamor
  • J. Pino
  • K. Bali
Book title The 2023 Conference on Empirical Methods in Natural Language Processing : Findings of the Association for Computational Linguistics: EMNLP 2023
Book subtitle December 6-10, 2023
ISBN (electronic)
  • 9798891760615
Event 2023 Conference on Empirical Methods in Natural Language Processing
Pages (from-to) 13190-13208
Number of pages 19
Publisher Stroudsburg, PA: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

Representational spaces learned via language modeling are fundamental to Natural Language Processing (NLP); however, there has been limited understanding regarding how and when during training various types of linguistic information emerge and interact. Leveraging a novel information-theoretic probing suite, which enables direct comparisons of not just task performance, but also their representational subspaces, we analyze nine tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds. We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize. Across these phases, syntactic knowledge is acquired rapidly after 0.5% of full training. Continued performance improvements primarily stem from the acquisition of open-domain knowledge, while semantics and reasoning tasks benefit from later boosts to long-range contextualization and higher specialization. Measuring cross-task similarity further reveals that linguistically related tasks share information throughout training, and do so more during the critical phase of learning than before or after. Our findings have implications for model interpretability, multi-task learning, and learning from limited data.

Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2023.findings-emnlp.879
Other links https://www.scopus.com/pages/publications/85183298663
Downloads
2023.findings-emnlp.879 (Final published version)