Encoding of lexical tone in self-supervised models of spoken language

Open Access
Authors
  • G. Chrupała
Publication date 2024
Host editors
  • K. Duh
  • H. Gomez
  • S. Bethard
Book title The 2024 Conference of the North American Chapter of the Association for Computational Linguistics : proceedings of the conference
Book subtitle NAACL 2024 : June 16-21, 2024
ISBN (electronic)
  • 9798891761148
Event 2024 Conference of the North American Chapter of the Association for Computational Linguistics
Volume | Issue number 1
Pages (from-to) 4250-4261
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam Center for Language and Communication (ACLC)
Abstract
Interpretability research has shown that self-supervised Spoken Language Models (SLMs) encode a wide variety of features in human speech, from the acoustic, phonetic, phonological, syntactic and semantic levels, to speaker characteristics. The bulk of prior research on representations of phonology has focused on segmental features such as phonemes; the encoding of suprasegmental phonology (such as tone and stress patterns) in SLMs is not yet well understood. Tone is a suprasegmental feature that is present in more than half of the world’s languages. This paper aims to analyze the tone encoding capabilities of SLMs, using Mandarin and Vietnamese as case studies. We show that SLMs encode lexical tone to a significant degree even when they are trained on data from non-tonal languages. We further find that SLMs behave similarly to native and non-native human participants in tone and consonant perception studies, but they do not follow the same developmental trajectory.
Document type Conference contribution
Note With supplementary video
Language English
Published at https://doi.org/10.18653/v1/2024.naacl-long.239