Encoding of lexical tone in self-supervised models of spoken language

Open Access
Authors
  • G. Chrupała
Publication date 2024
Host editors
  • K. Duh
  • H. Gomez
  • S. Bethard
Book title The 2024 Conference of the North American Chapter of the Association for Computational Linguistics : proceedings of the conference
Book subtitle NAACL 2024 : June 16-21, 2024
ISBN (electronic)
  • 9798891761148
Event 2024 Conference of the North American Chapter of the Association for Computational Linguistics
Volume | Issue number 1
Pages (from-to) 4250-4261
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam Center for Language and Communication (ACLC)
Abstract
Interpretability research has shown that self-supervised Spoken Language Models (SLMs) encode a wide variety of features in human speech, from the acoustic, phonetic, phonological, syntactic and semantic levels, to speaker characteristics. The bulk of prior research on representations of phonology has focused on segmental features such as phonemes; the encoding of suprasegmental phonology (such as tone and stress patterns) in SLMs is not yet well understood. Tone is a suprasegmental feature that is present in more than half of the world’s languages. This paper aims to analyze the tone encoding capabilities of SLMs, using Mandarin and Vietnamese as case studies. We show that SLMs encode lexical tone to a significant degree even when they are trained on data from non-tonal languages. We further find that SLMs behave similarly to native and non-native human participants in tone and consonant perception studies, but they do not follow the same developmental trajectory.
Document type Conference contribution
Note With supplementary video
Language English
Published at https://doi.org/10.18653/v1/2024.naacl-long.239