kNN For Whisper And Its Effect On Bias And Speaker Adaptation

Open Access
Authors
Publication date 2025
Host editors
  • Luis Chiruzzo
  • Alan Ritter
  • Lu Wang
Book title Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings
Book subtitle NAACL 2025: April 29–May 4, 2025
ISBN (electronic)
  • 9798891761957
Event 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025
Pages (from-to) 6636-6642
Number of pages 7
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Speech recognition performance varies by language, domain, and speaker characteristics such as accent, but fine-tuning a model on any of these categories may lead to catastrophic forgetting. Token-level k-nearest-neighbor (kNN) search, first proposed for neural sequence decoders in natural language generation (NLG) and machine translation (MT), is a non-parametric method that instead adapts at inference time by searching an external datastore, without training the underlying model. We show that Whisper, a transformer end-to-end speech recognition model, benefits from kNN. We investigate the differences between the speech and text setups. We discuss implications for speaker adaptation, and analyze improvements by gender, accent, and age.
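To make the method concrete: kNN-LM-style decoding, as described in the abstract, stores (decoder hidden state, next token) pairs in a datastore and, at each decoding step, interpolates the model's token distribution with a distribution formed from the k nearest stored states. The sketch below is not the paper's implementation; it is a minimal NumPy illustration under assumed names (`knn_probs`, `interpolate`, `temperature`, `lam` are all illustrative).

```python
import numpy as np

def knn_probs(query, keys, values, vocab_size, k=4, temperature=10.0):
    """Next-token distribution from the k nearest datastore entries.

    query: (d,) current decoder hidden state.
    keys:  (N, d) stored hidden states; values: (N,) their next-token ids.
    """
    dists = np.linalg.norm(keys - query, axis=1)      # L2 distance to every key
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest keys
    weights = np.exp(-dists[nearest] / temperature)   # softmax over negative distances
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    for w, tok in zip(weights, values[nearest]):
        probs[tok] += w                               # neighbours vote for their tokens
    return probs

def interpolate(model_probs, knn_p, lam=0.25):
    """kNN-LM interpolation: p = lam * p_kNN + (1 - lam) * p_model."""
    return lam * knn_p + (1.0 - lam) * model_probs

# Toy demonstration with a random datastore (illustrative only).
rng = np.random.default_rng(0)
keys = rng.normal(size=(10, 4))       # 10 stored hidden states, dim 4
values = np.arange(10) % 3            # their recorded next tokens (vocab of 3)
query = keys[0]                       # a query identical to one stored state
p_knn = knn_probs(query, keys, values, vocab_size=3, k=2)
p_out = interpolate(np.full(3, 1 / 3), p_knn)  # blend with a uniform model
```

The `lam` and `temperature` hyperparameters control how strongly the datastore overrides the base model; in the speech setting the datastore keys would come from Whisper's decoder states rather than a text LM.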

Document type Conference contribution
Language English
DOI https://doi.org/10.18653/v1/2025.findings-naacl.369
Other links https://www.scopus.com/pages/publications/105028791022
Downloads
2025.findings-naacl.369v2 (Final published version)