CNN-based phoneme classifier from vocal tract MRI learns embedding consistent with articulatory topology

K.G. van Leeuwen; P. Bos; S. Trebeschi; M.J.A. van Alphen; L. Voskuilen; L.E. Smeele; F. van der Heijden; R.J.J.H. van Son

doi:https://doi.org/10.21437/Interspeech.2019-1173

Authors	K.G. van Leeuwen; P. Bos; S. Trebeschi; M.J.A. van Alphen; L. Voskuilen; L.E. Smeele; F. van der Heijden; R.J.J.H. van Son
Publication date	2019
Journal	Interspeech
Event	Interspeech 2019
Volume | Issue number	20
Pages (from-to)	909-913
Number of pages	5
Organisations	Faculty of Dentistry (ACTA); Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam Center for Language and Communication (ACLC)
Abstract	Recent advances in real-time magnetic resonance imaging (rtMRI) of the vocal tract provide opportunities for studying human speech. This modality, together with acquired speech, may enable the mapping of articulatory configurations to acoustic features. In this study, we take the first step by training a deep learning model to classify 27 different phonemes from midsagittal MR images of the vocal tract. An American English database was used to train a convolutional neural network for classifying vowels (13 classes), consonants (14 classes) and all phonemes (27 classes) of 17 subjects. Classification top-1 accuracy on the test set for all phonemes was 57%. Error analysis showed that voiced and unvoiced sounds were often confused. Moreover, we performed principal component analysis on the network’s embedding and observed topological similarities between the network's learned representation and the vowel diagram. Saliency maps gave insight into the anatomical regions most important for classification and showed congruence with known regions of articulatory importance. We demonstrate the feasibility of using deep learning to distinguish between phonemes from MRI. Network analysis can be used to improve understanding of normal articulation and speech and, in the future, impaired speech. This study brings us a step closer to the articulatory-to-acoustic mapping from rtMRI.
Document type	Article
Note	Crossroads of speech and language : 20th Annual Conference of the International Speech Communication Association : INTERSPEECH 2019 : Graz, Austria, 15-19 September 2019
Language	English
Published at	https://doi.org/10.21437/Interspeech.2019-1173
Downloads	CNN-based phoneme classifier from vocal tract (Final published version)
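
The abstract reports a top-1 accuracy and an error analysis of confused phoneme pairs. As an illustration only, the sketch below shows one way such figures could be computed from lists of true and predicted labels. It is not the authors' code: the function names `top1_accuracy` and `most_confused_pairs` and the toy labels are hypothetical and do not come from the study.

```python
from collections import Counter


def top1_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def most_confused_pairs(true_labels, predicted_labels, n=3):
    """Return the n most frequent (true, predicted) pairs among the errors."""
    errors = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return errors.most_common(n)


# Hypothetical toy labels for illustration; NOT data from the study.
true_labels = ["p", "b", "t", "d", "s", "z", "b", "b"]
predicted_labels = ["p", "p", "t", "t", "s", "s", "p", "b"]

accuracy = top1_accuracy(true_labels, predicted_labels)
top_confusion = most_confused_pairs(true_labels, predicted_labels, n=1)

print(f"top-1 accuracy: {accuracy:.2f}")
print("most frequent confusion (true, predicted), count:", top_confusion)
```

In this toy example every error swaps a voiced consonant for its unvoiced counterpart, which is the kind of pattern the abstract's error analysis describes.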