A Dynamically Expandable, Weakly Supervised, Audio-Visual Database of Stuttered Speech

Authors
Publication date 2020
Book title MuCAI '20
Book subtitle proceedings of the 1st International Workshop on Multimodal Conversational AI : October 16, 2020, Virtual Event, USA
ISBN (electronic)
  • 9781450381567
Event 1st International Workshop on Multimodal Conversational AI
Pages (from-to) 9-13
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Stuttering affects at least 1% of the world population. It is characterized by irregular disruptions in speech production. These interruptions occur in various forms and frequencies; repetitions of words or parts of words, prolongations, and blocks in getting the words out are the most common.
Accurate detection and classification of stuttering is important for assessing severity in speech therapy. Furthermore, real-time detection could create many new possibilities for reconstruction into fluent speech. Such an interface could help people utilize voice-based assistants like Apple Siri and Google Assistant, or make (video) phone calls more fluent through delayed delivery.
In this paper we present the first expandable audio-visual database of stuttered speech. We explore an end-to-end, real-time, multi-modal model for the detection and classification of stuttered blocks in unbounded speech. We also make use of video signals, since acoustic signals may not be produced immediately. Using multiple modalities, acoustic signals together with the secondary characteristics exhibited in visual signals, permits increased detection accuracy.
Document type Conference contribution
Note Title in ACM library: A Dynamic, Self Supervised, Large Scale AudioVisual Dataset for Stuttered Speech.
Language English
Published at https://doi.org/10.1145/3423325.3423733