Authors: D.S. Salomons, L.C.W. Pols
Title: Alternatives in training acoustic models for the automatic recognition of spoken city names
Book/source title: IFA Proceedings 24
Faculty: Faculteit der Geesteswetenschappen
Institute/dept.: FGw: Amsterdam Center for Language and Communication (ACLC)
Keywords: Speech; Automatic recognition
Abstract: Both the training of the acoustic models for automatic speech recognition (ASR) and the similarity between the training corpus and the recognition task have a major influence on the performance of a speech recogniser. The more similar the two are, the better the recogniser performs. When the recognition task consists of city names, this implies that the training corpus must consist of city names only. This causes problems, especially in the Netherlands, because too few speech data are available for the smaller cities and villages to satisfy the need for rare phonemes. The usual alternative in such a case is to train the acoustic models on a speech corpus of phonetically rich sentences. The disadvantage is that these sentences are spoken in a 'read aloud' speech style, while the intended recognition task consists of spontaneous speech. This causes a great decrease in performance. Adding (sur)names, street names and application words adds only a small number of rare phonemes to the training data. Having fewer speech data generally decreases recognition performance. On the other hand, greater similarity with the recognition task increases it. This research investigated whether the latter factor would outweigh the former in this specific task. This appeared to be the case. The research was carried out during an internship at KPN Research in Leidschendam, the Netherlands. The internship was part of the master's graduation project of the first author of this paper, and the project report also served as his master's thesis (Salomons, 2000).
Document type: Chapter
