- Untrained Forced Alignment of Transcriptions and Audio for Language Documentation Corpora Using WebMAUS
- LREC 2014, Ninth International Conference on Language Resources and Evaluation
- Book/source title
- Proceedings of the Ninth International Conference on Language Resources and Evaluation
- Pages (from-to)
- European Language Resources Association (ELRA)
- Document type
- Conference contribution
- Faculty of Humanities (FGw)
- Amsterdam Center for Language and Communication (ACLC)
Language documentation projects supported by recent funding intiatives have created a large number of multimedia corpora of typologically diverse languages. Most of these corpora provide a manual alignment of transcription and audio data at the level of larger units, such as sentences or intonation units. Their usefulness both for corpus-linguistic and psycholinguistic research and for the
development of tools and teaching materials could, however, be increased by achieving a more fine-grained alignment of transcription and audio at the word or even phoneme level. Since most language documentation corpora contain data on small languages, there usually do not exist any speech recognizers or acoustic models specifically trained on these languages. We therefore investigate the
feasibility of untrained forced alignment for such corpora. We report on an evaluation of the tool (Web)MAUS (Kisler et al., 2012) on several language documentation corpora and discuss practical issues in the application of forced alignment. Our evaluation shows that (Web)MAUS with its existing acoustic models combined with simple grapheme-to-phoneme conversion can be successfully used for word-level forced alignment of a diverse set of languages without additional training, especially if a manual prealignment of larger annotation units is already avaible.
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library, or send a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.