A hybrid generative-discriminative approach to speaker diarization

Authors
Publication date 2008
Host editors
  • A. Popescu-Belis
  • R. Stiefelhagen
Book title Machine Learning for Multimodal Interaction
Book subtitle 5th International Workshop, MLMI 2008, Utrecht, The Netherlands, September 8-10, 2008 : proceedings
ISBN
  • 9783540858522
ISBN (electronic)
  • 9783540858539
Series Lecture Notes in Computer Science
Event 5th Joint Workshop on Machine Learning and Multimodal Interaction (MLMI 2008), Utrecht, the Netherlands
Pages (from-to) 98-109
Publisher Berlin: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract In this paper we present a sound probabilistic approach to speaker diarization. We use a hybrid framework in which a discriminative model estimates a distribution over the number of speakers at each point of a multimodal stream. The output of this process serves as input to a generative model that can adapt to a novel test set and perform high-accuracy speaker diarization. We deal efficiently with the less common, and therefore harder, segments, such as silence and multiple-speaker parts, in a principled probabilistic manner.
Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-540-85853-9_9