A hybrid generative-discriminative approach to speaker diarization

Authors	A.K. Noulas, T. van Kasteren, B.J.A. Kröse
Publication date	2008
Host editors	A. Popescu-Belis, R. Stiefelhagen
Book title	Machine Learning for Multimodal Interaction
Book subtitle	5th International Workshop, MLMI 2008, Utrecht, The Netherlands, September 8-10, 2008 : proceedings
ISBN	9783540858522
ISBN (electronic)	9783540858539
Series	Lecture Notes in Computer Science
Event	5th Joint Workshop on Machine Learning and Multimodal Interaction (MLMI 2008), Utrecht, the Netherlands
Pages (from-to)	98-109
Publisher	Berlin: Springer
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	In this paper we present a sound probabilistic approach to speaker diarization. We use a hybrid framework in which a discriminative model estimates a distribution over the number of speakers at each point of a multimodal stream. The output of this process is used as input to a generative model that can adapt to a novel test set and perform high-accuracy speaker diarization. We deal efficiently, and in a principled probabilistic manner, with the less common and therefore harder segments, such as silence and parts with multiple speakers.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1007/978-3-540-85853-9_9 (Final published version)
UvA-DARE