Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams

Open Access
Authors
Publication date 2014
Host editors
  • M. de Rijke
  • T. Kenter
  • A.P. de Vries
  • C.X. Zhai
  • F. de Jong
  • K. Radinsky
  • K. Hofmann
Book title Advances in Information Retrieval
Book subtitle 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, April 13-16, 2014: proceedings
ISBN
  • 9783319060279
ISBN (electronic)
  • 9783319060286
Series Lecture Notes in Computer Science
Event 36th European Conference on Information Retrieval (ECIR '14)
Pages (from-to) 286-298
Publisher Cham: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identifying nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base, in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show that our method significantly outperforms a lexical-matching baseline by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and the textual quality of input documents.
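The abstract mentions sampling pseudo-ground truth using entity confidence scores and the textual quality of input documents, but this record does not give the exact criteria. The following is a hypothetical Python sketch of such a sampling filter, not the authors' implementation. The `LinkedPost` structure, the per-document `quality` score, the entity-linker confidences, and both default thresholds are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LinkedPost:
    """Hypothetical social-stream post after automatic entity linking."""
    text: str
    # (surface form, entity-linker confidence) pairs; assumed structure
    entities: list[tuple[str, float]]
    # assumed textual-quality score in [0, 1]; higher means cleaner text
    quality: float


def sample_pseudo_ground_truth(
    posts: list[LinkedPost],
    min_confidence: float = 0.5,  # illustrative threshold, not from the paper
    min_quality: float = 0.5,     # illustrative threshold, not from the paper
) -> list[tuple[str, list[str]]]:
    """Return (text, entity labels) pairs to use as pseudo-ground truth.

    A post is kept only if its quality score passes `min_quality`; within a
    kept post, only entities linked with at least `min_confidence` become
    labels. Posts left with no confident entity are dropped.
    """
    sampled = []
    for post in posts:
        # Skip low-quality input documents.
        if post.quality < min_quality:
            continue
        # Keep only confidently linked entities as pseudo-labels.
        labels = [surface for surface, conf in post.entities
                  if conf >= min_confidence]
        if labels:
            sampled.append((post.text, labels))
    return sampled
```

In this sketch the two sampling criteria act independently: textual quality filters whole documents, while confidence filters individual entity labels. The resulting pairs would then serve as training data for a named entity recognizer.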
Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-319-06028-6_24