- Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams
- Lecture Notes in Computer Science
- Pages (from-to)
- Document type
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
The manual curation of knowledge bases is a bottleneck in fast paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.
- go to publisher's site
- Proceedings title: Advances in information retrieval: 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands,
April 13-16, 2014: proceedings
Place of publication: Heidelberg
Editors: M. de Rijke, T. Kenter, A.P. de Vries, C.X. Zhai, F. de Jong, K. Radinsky, K. Hofmann
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library, or send a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.