Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams
| Authors | |
|---|---|
| Publication date | 2014 |
| Book title | Advances in Information Retrieval |
| Book subtitle | 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, April 13-16, 2014: proceedings |
| Series | Lecture Notes in Computer Science |
| Event | 36th European Conference on Information Retrieval (ECIR '14) |
| Pages (from-to) | 286-298 |
| Publisher | Cham: Springer |
| Abstract | The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-3-319-06028-6_24 |
| Downloads | ecir2014-newconcepts (Submitted manuscript) |
