Significant Words Representations of Entities

Open Access
Authors
Publication date 2016
Book title SIGIR'16
Book subtitle the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval: Pisa, Italy, July 17-21, 2016
ISBN (electronic)
  • 9781450340694
Event SIGIR 2016: 39th international ACM SIGIR conference on Research and development in information retrieval
Pages (from-to) 1183
Number of pages 1
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI)
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Transforming the data into a suitable representation is the first key step of data analysis, and the performance of any data-oriented method depends heavily on it. We study how best to learn representations for textual entities that are: 1) precise, 2) robust against noisy terms, 3) transferable over time, and 4) interpretable by human inspection. Inspired by the early work of Luhn [1], we propose significant words language models of a set of documents that capture all, and only, the significant terms shared by those documents. We adjust the weights of common terms that are already well explained by the document collection, as well as the weights of incidental rare terms that are explained only by specific documents, so that only the significant terms remain in the model.
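The idea in the abstract can be illustrated with a small sketch. This is not the authors' estimation procedure, which the abstract does not specify. It is a simple heuristic written for illustration only: it drops common terms (those the background collection already explains) and incidental terms (those occurring in only a few documents of the set). The function name `significant_terms`, the thresholds `min_share` and `min_lift`, and the add-one smoothing are all assumptions made for this example.

```python
from collections import Counter


def significant_terms(docs, background, min_share=0.5, min_lift=1.5):
    """Illustrative heuristic, not the method of the paper.

    docs:       list of token lists describing one entity
    background: list of token lists standing in for the whole collection
    min_share:  minimum fraction of docs a term must occur in (hypothetical)
    min_lift:   minimum ratio of in-set to background probability (hypothetical)
    """
    set_tf = Counter(t for d in docs for t in d)
    set_total = sum(set_tf.values())
    bg_tf = Counter(t for d in background for t in d)
    bg_total = sum(bg_tf.values())
    vocab_size = len(set(set_tf) | set(bg_tf))
    # Number of documents in the set that contain each term.
    doc_freq = Counter(t for d in docs for t in set(d))

    weights = {}
    for term, tf in set_tf.items():
        share = doc_freq[term] / len(docs)
        p_set = tf / set_total
        # Add-one smoothing so unseen background terms get a nonzero probability.
        p_bg = (bg_tf[term] + 1) / (bg_total + vocab_size)
        lift = p_set / p_bg
        # Low share: incidental term. Low lift: common term explained by the collection.
        if share >= min_share and lift >= min_lift:
            weights[term] = p_set

    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()} if norm else {}
```

Because the surviving terms are renormalised into a probability distribution, the result can be read as a small language model over significant terms only, which also keeps it easy to inspect by hand.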
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/2911451.2911474
Downloads
p1183-dehghani (Final published version)