- Significant Words Representations of Entities
- Book/source title: SIGIR 2016: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
- Book/source subtitle: the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval: Pisa, Italy, July 17-21, 2016
- Publisher: New York, NY: Association for Computing Machinery
- Document type: Conference contribution
- Interfacultary Research Institutes
- Faculty of Science (FNWI)
- Institute for Logic, Language and Computation (ILLC)
Transforming the data into a suitable representation is the first key step of data analysis, and the performance of any data-oriented method depends heavily on it. We study how best to learn representations for textual entities that are: 1) precise, 2) robust against noisy terms, 3) transferable over time, and 4) interpretable by human inspection. Inspired by the early work of Luhn, we propose significant words language models of a set of documents that capture all, and only, the significant terms shared by those documents. We adjust the weights of common terms that are already well explained by the document collection, as well as the weights of incidental rare terms that are explained only by specific documents, so that eventually only the significant terms are left in the model.
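The abstract does not spell out how the weights are adjusted. As a rough illustration only, the Python sketch below treats every term in a set of documents as drawn from a mixture of three unigram models. These are a shared significant-words model, a general model (the collection), and a specific model. It keeps the general and specific models and the mixing weights fixed, and re-estimates the significant-words model with EM, in the spirit of parsimonious language models. The function name `significant_words_model`, the weights in `lambdas`, and the rule that the specific model covers only terms occurring in exactly one document are assumptions for illustration. They are not the estimators used in the paper.

```python
from collections import Counter


def significant_words_model(docs, collection_tf, lambdas=(0.4, 0.3, 0.3), n_iter=50):
    """Toy significant-words unigram model for a set of documents (a sketch).

    docs: list of token lists, e.g. the documents associated with one entity.
    collection_tf: mapping of term -> frequency over the whole collection.
    lambdas: fixed weights (significant, general, specific); assumed, not learned.
    """
    lam_sw, lam_g, lam_s = lambdas
    set_tf = Counter(t for doc in docs for t in doc)
    vocab = list(set_tf)
    if not vocab:
        return {}

    # General model: collection maximum-likelihood estimate, which explains common terms.
    coll_total = sum(collection_tf.values()) or 1
    general = {t: collection_tf.get(t, 0) / coll_total for t in vocab}

    # Specific model (hypothetical simplification): mass only on terms that occur
    # in exactly one document of the set, weighted by their in-document frequency.
    df = Counter(t for doc in docs for t in set(doc))
    specific = Counter()
    for doc in docs:
        for t, c in Counter(doc).items():
            if df[t] == 1:
                specific[t] += c / len(doc)
    spec_total = sum(specific.values())
    specific = {t: v / spec_total for t, v in specific.items()} if spec_total else {}

    # Significant-words model: start uniform, then re-estimate it with EM while the
    # general and specific models stay fixed.
    sw = {t: 1.0 / len(vocab) for t in vocab}
    for _ in range(n_iter):
        # E-step: expected count of each term attributed to the significant-words model.
        expected = {}
        for t in vocab:
            p_sw = lam_sw * sw[t]
            denom = p_sw + lam_g * general[t] + lam_s * specific.get(t, 0.0)
            expected[t] = set_tf[t] * p_sw / denom if denom > 0 else 0.0
        # M-step: renormalise the expected counts into a probability distribution.
        total = sum(expected.values())
        sw = {t: e / total for t, e in expected.items()}
    return sw


# Hypothetical toy input: three documents about one entity plus collection counts.
docs = [
    "the entity retrieval model the".split(),
    "the entity retrieval pisa the".split(),
    "the entity retrieval luhn the".split(),
]
collection_tf = Counter({"the": 10000, "entity": 50, "retrieval": 60,
                         "model": 500, "pisa": 5, "luhn": 3})

for term, p in sorted(significant_words_model(docs, collection_tf).items(),
                      key=lambda kv: -kv[1]):
    print(f"{term:10s} {p:.3f}")
```

On this toy input, the frequent term "the" loses most of its weight to the collection model. Single-document terms such as "pisa" and "luhn" are absorbed by the specific model. The shared terms "entity" and "retrieval" end up dominating the estimated model.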