Luhn revisited: Significant Words Language Models

Open Access
Authors
Publication date 2016
Book title CIKM'16
Book subtitle Proceedings of the 2016 ACM Conference on Information and Knowledge Management: October 24-28, 2016, Indianapolis, IN, USA
ISBN (electronic)
  • 9781450340731
Event 25th ACM International Conference on Information and Knowledge Management
Pages (from-to) 1301-1310
Number of pages 10
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
  • Faculty of Science (FNWI)
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
  • Faculty of Humanities (FGw)
Abstract
Users tend to articulate their complex information needs in only a few keywords, making underspecified statements of request the main bottleneck for retrieval effectiveness. Taking advantage of feedback information is one of the best ways to enrich the query representation, but can also lead to loss of query focus and harm performance, in particular when the initial query retrieves only little relevant information, or when overfitting to accidental features of the particular observed feedback documents. Inspired by the early work of Luhn [23], we propose significant words language models of feedback documents that capture all, and only, the significant shared terms from feedback documents. We adjust the weights of common terms that are already well explained by the document collection, as well as the weights of rare terms that are only explained by specific feedback documents, which eventually results in having only the significant terms left in the feedback model.

Our main contributions are the following. First, we present significant words language models as effective models capturing the essential terms and their probabilities. Second, we apply the resulting models to the relevance feedback task, observing improved performance over state-of-the-art methods. Third, we find that the estimation method is remarkably robust, making the models insensitive to noisy non-relevant terms in feedback documents. Our general observation is that the significant words language models more accurately capture relevance by excluding general terms and feedback-document-specific terms.
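The estimation idea sketched in the abstract can be illustrated with a small mixture-model example: each term occurrence in the feedback documents is explained either by a shared "significant" model, by the collection model (absorbing common terms), or by a document-specific model (absorbing rare, document-bound terms), and the significant model is re-estimated iteratively. This is a minimal, hedged sketch, not the paper's exact estimator; the function name, the fixed mixture weights in `mix`, and the use of per-document maximum-likelihood models are illustrative assumptions.

```python
from collections import Counter

def significant_words_lm(feedback_docs, collection_lm, iters=20,
                         mix=(0.4, 0.4, 0.2)):
    """Illustrative EM sketch: estimate a 'significant words' model from
    feedback documents (lists of terms) against a collection language
    model (term -> probability). Mixture weights `mix` = (significant,
    collection, document-specific) are assumptions, not tuned values."""
    vocab = {t for d in feedback_docs for t in d}
    # Initialize from the aggregate feedback term distribution.
    total = Counter(t for d in feedback_docs for t in d)
    n = sum(total.values())
    sig = {t: total[t] / n for t in vocab}
    w_sig, w_col, w_doc = mix
    for _ in range(iters):
        expected = {t: 0.0 for t in vocab}
        for doc in feedback_docs:
            counts = Counter(doc)
            dn = sum(counts.values())
            doc_lm = {t: c / dn for t, c in counts.items()}
            for t, c in counts.items():
                # E-step: responsibility of the significant model
                # for each occurrence of term t in this document.
                p_sig = w_sig * sig[t]
                p_col = w_col * collection_lm.get(t, 1e-9)
                p_doc = w_doc * doc_lm[t]
                expected[t] += c * p_sig / (p_sig + p_col + p_doc)
        # M-step: renormalize expected counts into a distribution.
        z = sum(expected.values())
        sig = {t: e / z for t, e in expected.items()}
    return sig
```

On a toy example where "luhn" occurs in every feedback document, "the" is frequent in the collection, and a rare term occurs in a single document, the estimate concentrates mass on the shared term: common terms are absorbed by the collection component and document-bound terms by the document-specific component, matching the behavior the abstract describes.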
Document type Conference contribution
Language English
Related publication Inoculating Relevance Feedback Against Poison Pills
Published at https://doi.org/10.1145/2983323.2983814
Downloads
p1301-dehghani (Final published version)