Where to stop reading a ranked list? Threshold optimization using truncated score distributions

Authors
Publication date 2009
Host editors
  • M. Sanderson
  • C. Zhai
  • J. Zobel
  • J. Allan
  • J.A. Aslam
Book title Proceedings: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: SIGIR 2009, Boston, Massachusetts, July 19-23, 2009
ISBN
  • 9781605584836
Event 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009), Boston, MA
Pages (from-to) 524-531
Publisher New York: ACM Press
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Ranked retrieval has a particular disadvantage in comparison with traditional Boolean retrieval: there is no clear cut-off point at which to stop consulting results. This is a serious problem in some setups. We investigate and further develop methods to select the rank cut-off value which optimizes a given effectiveness measure. Assuming no other input than a system's output for a query (document scores and their distribution), the task is essentially a score-distributional threshold optimization problem. The recent trend in modeling score distributions is to use a normal-exponential mixture: normal for relevant, and exponential for non-relevant document scores. We discuss the two main theoretical problems with the current model, support incompatibility and non-convexity, and develop new models that address them. The main contributions of the paper are two truncated normal-exponential models, varying in the way the out-truncated score ranges are handled. We conduct a range of experiments using the TREC 2007 and 2008 Legal Track data, and show that the truncated models lead to significantly better results.
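To make the threshold optimization idea in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes the mixture parameters are already known, uses the plain (untruncated) normal-exponential mixture, and takes F1 as the effectiveness measure to optimize. All function and parameter names (`best_cutoff`, `lam`, `rate`, `shift`, and so on) are hypothetical and chosen for illustration. Parameter fitting and the two truncated models the paper contributes are not reproduced here.

```python
import math


def normal_sf(x, mu, sigma):
    """P(X > x) for a normal distribution (assumed model for relevant scores)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def exp_sf(x, rate, shift=0.0):
    """P(X > x) for an exponential starting at `shift` (assumed model for non-relevant scores)."""
    return 1.0 if x <= shift else math.exp(-rate * (x - shift))


def expected_f1(threshold, n_docs, lam, mu, sigma, rate, shift=0.0):
    """Expected F1 when every document scoring above `threshold` is retrieved.

    `lam` is the assumed fraction of relevant documents among `n_docs`.
    """
    total_rel = lam * n_docs
    rel_retrieved = total_rel * normal_sf(threshold, mu, sigma)
    nonrel_retrieved = (1.0 - lam) * n_docs * exp_sf(threshold, rate, shift)
    # F1 = 2*TP / ((TP + FP) + (TP + FN)) = 2*TP / (retrieved + total relevant)
    denom = rel_retrieved + nonrel_retrieved + total_rel
    return 2.0 * rel_retrieved / denom if denom > 0 else 0.0


def best_cutoff(scores, n_docs, lam, mu, sigma, rate, shift=0.0):
    """Return (rank, expected F1) for the rank cut-off with the highest expected F1."""
    ranked = sorted(scores, reverse=True)
    best_rank, best_value = 0, 0.0
    for rank, score in enumerate(ranked, start=1):
        value = expected_f1(score, n_docs, lam, mu, sigma, rate, shift)
        if value > best_value:
            best_rank, best_value = rank, value
    return best_rank, best_value


if __name__ == "__main__":
    # Toy scores and made-up parameters, for illustration only.
    scores = [9.0, 8.0, 7.0, 3.0, 1.0, 0.5, 0.2, 0.1]
    rank, value = best_cutoff(scores, n_docs=8, lam=0.375,
                              mu=8.0, sigma=1.0, rate=1.0)
    print(f"stop reading after rank {rank} (expected F1 = {value:.3f})")
```

The sketch evaluates the model only at the observed scores, so the returned rank is the position in the ranked list after which reading would stop. A different effectiveness measure could be substituted by replacing `expected_f1`.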
Document type Conference contribution
Published at http://doi.acm.org/10.1145/1571941.1572031