Where to stop reading a ranked list? Threshold optimization using truncated score distributions

Authors	A. Arampatzis, J. Kamps, S. Robertson
Publication date	2009
Host editors	M. Sanderson, C. Zhai, J. Zobel, J. Allan, J.A. Aslam
Book title	Proceedings: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: SIGIR 2009, Boston, Massachusetts, July 19-23, 2009
ISBN	9781605584836
Event	32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009), Boston, MA
Pages (from-to)	524-531
Publisher	New York: ACM Press
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Ranked retrieval has a particular disadvantage in comparison with traditional Boolean retrieval: there is no clear cut-off point where to stop consulting results. This is a serious problem in some setups. We investigate and further develop methods to select the rank cut-off value which optimizes a given effectiveness measure. Assuming no other input than a system's output for a query (document scores and their distribution), the task is essentially a score-distributional threshold optimization problem. The recent trend in modeling score distributions is to use a normal-exponential mixture: normal for relevant, and exponential for non-relevant document scores. We discuss the two main theoretical problems with the current model, support incompatibility and non-convexity, and develop new models that address them. The main contributions of the paper are two truncated normal-exponential models, varying in the way the out-truncated score ranges are handled. We conduct a range of experiments using the TREC 2007 and 2008 Legal Track data, and show that the truncated models lead to significantly better results.
Document type	Conference contribution
Published at	http://doi.acm.org/10.1145/1571941.1572031
