Where to stop reading a ranked list? Threshold optimization using truncated score distributions
| Authors | |
|---|---|
| Publication date | 2009 |
| Host editors | |
| Book title | Proceedings: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: SIGIR 2009, Boston, Massachusetts, July 19-23, 2009 |
| ISBN | |
| Event | 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009), Boston, MA |
| Pages (from-to) | 524-531 |
| Publisher | New York: ACM Press |
| Organisations | |
| Abstract | Ranked retrieval has a particular disadvantage in comparison with traditional Boolean retrieval: there is no clear cut-off point at which to stop consulting results. This is a serious problem in some setups. We investigate and further develop methods to select the rank cut-off value that optimizes a given effectiveness measure. Assuming no other input than a system's output for a query (document scores and their distribution), the task is essentially a score-distributional threshold optimization problem. The recent trend in modeling score distributions is to use a normal-exponential mixture: normal for relevant, and exponential for non-relevant document scores. We discuss the two main theoretical problems with the current model, support incompatibility and non-convexity, and develop new models that address them. The main contributions of the paper are two truncated normal-exponential models, varying in the way the out-truncated score ranges are handled. We conduct a range of experiments using the TREC 2007 and 2008 Legal Track data, and show that the truncated models lead to significantly better results. |
| Document type | Conference contribution |
| Published at | http://doi.acm.org/10.1145/1571941.1572031 |
