Multileaved Comparisons for Fast Online Evaluation

Authors
Publication date 2014
Host editors
  • J. Li
  • X.S. Wang
Book title CIKM'14: proceedings of the 2014 ACM International Conference on Information and Knowledge Management: November 3-7, 2014, Shanghai, China
ISBN
  • 9781450325981
Event 23rd Conference on Information and Knowledge Management (CIKM 2014)
Pages (from-to) 71-80
Publisher New York: ACM
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Evaluation methods for information retrieval systems come in three types: offline evaluation, using static data sets annotated for relevance by human judges; user studies, usually conducted in a lab-based setting; and online evaluation, using implicit signals such as clicks from actual users. For online evaluation, preferences between rankers are typically inferred from implicit signals via interleaved comparison methods, which combine a pair of rankings and display the result to the user. We propose a new approach to online evaluation called multileaved comparisons that is useful in the prevalent case where designers are interested in the relative performance of more than two rankers. Rather than combining only a pair of rankings, multileaved comparisons combine an arbitrary number of rankings. The resulting user clicks then give feedback about how all these rankings compare to each other. We propose two specific multileaved comparison methods. The first, called team draft multileave, is an extension of team draft interleave. The second, called optimized multileave, is an extension of optimized interleave and is designed to handle cases where a large number of rankers must be multileaved. We present experimental results that demonstrate that both team draft multileave and optimized multileave can accurately determine all pairwise preferences among a set of rankers using far less data than the interleaving methods that they extend.
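The abstract does not spell out the procedure, so the following is only a minimal sketch of how a team-draft-style multileaving could work, assuming the usual team draft scheme: in each round the rankers take turns in random order, each contributing its highest-ranked document not yet shown, and a click is credited to the ranker that contributed the clicked slot. The function names (`team_draft_multileave`, `credit_clicks`, `pairwise_preferences`) and the simple "more credited clicks wins" rule are illustrative assumptions, not the authors' implementation.

```python
import random


def team_draft_multileave(rankings, length, rng=None):
    """Combine several rankings into one list (illustrative sketch).

    Returns the combined list and, for each slot, the index of the
    ranker ("team") that contributed the document in that slot.
    """
    rng = rng or random.Random()
    combined, teams, seen = [], [], set()
    while len(combined) < length:
        order = list(range(len(rankings)))
        rng.shuffle(order)  # random turn order within each round
        added = False
        for r in order:
            if len(combined) >= length:
                break
            # Highest-ranked document of ranker r that is not yet shown.
            doc = next((d for d in rankings[r] if d not in seen), None)
            if doc is None:
                continue
            combined.append(doc)
            teams.append(r)
            seen.add(doc)
            added = True
        if not added:  # every ranking is exhausted
            break
    return combined, teams


def credit_clicks(teams, clicked_positions, n_rankers):
    """Count clicks per ranker based on who contributed each clicked slot."""
    credit = [0] * n_rankers
    for pos in clicked_positions:
        credit[teams[pos]] += 1
    return credit


def pairwise_preferences(credit):
    """Assumed rule: 1 if ranker i beats j, -1 if it loses, 0 on a tie."""
    n = len(credit)
    return {
        (i, j): (credit[i] > credit[j]) - (credit[i] < credit[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
```

A single displayed list thus yields evidence about every pair of rankers at once, which is the intuition behind the data savings the abstract reports; in practice the credits would be aggregated over many queries before any preference is declared.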
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/2661829.2661952