Power Analysis for Interleaving Experiments by Means of Offline Evaluation

Open Access
Authors
Publication date 2016
Book title ICTIR'16
Book subtitle Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval : September 12-16, 2016, Newark, Delaware, USA
ISBN (electronic)
  • 9781450344975
Event ICTIR '16 ACM SIGIR International Conference on the Theory of Information Retrieval
Pages (from-to) 87-90
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Evaluation in information retrieval takes one of two forms: collection-based offline evaluation and in-situ online evaluation. Collections constructed by the former methodology are reusable, and hence able to test the effectiveness of any experimental algorithm, while the latter requires a different experiment for every new algorithm. For this reason a funnel approach is often used, with experimental algorithms being compared to the baseline in an online experiment only if they outperform the baseline in an offline experiment. One of the key questions in the design of online and offline experiments concerns the number of measurements required to detect a statistically significant difference between two algorithms. Power analysis can provide an answer to this question; however, it requires a priori knowledge of the difference in effectiveness to be detected and of the variance in the measurements. The variance is typically estimated using historical data, but setting a detectable difference prior to the experiment can lead to suboptimal, upper-bound results. In this work we make use of the funnel approach in evaluation and test whether the difference in the effectiveness of two algorithms measured by the offline experiment can inform the required number of impressions of an online interleaving experiment. Our analysis on simulated data shows that the number of impressions required is correlated with the difference in the offline experiment, but at the same time varies widely for any given difference.
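The dependence of a power analysis on the detectable difference and the variance, as described in the abstract, can be illustrated with a minimal sketch. It assumes the textbook normal-approximation formula for a two-sided one-sample test, n = ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2, which is not necessarily the procedure used in the paper. The function name `required_impressions` and its parameters are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def required_impressions(delta, sigma, alpha=0.05, power=0.8):
    """Approximate number of measurements needed to detect a mean
    difference `delta`, given standard deviation `sigma`.

    Assumption: two-sided one-sample z-test with a normal approximation.
    This is a generic textbook formula, not the paper's own method.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # Critical value for the two-sided significance level.
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power.
    z_power = std_normal.inv_cdf(power)

    # Round up: a fractional measurement is not possible.
    return math.ceil(((z_alpha + z_power) * sigma / abs(delta)) ** 2)


if __name__ == "__main__":
    # Smaller detectable differences require many more measurements.
    for d in (0.2, 0.1, 0.05):
        print(d, required_impressions(delta=d, sigma=1.0))
```

Because the required sample size scales with 1/delta^2, halving the assumed difference roughly quadruples the number of measurements, which is why the choice of detectable difference matters so much in this setting.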
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/2970398.2970432
Downloads
p87-azarbonyad (Final published version)