Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems

Open Access
Publication date 2020
Book title RECSYS 2020
Book subtitle 14th ACM Conference on Recommender Systems : Virtual Event, Brazil, September 22-26, 2020
ISBN (electronic)
  • 9781450375832
Event 14th ACM Conference on Recommender Systems
Pages (from-to) 190–199
Publisher New York, NY: The Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Reinforcement learning for recommendation (RL4Rec) methods are receiving increasing attention as an effective way to improve long-term user engagement. However, applying RL4Rec online comes with risks: exploration may lead to periods of detrimental user experience. Moreover, few researchers have access to real-world recommender systems. Simulation has been put forward as a solution: user feedback is simulated from logged historical user data, enabling optimization and evaluation without running policies online. While simulators do not risk the user experience and are widely accessible, we identify an important limitation of existing simulation methods: they ignore the interaction biases present in logged user data, so these biases affect the resulting simulation. To address this, we introduce a debiasing step in the simulation pipeline, which corrects for the biases present in the logged data before it is used to simulate user behavior. To evaluate the effects of bias on RL4Rec simulations, we propose a novel evaluation approach for simulators that considers the performance of policies optimized with the simulator. Our results reveal that the biases from logged data negatively impact the resulting policies unless they are corrected with our debiasing method. While our debiasing method can be applied to any simulator, we make our complete pipeline publicly available as the Simulator for OFfline leArning and evaluation (SOFA): the first simulator that accounts for interaction biases prior to optimization and evaluation.
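The abstract does not spell out how the debiasing step works. One common way to correct selection bias in logged ratings is inverse propensity scoring (IPS): each observed rating is weighted by the inverse of its probability of having been observed, so over-represented (for example, popular) items do not dominate the fit. The Python sketch below applies IPS to a small matrix-factorization model and then uses the debiased estimate as a simulated user's response. It is an illustration under stated assumptions, not the SOFA implementation: the popularity-based propensity model, the function names (`estimate_propensities`, `ips_loss`, `train_mf_ips`, `simulate_feedback`), and all hyperparameters are hypothetical.

```python
import random
from collections import Counter

# Illustrative sketch only: IPS-weighted matrix factorization as ONE possible
# debiasing step. The propensity model and all names are hypothetical.


def estimate_propensities(observed, n_users):
    """Hypothetical popularity-based propensity: P(O_ui = 1) is approximated
    by the fraction of users who rated item i. Popular items get high
    propensities, so their (over-represented) ratings are down-weighted."""
    counts = Counter(item for (_, item) in observed)
    return {item: c / n_users for item, c in counts.items()}


def ips_loss(observed, P, Q, prop, n_users, n_items):
    """IPS estimate of the squared error over ALL user-item pairs,
    computed from the observed pairs only."""
    total = 0.0
    for (u, i), r in observed.items():
        pred = sum(a * b for a, b in zip(P[u], Q[i]))
        total += (r - pred) ** 2 / prop[i]
    return total / (n_users * n_items)


def train_mf_ips(observed, n_users, n_items, dim=3, epochs=500,
                 lr=0.01, reg=0.01, seed=0):
    """Matrix factorization whose per-sample loss is weighted by the inverse
    propensity, so that (assuming the propensities are correct) the fit
    targets the full preference matrix rather than the biased logged sample."""
    rng = random.Random(seed)
    prop = estimate_propensities(observed, n_users)
    P = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    # Rescale weights to average 1 over the observed pairs so the step size
    # stays comparable; the relative weights between samples are unchanged.
    mean_w = sum(1.0 / prop[i] for (_, i) in observed) / len(observed)
    data = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(data)
        for (u, i), r in data:
            w = (1.0 / prop[i]) / mean_w  # inverse propensity weight
            err = r - sum(a * b for a, b in zip(P[u], Q[i]))
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (w * err * qi - reg * pu)
                Q[i][k] += lr * (w * err * pu - reg * qi)
    return P, Q, prop


def simulate_feedback(P, Q, user, item, low=1.0, high=5.0):
    """Simulated user response: the debiased preference estimate,
    clipped to the rating scale."""
    pred = sum(a * b for a, b in zip(P[user], Q[item]))
    return min(high, max(low, pred))


if __name__ == "__main__":
    # Toy logged data: item 0 is popular, item 2 was rated only once.
    logged = {(0, 0): 5, (1, 0): 4, (2, 0): 5, (3, 0): 4,
              (0, 1): 3, (1, 1): 2, (2, 2): 1}
    P, Q, prop = train_mf_ips(logged, n_users=4, n_items=3)
    print("propensities:", prop)
    print("simulated rating of user 3 for item 2:",
          round(simulate_feedback(P, Q, user=3, item=2), 2))
```

A popularity-based propensity only corrects for popularity bias; a bias that depends on the rating value itself would need a different propensity model. In a full pipeline, an RL policy would then be optimized against `simulate_feedback` instead of against the raw logged ratings.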
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3383313.3412252