Keeping Dataset Biases out of the Simulation

J. Huang; H. Oosterhuis; M. de Rijke; H. van Hoof

doi:https://doi.org/10.1145/3383313.3412252

Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems

Authors	J. Huang; H. Oosterhuis; M. de Rijke; H. van Hoof
Publication date	2020
Book title	RECSYS 2020
Book subtitle	14th ACM Conference on Recommender Systems : Virtual Event, Brazil, September 22-26, 2020
ISBN (electronic)	9781450375832
Event	14th ACM Conference on Recommender Systems
Pages (from-to)	190–199
Publisher	New York, NY: The Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Reinforcement learning for recommendation (RL4Rec) methods are increasingly receiving attention as an effective way to improve long-term user engagement. However, applying RL4Rec online comes with risks: exploration may lead to periods of detrimental user experience. Moreover, few researchers have access to real-world recommender systems. Simulations have been put forward as a solution where user feedback is simulated based on logged historical user data, thus enabling optimization and evaluation without being run online. While simulators do not risk the user experience and are widely accessible, we identify an important limitation of existing simulation methods. They ignore the interaction biases present in logged user data, and consequently, these biases affect the resulting simulation. As a solution to this issue, we introduce a debiasing step in the simulation pipeline, which corrects for the biases present in the logged data before it is used to simulate user behavior. To evaluate the effects of bias on RL4Rec simulations, we propose a novel evaluation approach for simulators that considers the performance of policies optimized with the simulator. Our results reveal that the biases from logged data negatively impact the resulting policies, unless corrected for with our debiasing method. While our debiasing methods can be applied to any simulator, we make our complete pipeline publicly available as the Simulator for OFfline leArning and evaluation (SOFA): the first simulator that accounts for interaction biases prior to optimization and evaluation.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3383313.3412252 (Final published version)
