Protecting against evaluation overfitting in empirical reinforcement learning

Authors	S. Whiteson, B. Tanner, M.E. Taylor, P. Stone
Publication date	2011
Book title	Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2011)
ISBN	9781424498871
Event	2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2011), Paris, France
Pages (from-to)	120-127
Publisher	IEEE
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Empirical evaluations play an important role in machine learning. However, the usefulness of any evaluation depends on the empirical methodology employed. Designing good empirical methodologies is difficult in part because agents can overfit test evaluations and thereby obtain misleadingly high scores. We argue that reinforcement learning is particularly vulnerable to environment overfitting and propose generalized methodologies as a remedy, in which evaluations are based on multiple environments sampled from a distribution. In addition, we consider how to summarize performance when scores from different environments may not have commensurate values. Finally, we present proof-of-concept results demonstrating how these methodologies can validate an intuitively useful range-adaptive tile coding method.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1109/ADPRL.2011.5967363 (Final published version)
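The generalized methodology summarized in the abstract (evaluate on many environments drawn from a distribution, then summarize scores that are not commensurate across environments) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's method: the one-parameter "scale" environment, its Uniform[1, 100] distribution, the hypothetical agents `tuned_agent` and `robust_agent`, and the use of mean per-environment rank as the summary statistic are all invented here.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical setup: an "environment" is reduced to one scale parameter that
# changes the magnitude of achievable returns, so raw scores from different
# environments are not directly comparable (not commensurate).
Environment = float
Agent = Callable[[Environment], float]


def sample_environments(n: int, rng: random.Random) -> List[Environment]:
    """Draw n environments from an assumed distribution: scale ~ Uniform[1, 100]."""
    return [rng.uniform(1.0, 100.0) for _ in range(n)]


def rank_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank agents within a single environment (1 = best); ties share the mean rank."""
    ordered = sorted(scores.values(), reverse=True)
    ranks = {}
    for name, score in scores.items():
        positions = [i + 1 for i, v in enumerate(ordered) if v == score]
        ranks[name] = mean(positions)
    return ranks


def generalized_evaluation(
    agents: Dict[str, Agent], n_envs: int, seed: int = 0
) -> Dict[str, float]:
    """Score every agent on the same sampled environments; return its mean rank.

    Ranks are compared instead of raw returns because returns from different
    environments live on different scales (lower mean rank is better).
    """
    rng = random.Random(seed)
    envs = sample_environments(n_envs, rng)
    totals = {name: 0.0 for name in agents}
    for env in envs:
        scores = {name: agent(env) for name, agent in agents.items()}
        for name, rank in rank_scores(scores).items():
            totals[name] += rank
    return {name: total / n_envs for name, total in totals.items()}


def tuned_agent(scale: Environment) -> float:
    """Hypothetical agent overfit to environments near scale 10."""
    return scale * (1.0 if abs(scale - 10.0) < 5.0 else 0.2)


def robust_agent(scale: Environment) -> float:
    """Hypothetical agent that is moderately good in every environment."""
    return scale * 0.6


if __name__ == "__main__":
    agents = {"tuned": tuned_agent, "robust": robust_agent}
    # A single fixed test environment (scale 10) favors the overfit agent,
    print(rank_scores({name: agent(10.0) for name, agent in agents.items()}))
    # while environments sampled from the distribution favor the robust one.
    print(generalized_evaluation(agents, n_envs=200, seed=0))
```

Under these assumptions, the single-environment comparison ranks the tuned agent first, whereas the mean rank over 200 sampled environments ranks the robust agent first: the kind of reversal that evaluating on a distribution of environments is meant to expose. Ranking is only one way to handle non-commensurate scores; normalizing returns per environment would be another.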

UvA-DARE