A theoretical and empirical analysis of Expected Sarsa

Authors
Publication date 2009
Book title Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
ISBN
  • 9781424427611
Event 2009 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (IEEE ADPRL 2009), Nashville, TN, USA
Pages (from-to) 177-184
Publisher IEEE
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy temporal-difference method for model-free reinforcement learning. Expected Sarsa exploits knowledge about stochasticity in the behavior policy to perform updates with lower variance. Doing so allows for higher learning rates and thus faster learning. In deterministic environments, Expected Sarsa's updates have zero variance, enabling a learning rate of 1. We prove that Expected Sarsa converges under the same conditions as Sarsa and formulate specific hypotheses about when Expected Sarsa will outperform Sarsa and Q-learning. Experiments in multiple domains confirm these hypotheses and demonstrate that Expected Sarsa has significant advantages over these more commonly used methods.
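The variance reduction described in the abstract comes from replacing Sarsa's sampled next action A' with an expectation over the policy's action probabilities. A minimal tabular sketch of this update, assuming an epsilon-greedy behavior policy (the function and parameter names below are illustrative, not from the paper's implementation):

```python
import numpy as np

def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy over one state's Q-values."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)       # exploration mass spread uniformly
    probs[np.argmax(q_row)] += 1.0 - epsilon  # remaining mass on the greedy action
    return probs

def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
    """Move Q[s, a] toward r + gamma * E_pi[Q(s', A')].

    Taking the expectation over the policy's actions (instead of sampling
    a single next action, as Sarsa does) removes the sampling variance in
    the update target; in a deterministic environment the target itself is
    then deterministic, which is what permits a learning rate of 1.
    """
    probs = epsilon_greedy_probs(Q[s_next], epsilon)
    target = r + gamma * np.dot(probs, Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Sarsa would instead use `target = r + gamma * Q[s_next, a_next]` for a single action `a_next` sampled from the same policy; the two updates are equal in expectation but Expected Sarsa's target has lower variance.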
Document type Conference contribution
Published at https://doi.org/10.1109/ADPRL.2009.4927542