Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

S. Whiteson; M.E. Taylor; P. Stone

doi:https://doi.org/10.1007/s10458-009-9100-2

Authors	S. Whiteson; M.E. Taylor; P. Stone
Publication date	2010
Journal	Autonomous Agents and Multi-Agent Systems
Volume | Issue number	21 | 1
Pages (from-to)	1-35
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address this shortcoming by presenting results of empirical comparisons between Sarsa and NEAT, two representative methods, in mountain car and keepaway, two benchmark reinforcement learning tasks. In each task, the methods are evaluated in combination with both linear and nonlinear representations to determine their best configurations. In addition, this article tests two specific hypotheses about the critical factors contributing to these methods’ relative performance: (1) that sensor noise reduces the final performance of Sarsa more than that of NEAT, because Sarsa’s learning updates are not reliable in the absence of the Markov property and (2) that stochasticity, by introducing noise in fitness estimates, reduces the learning speed of NEAT more than that of Sarsa. Experiments in variations of mountain car and keepaway designed to isolate these factors confirm both these hypotheses.
Document type	Article
Published at	https://doi.org/10.1007/s10458-009-9100-2
Downloads	307563.pdf (Final published version)

UvA-DARE

Digital Academic Repository
