Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

A. Long; A. Blair; H. van Hoof

doi:https://doi.org/10.1609/aaai.v36i7.20728

Authors	A. Long; A. Blair; H. van Hoof
Publication date	2022
Host editors	K. Sycara; V. Honavar; M. Spaan
Book title	Proceedings of the 36th AAAI Conference on Artificial Intelligence
Book subtitle	AAAI-22 : virtual conference, Vancouver, Canada, February 22-March 1, 2022
ISBN	9781713855804
ISBN (electronic)	9781577358763
Event	36th AAAI Conference on Artificial Intelligence (AAAI-2022)
Volume | Issue number	7
Pages (from-to)	7620-7627
Publisher	Palo Alto, California: AAAI Press
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample- and computation-efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed domain-agnostic representation, simple distance-based exploration, and a proximity graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26 and 57 game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with a greater than 100x speedup in wall-clock time.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1609/aaai.v36i7.20728 (Final published version)
Other links	https://www.proceedings.com/64793.html
Downloads	20728-Article Text-24741-1-2-20220628 (Final published version)