Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

Open Access
Authors
Publication date 2022
Host editors
  • K. Sycara
  • V. Honavar
  • M. Spaan
Book title Proceedings of the 36th AAAI Conference on Artificial Intelligence
Book subtitle AAAI-22 : virtual conference, Vancouver, Canada, February 22-March 1, 2022
ISBN
  • 9781713855804
ISBN (electronic)
  • 9781577358763
Event 36th AAAI Conference on Artificial Intelligence (AAAI-2022)
Volume | Issue number 7
Pages (from-to) 7620-7627
Publisher Palo Alto, California: AAAI Press
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is highly efficient in both samples and computation. NAIT is a lazy-learning approach whose update is equivalent to episodic Monte-Carlo on episode completion, but which also allows rewards to be incorporated stably while an episode is ongoing. We make use of a fixed, domain-agnostic representation, simple distance-based exploration, and a proximity-graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26-game and 57-game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with a speedup of more than 100x in wall-clock time.
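As a rough illustration of the general idea named in the abstract (lazy, non-parametric value estimation from episodic Monte-Carlo returns), the sketch below stores a discounted return for each visited state and estimates action values by averaging the returns of the nearest stored neighbours. It is not the NAIT algorithm itself: it uses brute-force search where the abstract describes a proximity graph, it updates only at episode end, and it omits exploration. The names `discounted_returns` and `KNNValueMemory`, the parameters `k` and `gamma`, and the use of plain coordinate tuples as state features are all assumptions made for this example.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Monte-Carlo return G_t = r_t + gamma * G_{t+1} for a finished episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


class KNNValueMemory:
    """Hypothetical per-action store of (feature, return) pairs.

    Values are estimated lazily at query time by averaging the returns of
    the k nearest stored features (brute-force Euclidean search).
    """

    def __init__(self, num_actions, k=3):
        self.k = k
        self.memory = [[] for _ in range(num_actions)]

    def add_episode(self, features, actions, rewards, gamma=0.99):
        # Store each step's Monte-Carlo return under the action taken there.
        returns = discounted_returns(rewards, gamma)
        for f, a, g in zip(features, actions, returns):
            self.memory[a].append((tuple(f), g))

    def value(self, feature, action):
        entries = self.memory[action]
        if not entries:
            return 0.0  # assumed default for actions never tried
        nearest = sorted(entries, key=lambda e: math.dist(e[0], feature))
        nearest = nearest[: self.k]
        return sum(g for _, g in nearest) / len(nearest)

    def greedy_action(self, feature):
        values = [self.value(feature, a) for a in range(len(self.memory))]
        return max(range(len(values)), key=values.__getitem__)
```

Because nothing is fitted, adding an episode only appends to the memory, and all the work happens at lookup time. That lookup is the step a proximity-graph index would accelerate in place of the linear scan used here.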
Document type Conference contribution
Language English
Published at https://doi.org/10.1609/aaai.v36i7.20728
Other links https://www.proceedings.com/64793.html
Downloads
20728-Article Text-24741-1-2-20220628 (Final published version)