Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Open Access
Authors
  • D. Kuric
  • G. Infante
  • V. Gómez
  • A. Jonsson
Publication date 2024
Host editors
  • S. Bernardini
  • C. Muise
Book title Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling
Book subtitle June 1–6, 2024, Alberta, Canada
ISBN (electronic)
  • 9781577358893
Series ICAPS
Event 34th International Conference on Automated Planning and Scheduling, ICAPS 2024
Pages (from-to) 333-341
Publisher Washington, DC: AAAI Press
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
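As an illustration of the idea sketched in the abstract (not the paper's own code), the following is a minimal NumPy sketch of generalized policy improvement (GPI) over a policy basis with successor features: each basis (sub)policy i carries successor features psi_i(s, a), a task (e.g. one edge of the FSA) is a linear weight vector w over reward features, and the agent acts greedily with respect to the best basis policy. All names, shapes, and the random features are illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# GPI over a learned policy basis using successor features.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d, n_policies = 4, 3, 2, 2

# Assumed precomputed successor features: psi[i] has shape (S, A, d),
# one table per basis (sub)policy i.
psi = rng.random((n_policies, n_states, n_actions, d))

def gpi_action(state, w, psi):
    """For a linear reward r(s, a) = phi(s, a) . w, each basis policy's
    value is Q_i(s, a) = psi_i(s, a) . w. GPI acts greedily over the
    maximum of these value estimates across the basis."""
    q = psi[:, state] @ w            # shape (n_policies, n_actions)
    return int(q.max(axis=0).argmax())

# A task is specified only by its weight vector; no further learning
# is needed to act on it with the existing basis.
w = np.array([1.0, -0.5])
action = gpi_action(0, w, psi)
assert 0 <= action < n_actions
```

In the paper's setting the weight vector would be tied to the current FSA state, so planning over the automaton selects which linear task the basis is evaluated against at each step; the snippet above only shows the inner GPI step under that assumption.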
Document type Conference contribution
Language English
Published at https://doi.org/10.1609/icaps.v34i1.31492
Other links https://www.scopus.com/pages/publications/85195921063