Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Open Access
Authors
  • D. Kuric
  • G. Infante
  • V. Gómez
  • A. Jonsson
Publication date 2024
Host editors
  • S. Bernardini
  • C. Muise
Book title Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling
Book subtitle June 1–6, 2024, Alberta, Canada
ISBN (electronic)
  • 9781577358893
Series ICAPS
Event 34th International Conference on Automated Planning and Scheduling, ICAPS 2024
Pages (from-to) 333-341
Publisher Washington, DC: AAAI Press
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
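As an illustration of the idea sketched in the abstract (not the paper's own code), the following is a minimal NumPy sketch of generalized policy improvement (GPI) over a policy basis with successor features: each basis (sub)policy i carries successor features psi_i(s, a), a task (e.g. one edge of the FSA) is a linear weight vector w over reward features, and the agent acts greedily with respect to the best basis policy. All names, shapes, and the random features are illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# GPI over a learned policy basis using successor features.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d, n_policies = 4, 3, 2, 2

# Assumed precomputed successor features: psi[i] has shape (S, A, d),
# one table per basis (sub)policy i.
psi = rng.random((n_policies, n_states, n_actions, d))

def gpi_action(state, w, psi):
    """For a linear reward r(s, a) = phi(s, a) . w, each basis policy's
    value is Q_i(s, a) = psi_i(s, a) . w. GPI acts greedily over the
    maximum of these value estimates across the basis."""
    q = psi[:, state] @ w            # shape (n_policies, n_actions)
    return int(q.max(axis=0).argmax())

# A task is specified only by its weight vector; no further learning
# is needed to act on it with the existing basis.
w = np.array([1.0, -0.5])
action = gpi_action(0, w, psi)
assert 0 <= action < n_actions
```

In the paper's setting the weight vector would be tied to the current FSA state, so planning over the automaton selects which linear task the basis is evaluated against at each step; the snippet above only shows the inner GPI step under that assumption.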
Document type Conference contribution
Language English
Published at https://doi.org/10.1609/icaps.v34i1.31492
Other links https://www.scopus.com/pages/publications/85195921063