Learning Hierarchical Planning-Based Policies from Offline Data

J. Wöhlke; F. Schmitt; H. van Hoof

doi:https://doi.org/10.1007/978-3-031-43421-1_29

Authors	J. Wöhlke; F. Schmitt; H. van Hoof
Publication date	2023
Host editors	D. Koutra; C. Plant; M. Gomes Rodriguez; E. Baralis; F. Bonchi
Book title	Machine Learning and Knowledge Discovery in Databases: Research Track
Book subtitle	European Conference, ECML PKDD 2023, Turin, Italy, September 18–22, 2023 : proceedings
ISBN	9783031434204
ISBN (electronic)	9783031434211
Series	Lecture Notes in Computer Science
Event	2023 European Conference on Machine Learning and Knowledge Discovery in Databases
Volume | Issue number	IV
Pages (from-to)	489–505
Publisher	Cham: Springer
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Hierarchical policy architectures that incorporate a planning component at the top level have shown superior performance and generalization in agent navigation tasks. Cost or safety concerns may, however, prevent training in an online reinforcement learning (RL) fashion with continuous environment interaction. We therefore propose HORIBLe-VRN, an algorithm that learns a hierarchical policy with a top-level planning-based module from pre-collected data. A key challenge is dealing with the unknown, latent high-level (HL) actions. Our algorithm features an EM-style hierarchical imitation learning stage, which incorporates HL action inference, followed by an offline RL refinement stage for the top-level policy. We empirically evaluate HORIBLe-VRN in a long-horizon, sparse-reward agent navigation task, investigating performance, generalization capabilities, and robustness with respect to sub-optimal demonstration data.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1007/978-3-031-43421-1_29 (Final published version)
