Verifiably Safe Exploration for End-to-End Reinforcement Learning

N. Hunt; N. Fulton; S. Magliacane; T.N. Hoang; S. Das; A. Solar-Lezama

DOI: https://doi.org/10.1145/3447928.3456653

Authors	N. Hunt; N. Fulton; S. Magliacane; T.N. Hoang; S. Das; A. Solar-Lezama
Publication date	2021
Book title	HSCC2021
Book subtitle	Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control (part of CPS-IoT Week): May 19-21, 2021, Nashville, TN, USA
ISBN (electronic)	9781450383394
Event	24th International Conference on Hybrid Systems: Computation and Control
Article number	14
Number of pages	11
Publisher	New York, New York: The Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We characterize safety constraints in terms of a refinement relation on Markov decision processes: rather than directly constraining the reinforcement learning algorithm so that it only takes safe actions, we instead refine the environment so that only safe actions are defined in the environment's transition structure. This has pragmatic system design benefits and, more importantly, provides a clean conceptual setting in which we are able to prove important safety and efficiency properties. These properties allow us to transform the constrained optimization problem of acting safely in the original environment into an unconstrained optimization problem in a refined environment.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3447928.3456653 (Final published version)