Verifiably Safe Exploration for End-to-End Reinforcement Learning

Open Access
Authors
  • S. Das
  • A. Solar-Lezama
Publication date 2021
Book title HSCC2021
Book subtitle Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control (part of CPS-IoT Week), May 19-21, 2021, Nashville, TN, USA
ISBN (electronic)
  • 9781450383394
Event 24th International Conference on Hybrid Systems: Computation and Control
Article number 14
Number of pages 11
Publisher New York, New York: The Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Deploying deep reinforcement learning in safety-critical settings requires algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and in automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of exploring safely in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at collecting as much reward as is safe. We characterize safety constraints in terms of a refinement relation on Markov decision processes: rather than directly constraining the reinforcement learning algorithm so that it only takes safe actions, we instead refine the environment so that only safe actions are defined in the environment's transition structure. This has pragmatic system design benefits and, more importantly, provides a clean conceptual setting in which we are able to prove important safety and efficiency properties. These properties allow us to transform the constrained optimization problem of acting safely in the original environment into an unconstrained optimization problem in the refined environment.
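The refinement idea can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a small discrete MDP with deterministic transitions, replaces the verified monitors the paper obtains from automated reasoning about hybrid systems with a hypothetical one-step check (`monitor`), and skips object detection by handing the learner the symbolic state directly. All names (`MDP`, `refine`, `q_learning`, the line-world environment) are illustrative. The reward is deliberately misaligned with safety: it is highest at the unsafe cliff state 0.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
Action = int


@dataclass
class MDP:
    """A small deterministic MDP (illustrative, not the paper's formalism)."""
    actions: Callable[[State], List[Action]]          # actions defined in state s
    step: Callable[[State, Action], State]            # deterministic transition
    reward: Callable[[State, Action, State], float]   # r(s, a, s')


def refine(mdp: MDP, safe_action: Callable[[State, Action], bool]) -> MDP:
    """Refined MDP: only actions accepted by the (hypothetical) monitor are defined.

    Transitions and rewards are unchanged, so any unconstrained learner run on
    the refined MDP can only ever choose monitor-approved actions.
    """
    return MDP(
        actions=lambda s: [a for a in mdp.actions(s) if safe_action(s, a)],
        step=mdp.step,
        reward=mdp.reward,
    )


def q_learning(mdp: MDP, start: State, episodes: int = 200, horizon: int = 30,
               alpha: float = 0.5, gamma: float = 0.9, eps: float = 0.2, seed: int = 0):
    """Plain tabular Q-learning with epsilon-greedy exploration; knows nothing about safety."""
    rng = random.Random(seed)
    q: Dict[Tuple[State, Action], float] = {}
    visited = set()
    for _ in range(episodes):
        s = start
        for _ in range(horizon):
            visited.add(s)
            acts = mdp.actions(s)
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda b: q.get((s, b), 0.0))
            s2 = mdp.step(s, a)
            r = mdp.reward(s, a, s2)
            best_next = max(q.get((s2, b), 0.0) for b in mdp.actions(s2))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
        visited.add(s)
    return q, visited


# Toy line world: positions 0..N, position 0 is an unsafe cliff.
N = 10
UNSAFE = {0}


def line_actions(s: State) -> List[Action]:
    return [-1, +1]


def line_step(s: State, a: Action) -> State:
    return min(max(s + a, 0), N)


def line_reward(s: State, a: Action, s2: State) -> float:
    return float(N - s2)  # misaligned: the unsafe cliff pays the most


original = MDP(line_actions, line_step, line_reward)
# Hypothetical one-step monitor: an action is safe if its successor is not unsafe.
monitor = lambda s, a: line_step(s, a) not in UNSAFE
refined = refine(original, monitor)

q, visited = q_learning(refined, start=5)
print(sorted(visited))
print(visited.isdisjoint(UNSAFE))  # True: the learner never reaches the cliff
```

The one-step check is only sound here because every safe state in this toy keeps at least one safe action available (from state 1, action `+1` is always allowed). In general the monitor must also guarantee that a safe action will remain available after the step, which is the kind of property a formally verified model of the hybrid dynamics is meant to supply.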
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3447928.3456653
Downloads
3447928.3456653 (Final published version)