Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning

Satchit Chatterji; Erman Acar

doi:https://doi.org/10.48550/arXiv.2411.04867

Authors	Satchit Chatterji; Erman Acar
Publication date	2025
Host editors	Inês Lynce; Nello Murano; Mauro Vallati; Serena Villata; Federico Chesani; Michela Milano; Andrea Omicini; Mehdi Dastani
Book title	ECAI 2025
Book subtitle	28th European Conference on Artificial Intelligence, 25-30 October 2025, Bologna, Italy : including 14th Conference on Prestigious Applications of Intelligent Systems (PAIS 2025) : proceedings
ISBN (electronic)	9781643686318
Series	Frontiers in Artificial Intelligence and Applications
Event	28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025
Pages (from-to)	2538-2545
Number of pages	8
Publisher	Amsterdam: IOS Press
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI); Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful proposal to enforce safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose Shielded Multi-Agent Reinforcement Learning (SMARL) as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal safety guarantees for MARL; and (3) comprehensive evaluation across symmetrically and asymmetrically shielded n-player game-theoretic benchmarks, demonstrating fewer constraint violations and significantly better cooperation under normative constraints. These results position SMARL as an effective mechanism for equilibrium selection, paving the way toward safer, socially aligned multi-agent systems.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.48550/arXiv.2411.04867 (Accepted author manuscript); https://doi.org/10.3233/FAIA251103 (Final published version)
Other links	http://adsabs.harvard.edu/abs/2024arXiv241104867C; https://www.scopus.com/pages/publications/105024464849
Downloads	FAIA-413-FAIA251103 (Final published version)
