Solving Multi-agent MDPs Optimally with Conditional Return Graphs

J. Scharpff; D.M. Roijers; F.A. Oliehoek; M.T.J. Spaan; M.M. de Weerdt

Authors	J. Scharpff; D.M. Roijers; F.A. Oliehoek; M.T.J. Spaan; M.M. de Weerdt
Publication date	2015
Book title	AAMAS Workshop on Multiagent Sequential Decision Making Under Uncertainty, MSDM 2015
Book subtitle	May 5, 2015 in Istanbul, Turkey: accepted papers
Event	10th AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM)
Number of pages	8
Publisher	MASplan.org
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate in order to find an optimal joint policy that maximises joint value. Typical solution algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for so-called TI-MMDPs, where agents can only affect their local state, while their value may depend on the state of others. We decompose the returns into local returns per agent that we represent compactly in a conditional return graph (CRG). Using CRGs, the value of a joint policy as well as bounds on the value of partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and is able to find solutions to problems previously considered unsolvable.
Document type	Conference contribution
Language	English
Published at	https://www.researchgate.net/publication/275039736_Solving_Multi-agent_MDPs_Optimally_with_Conditional_Return_Graphs (Accepted author manuscript)
Other links	http://masplan.org/msdm2015
Downloads	scharpff2015solving (Accepted author manuscript)
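The abstract's central idea, pruning partially specified joint policies whose optimistic value bound cannot beat the best complete joint policy found so far, can be illustrated with a generic branch-and-bound sketch. This is not CoRe and builds no CRGs. The model (each agent picks one of a finite set of local policies, and a hypothetical `local_return(i, joint)` gives agent i's return under the full joint policy), all function names, and the bound (maximising each agent's local return separately over every completion, by brute force) are assumptions made for illustration only.

```python
from itertools import product
from typing import Callable, Iterator, Optional, Sequence

# Hypothetical model (not from the paper): agent i chooses one local policy,
# identified by an int, and local_return(i, joint) is agent i's return under
# the full joint policy `joint`, so it may depend on the other agents' choices.
Joint = tuple[int, ...]
LocalReturn = Callable[[int, Joint], float]


def completions(partial: Joint, options: Sequence[Sequence[int]]) -> Iterator[Joint]:
    """Yield every full joint policy that extends the partial one."""
    for tail in product(*options[len(partial):]):
        yield partial + tail


def upper_bound(partial: Joint, options, local_return: LocalReturn) -> float:
    """Optimistic bound: maximise each agent's local return independently.

    The sum of per-agent maxima is never below the joint value of any
    completion, so it is safe for pruning. Computed by brute force here;
    the paper instead uses CRGs to compute bounds efficiently.
    """
    return sum(
        max(local_return(i, joint) for joint in completions(partial, options))
        for i in range(len(options))
    )


def branch_and_bound(options, local_return: LocalReturn) -> tuple[float, Optional[Joint]]:
    n = len(options)
    best_value, best_joint = float("-inf"), None

    def search(partial: Joint) -> None:
        nonlocal best_value, best_joint
        if len(partial) == n:  # fully specified joint policy
            value = sum(local_return(i, partial) for i in range(n))
            if value > best_value:
                best_value, best_joint = value, partial
            return
        if upper_bound(partial, options, local_return) <= best_value:
            return  # prune: no completion can beat the incumbent
        for choice in options[len(partial)]:
            search(partial + (choice,))

    search(())
    return best_value, best_joint


# Toy example: action 1 uses a shared resource (return 5 alone, 1 if both
# agents use it); action 0 is a safe fallback with return 2.
def example_local_return(i: int, joint: Joint) -> float:
    if joint[i] == 1:
        return 5.0 if sum(joint) == 1 else 1.0
    return 2.0


OPTIONS = [[0, 1], [0, 1]]
print(branch_and_bound(OPTIONS, example_local_return))  # (7.0, (0, 1))
```

In the toy run, the branch where agent 0 uses the resource is pruned: its bound (5 + 2 = 7) does not exceed the incumbent value 7 found for the joint policy (0, 1). Pruning on `<=` is safe here because only strictly better joint policies replace the incumbent.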