# Optimal and approximate Q-value functions for decentralized POMDPs
| Authors | Frans A. Oliehoek, Matthijs T. J. Spaan, Nikos Vlassis |
|---|---|
| Publication date | May 2008 |
| Journal | Journal of Artificial Intelligence Research |
| Volume | 32 |
| Pages (from-to) | 289-353 |
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.1613/jair.2447 |
## Abstract

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
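The single-agent starting point the abstract describes (compute Q* by dynamic programming, then act greedily with respect to it) can be made concrete with a minimal sketch. The code below is illustrative background rather than code from the paper: it assumes a tabular MDP given as NumPy arrays, and all names (`q_value_iteration`, `extract_policy`, the toy transition and reward matrices) are hypothetical.

```python
import numpy as np

def q_value_iteration(T, R, gamma=0.95, tol=1e-8):
    """Compute the optimal Q-value function Q* of a tabular MDP.

    T: transitions, shape (A, S, S), with T[a, s, s2] = P(s2 | s, a).
    R: rewards, shape (S, A).
    Returns Q* as an (S, A) array.
    """
    num_actions, num_states, _ = T.shape
    Q = np.zeros((num_states, num_actions))
    while True:
        V = Q.max(axis=1)  # V*(s2) = max_a Q(s2, a)
        # Bellman backup: Q(s, a) = R(s, a) + gamma * E[V*(s2) | s, a]
        Q_next = R + gamma * np.einsum("ast,t->sa", T, V)
        if np.abs(Q_next - Q).max() < tol:
            return Q_next
        Q = Q_next

def extract_policy(Q):
    """Greedy policy extraction: pi*(s) = argmax_a Q*(s, a)."""
    return Q.argmax(axis=1)

# Toy two-state, two-action MDP (illustrative numbers only).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.0, 1.0]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
Q_star = q_value_iteration(T, R)
print(extract_policy(Q_star))  # one greedy action per state
```

The paper's central question is what replaces this recursion in the decentralized setting, where no single agent observes the full state and policies must map individual observation histories to actions.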