Optimal and approximate Q-value functions for decentralized POMDPs
Journal of Artificial Intelligence Research
Faculty of Science (FNWI)
Informatics Institute (IVI)
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in
sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting
to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an
optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized
POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal
Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy
and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible
for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation.
We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally,
unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from
such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark.
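
To make the single-agent starting point of the abstract concrete, the following is a minimal sketch of computing Q* by dynamic programming for a small, fully observable MDP and then extracting a greedy policy from it. It is not the Dec-POMDP method studied in the paper. The function names, the array layout `T[s, a, s']` and `R[s, a]`, the discount factor, and the two-state toy problem are all assumptions made for illustration.

```python
import numpy as np


def q_value_iteration(T, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Iterate the Bellman optimality backup until Q stops changing.

    T[s, a, s2] is the probability of reaching s2 from s under action a
    (assumed layout); R[s, a] is the immediate reward (assumed layout).
    """
    n_states, n_actions, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)             # V(s2) = max over a2 of Q(s2, a2)
        Q_new = R + gamma * (T @ V)   # Q(s, a) = R(s, a) + gamma * E[V(s2)]
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


def greedy_policy(Q):
    """Extract a deterministic policy by picking the best action per state."""
    return Q.argmax(axis=1)


# Hypothetical toy MDP: action 0 always leads to state 0, action 1 always
# leads to state 1, and only taking action 1 in state 1 is rewarded.
T = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 0
    [[1.0, 0.0], [0.0, 1.0]],  # transitions from state 1
])
R = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
])

Q_star = q_value_iteration(T, R, gamma=0.9)
policy = greedy_policy(Q_star)
print(Q_star)
print(policy)
```

In this toy problem the greedy policy picks action 1 in both states, since that is the only way to reach and stay in the rewarded state. In a Dec-POMDP no agent observes the state or the other agents' histories, so this backup cannot be applied directly, which is the difficulty the abstract points to.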