# Research

We study a wide range of problems in artificial intelligence, automated planning and learning, autonomous systems, reasoning under uncertainty, multi-agent systems, and resource-bounded reasoning. We are particularly interested in the implications of uncertainty and limited computational resources on the design of autonomous agents. In most practical settings, it is not feasible or desirable to find the optimal action, making it necessary to resort to some form of approximate reasoning. This raises a fundamental question: what does it mean for an agent to be “rational” when it does not have enough knowledge or computational power to derive the best course of action? Our overall approach to this problem involves meta-level control mechanisms that reason explicitly about the cost of decision-making and can optimize the amount of deliberation (or “thinking”) an agent does before taking action. We have also developed new planning techniques for situations involving multiple decision makers operating in either collaborative or adversarial domains.

**Anytime Algorithms**

**Models of Bounded Rationality**

**Scalable Algorithms for Probabilistic Reasoning**

**Belief-Space Planning and POMDPs**

**Multiagent Planning and DEC-POMDPs**

Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Decentralized Language Learning Through Acting Conference Proceedings of the 3rd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS), New York, NY, 2004. @conference{SZ:GAZaamas04, title = {Decentralized Language Learning Through Acting}, author = {Claudia V Goldman and Martin Allen and Shlomo Zilberstein}, url = {http://rbr.cs.umass.edu/shlomo/papers/GAZaamas04.pdf}, year = {2004}, date = {2004-01-01}, booktitle = {Proceedings of the 3rd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS)}, pages = {1006--1013}, address = {New York, NY}, abstract = {This paper presents an algorithm for learning the meaning of messages communicated between agents that interact while acting optimally towards a cooperative goal. Our reinforcement-learning method is based on Bayesian filtering and has been adapted for a decentralized control process. Empirical results shed light on the complexity of the learning problem, and on factors affecting the speed of convergence. Designing intelligent agents able to adapt their mutual interpretation of messages exchanged, in order to improve overall task-oriented performance, introduces an essential cognitive capability that can upgrade the current state of the art in multi-agent and human-machine systems to the next level. Learning to communicate while acting will add to the robustness and flexibility of these systems and hence to a more efficient and productive performance.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } This paper presents an algorithm for learning the meaning of messages communicated between agents that interact while acting optimally towards a cooperative goal. Our reinforcement-learning method is based on Bayesian filtering and has been adapted for a decentralized control process. Empirical results shed light on the complexity of the learning problem, and on factors affecting the speed of convergence. Designing intelligent agents able to adapt their mutual interpretation of messages exchanged, in order to improve overall task-oriented performance, introduces an essential cognitive capability that can upgrade the current state of the art in multi-agent and human-machine systems to the next level. Learning to communicate while acting will add to the robustness and flexibility of these systems and hence to a more efficient and productive performance. |

Hansen, Eric A; Bernstein, Daniel S; Zilberstein, Shlomo Dynamic Programming for Partially Observable Stochastic Games Conference Proceedings of the 19th National Conference on Artificial Intelligence (AAAI), San Jose, California, 2004. @conference{SZ:HBZaaai04, title = {Dynamic Programming for Partially Observable Stochastic Games}, author = {Eric A Hansen and Daniel S Bernstein and Shlomo Zilberstein}, url = {http://rbr.cs.umass.edu/shlomo/papers/HBZaaai04.pdf}, year = {2004}, date = {2004-01-01}, booktitle = {Proceedings of the 19th National Conference on Artificial Intelligence (AAAI)}, pages = {709--715}, address = {San Jose, California}, abstract = {We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. We prove that when applied to finite-horizon POSGs, the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game. For the special case in which agents share the same payoffs, the algorithm can be used to find an optimal solution. We present preliminary empirical results and discuss ways to further exploit POMDP theory in solving POSGs.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. We prove that when applied to finite-horizon POSGs, the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game. For the special case in which agents share the same payoffs, the algorithm can be used to find an optimal solution. We present preliminary empirical results and discuss ways to further exploit POMDP theory in solving POSGs. |

Becker, Raphen; Zilberstein, Shlomo; Lesser, Victor; Goldman, Claudia V Transition-Independent Decentralized Markov Decision Processes Conference Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS), Melbourne, Australia, 2003. @conference{SZ:BZLGaamas03, title = {Transition-Independent Decentralized Markov Decision Processes}, author = {Raphen Becker and Shlomo Zilberstein and Victor Lesser and Claudia V Goldman}, url = {http://rbr.cs.umass.edu/shlomo/papers/BZLGaamas03.pdf}, doi = {10.1145/860575.860583}, year = {2003}, date = {2003-01-01}, booktitle = {Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS)}, pages = {41--48}, address = {Melbourne, Australia}, abstract = {There has been substantial progress with formal models for sequential decision making by individual agents using the Markov decision process (MDP). However, similar treatment of multi-agent systems is lacking. A recent complexity result, showing that solving decentralized MDPs is NEXP-hard, provides a partial explanation. To overcome this complexity barrier, we identify a general class of transition-independent decentralized MDPs that is widely applicable. The class consists of independent collaborating agents that are tied together through a global reward function that depends upon both of their histories. We present a novel algorithm for solving this class of problems and examine its properties. The result is the first effective technique to solve optimally a class of decentralized MDPs. This lays the foundation for further work in this area on both exact and approximate solutions.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } There has been substantial progress with formal models for sequential decision making by individual agents using the Markov decision process (MDP). However, similar treatment of multi-agent systems is lacking. A recent complexity result, showing that solving decentralized MDPs is NEXP-hard, provides a partial explanation. To overcome this complexity barrier, we identify a general class of transition-independent decentralized MDPs that is widely applicable. The class consists of independent collaborating agents that are tied together through a global reward function that depends upon both of their histories. We present a novel algorithm for solving this class of problems and examine its properties. The result is the first effective technique to solve optimally a class of decentralized MDPs. This lays the foundation for further work in this area on both exact and approximate solutions. |

Goldman, Claudia V; Zilberstein, Shlomo Optimizing Information Exchange in Cooperative Multi-agent Systems Conference Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS), Melbourne, Australia, 2003. @conference{SZ:GZaamas03, title = {Optimizing Information Exchange in Cooperative Multi-agent Systems}, author = {Claudia V Goldman and Shlomo Zilberstein}, url = {http://rbr.cs.umass.edu/shlomo/papers/GZaamas03.pdf}, doi = {10.1145/860575.860598}, year = {2003}, date = {2003-01-01}, booktitle = {Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS)}, pages = {137--144}, address = {Melbourne, Australia}, abstract = {Decentralized control of a cooperative multi-agent system is the problem faced by multiple decision-makers that share a common set of objectives. The decision-makers may be robots placed at separate geographical locations or computational processes distributed in an information space. It may be impossible or undesirable for these decision-makers to share all their knowledge all the time. Furthermore, exchanging information may incur a cost associated with the required bandwidth or with the risk of revealing it to competing agents. Assuming that communication may not be reliable adds another dimension of complexity to the problem. This paper develops a decision-theoretic solution to this problem, treating both standard actions and communication as explicit choices that the decision maker must consider. The goal is to derive both action policies and communication policies that together optimize a global value function. We present an analytical model to evaluate the trade-off between the cost of communication and the value of the information received. Finally, to address the complexity of this hard optimization problem, we develop a practical approximation technique based on myopic meta-level control of communication.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } Decentralized control of a cooperative multi-agent system is the problem faced by multiple decision-makers that share a common set of objectives. The decision-makers may be robots placed at separate geographical locations or computational processes distributed in an information space. It may be impossible or undesirable for these decision-makers to share all their knowledge all the time. Furthermore, exchanging information may incur a cost associated with the required bandwidth or with the risk of revealing it to competing agents. Assuming that communication may not be reliable adds another dimension of complexity to the problem. This paper develops a decision-theoretic solution to this problem, treating both standard actions and communication as explicit choices that the decision maker must consider. The goal is to derive both action policies and communication policies that together optimize a global value function. We present an analytical model to evaluate the trade-off between the cost of communication and the value of the information received. Finally, to address the complexity of this hard optimization problem, we develop a practical approximation technique based on myopic meta-level control of communication. |

Bernstein, Daniel S; Givan, Robert; Immerman, Neil; Zilberstein, Shlomo The Complexity of Decentralized Control of Markov Decision Processes Journal Article Mathematics of Operations Research (MOR), 27 (4), pp. 819–840, 2002. @article{SZ:BGIZmor02, title = {The Complexity of Decentralized Control of Markov Decision Processes}, author = {Daniel S Bernstein and Robert Givan and Neil Immerman and Shlomo Zilberstein}, url = {http://rbr.cs.umass.edu/shlomo/papers/BGIZmor02.pdf}, doi = {10.1287/moor.27.4.819.297}, year = {2002}, date = {2002-01-01}, journal = {Mathematics of Operations Research (MOR)}, volume = {27}, number = {4}, pages = {819--840}, abstract = {We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully-observable case and the partially-observable case that allow for decentralized control are described. For even two agents, the finite-horizon problems corresponding to both of these models are hard for non-deterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomial-time algorithms. Furthermore, assuming EXP =/= NEXP, the problems require super-exponential time to solve in the worst case.}, keywords = {}, pubstate = {published}, tppubtype = {article} } We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully-observable case and the partially-observable case that allow for decentralized control are described. For even two agents, the finite-horizon problems corresponding to both of these models are hard for non-deterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomial-time algorithms. Furthermore, assuming EXP =/= NEXP, the problems require super-exponential time to solve in the worst case. |

Bernstein, Daniel S; Zilberstein, Shlomo Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control Conference Proceedings of the 6th European Conference on Planning (ECP), Toledo, Spain, 2001. @conference{SZ:BZecp01, title = {Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control}, author = {Daniel S Bernstein and Shlomo Zilberstein}, url = {http://rbr.cs.umass.edu/shlomo/papers/BZecp01.pdf}, year = {2001}, date = {2001-01-01}, booktitle = {Proceedings of the 6th European Conference on Planning (ECP)}, pages = {373--378}, address = {Toledo, Spain}, abstract = {The progressive processing model allows a system to trade off resource consumption against the quality of the outcome by mapping each activity to a graph of potential solution methods. In the past, only semi-linear graphs have been used. We examine the application of the model to control the operation of an autonomous rover which operates under tight resource constraints. The task structure is generalized to directed acyclic graphs for which the optimal schedule can be computed by solving a corresponding Markov decision problem. We evaluate the complexity of the solution analytically and experimentally and show that it provides a practical approach to building an adaptive controller for this application.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } The progressive processing model allows a system to trade off resource consumption against the quality of the outcome by mapping each activity to a graph of potential solution methods. In the past, only semi-linear graphs have been used. We examine the application of the model to control the operation of an autonomous rover which operates under tight resource constraints. The task structure is generalized to directed acyclic graphs for which the optimal schedule can be computed by solving a corresponding Markov decision problem. We evaluate the complexity of the solution analytically and experimentally and show that it provides a practical approach to building an adaptive controller for this application. |

Xuan, Ping; Lesser, Victor; Shlomo, ; Zilberstein, Communication Decisions in Multiagent Cooperation: Model and Experiments Conference Proceedings of the 5th International Conference on Autonomous Agents (AGENTS), Montreal, Canada, 2001. @conference{SZ:XLZagents01, title = {Communication Decisions in Multiagent Cooperation: Model and Experiments}, author = {Ping Xuan and Victor Lesser and Shlomo and Zilberstein}, url = {http://rbr.cs.umass.edu/shlomo/papers/XLZagents01.pdf}, doi = {10.1145/375735.376469}, year = {2001}, date = {2001-01-01}, booktitle = {Proceedings of the 5th International Conference on Autonomous Agents (AGENTS)}, pages = {616--623}, address = {Montreal, Canada}, abstract = {In multi-agent cooperation, agents share a common goal, which is evaluated through a global utility function. However, an agent typically cannot observe the global state of an uncertain environment, and therefore they must communicate with each other in order to share the information needed for deciding which actions to take. We argue that, when communication incurs a cost (due to resource consumption, for example), whether to communicate or not also becomes a decision to make. Hence, communication decision becomes part of the overall agent decision problem. In order to explicitly address this problem, we present a multi-agent extension to Markov decision processes in which communication can be modeled as an explicit action that incurs a cost. This framework provides a foundation for a quantied study of agent coordination policies and provides both motivation and insight to the design of heuristic approaches. An example problem is studied under this framework. From this example we can see the impact communication policies have on the overall agent policies, and what implications we can nd toward the design of agent coordination policies.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } In multi-agent cooperation, agents share a common goal, which is evaluated through a global utility function. However, an agent typically cannot observe the global state of an uncertain environment, and therefore they must communicate with each other in order to share the information needed for deciding which actions to take. We argue that, when communication incurs a cost (due to resource consumption, for example), whether to communicate or not also becomes a decision to make. Hence, communication decision becomes part of the overall agent decision problem. In order to explicitly address this problem, we present a multi-agent extension to Markov decision processes in which communication can be modeled as an explicit action that incurs a cost. This framework provides a foundation for a quantied study of agent coordination policies and provides both motivation and insight to the design of heuristic approaches. An example problem is studied under this framework. From this example we can see the impact communication policies have on the overall agent policies, and what implications we can nd toward the design of agent coordination policies. |

Bernstein, Daniel S; Zilberstein, Shlomo; Immerman, Neil The Complexity of Decentralized Control of Markov Decision Processes Conference Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), Stanford, California, 2000. @conference{SZ:BZIuai00, title = {The Complexity of Decentralized Control of Markov Decision Processes}, author = {Daniel S Bernstein and Shlomo Zilberstein and Neil Immerman}, url = {http://rbr.cs.umass.edu/shlomo/papers/BZIuai00.pdf}, year = {2000}, date = {2000-01-01}, booktitle = {Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI)}, pages = {32--37}, address = {Stanford, California}, abstract = {Planning for distributed agents with partial state information is considered from a decision-theoretic perspective. We describe generalizations of both the MDP and POMDP models that allow for decentralized control. For even a small number of agents, the finite-horizon problems corresponding to both of our models are complete for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov processes. In contrast to the MDP and POMDP problems, the problems we consider provably do not admit polynomial-time algorithms and most likely require doubly exponential time to solve in the worst case. We have thus provided mathematical evidence corresponding to the intuition that decentralized planning problems cannot easily be reduced to centralized problems and solved exactly using established techniques.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } Planning for distributed agents with partial state information is considered from a decision-theoretic perspective. We describe generalizations of both the MDP and POMDP models that allow for decentralized control. For even a small number of agents, the finite-horizon problems corresponding to both of our models are complete for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov processes. In contrast to the MDP and POMDP problems, the problems we consider provably do not admit polynomial-time algorithms and most likely require doubly exponential time to solve in the worst case. We have thus provided mathematical evidence corresponding to the intuition that decentralized planning problems cannot easily be reduced to centralized problems and solved exactly using established techniques. |

Xuan, Ping; Lesser, Victor R; Zilberstein, Shlomo Communication in Multi-Agent Markov Decision Processes Conference Proceedings of the 4th International Conference on Multi-Agent Systems, Boston, Massachusetts, 2000. @conference{SZ:XLZicmas00, title = {Communication in Multi-Agent Markov Decision Processes}, author = {Ping Xuan and Victor R Lesser and Shlomo Zilberstein}, url = {https://doi.org/10.1109/ICMAS.2000.858528}, doi = {10.1109/ICMAS.2000.858528}, year = {2000}, date = {2000-01-01}, booktitle = {Proceedings of the 4th International Conference on Multi-Agent Systems}, pages = {467--468}, address = {Boston, Massachusetts}, abstract = {In this paper, we formulate agent's decision process under the framework of Markov decision processes, and in particular, the multi-agent extension to Markov decision process that includes agent communication decisions. We model communication as the way for each agent to obtain local state information in other agents, by paying a certain communication cost. Thus, agents have to decide not only which local action to perform, but also whether it is worthwhile to perform a communication action before deciding the local action. We believe that this would provide a foundation for formal study of coordination activities and may lead to some insights to the design of agent coordination policies, and heuristic approaches in particular. An example problem is studied under this framework and its implications to coordination are discussed.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } In this paper, we formulate agent's decision process under the framework of Markov decision processes, and in particular, the multi-agent extension to Markov decision process that includes agent communication decisions. We model communication as the way for each agent to obtain local state information in other agents, by paying a certain communication cost. Thus, agents have to decide not only which local action to perform, but also whether it is worthwhile to perform a communication action before deciding the local action. We believe that this would provide a foundation for formal study of coordination activities and may lead to some insights to the design of agent coordination policies, and heuristic approaches in particular. An example problem is studied under this framework and its implications to coordination are discussed. |