Research
We study a wide range of problems in artificial intelligence: automated planning and learning, autonomous systems, reasoning under uncertainty, multi-agent systems, and resource-bounded reasoning. We are particularly interested in the implications of uncertainty and limited computational resources for the design of autonomous agents. In most practical settings, it is neither feasible nor desirable to compute the optimal action, making some form of approximate reasoning necessary. This raises a fundamental question: what does it mean for an agent to be “rational” when it lacks the knowledge or computational power to derive the best course of action? Our overall approach to this problem involves meta-level control mechanisms that reason explicitly about the cost of decision-making and can optimize the amount of deliberation (or “thinking”) an agent does before taking action. We have also developed new planning techniques for settings with multiple decision makers operating in collaborative or adversarial domains.
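The meta-level control idea can be made concrete with a small sketch. The Python fragment below is entirely illustrative (the function names, the performance profile, and the cost model are our assumptions, not a published implementation): it monitors an anytime algorithm and stops deliberating once the expected gain in solution quality from one more step of thinking, read off a performance profile, no longer exceeds its time cost.

```python
def monitor_anytime(solve_step, performance_profile, time_cost, max_steps=100):
    """Myopic stopping rule for an anytime algorithm (illustrative sketch).

    solve_step() performs one unit of deliberation and returns the current
    solution quality; performance_profile[t] is the expected quality after
    t steps; time_cost is the (assumed constant) cost of one step of thinking.
    """
    quality = solve_step()
    for t in range(1, max_steps):
        expected_gain = performance_profile[t] - performance_profile[t - 1]
        if expected_gain <= time_cost:        # further deliberation is not worth it
            break
        quality = solve_step()
    return quality

# Example: a toy anytime "solver" with diminishing returns.
state = {"q": 0.0}
def step():
    state["q"] += (1.0 - state["q"]) * 0.1    # each step closes 10% of the gap
    return state["q"]

profile = [1 - 0.9 ** t for t in range(100)]  # assumed expected quality after t steps
print(monitor_anytime(step, profile, time_cost=0.01))
```

A non-myopic monitor would instead solve a sequential stopping problem over quality and time, but this greedy rule already captures the core trade-off between the value and cost of computation.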
Human Compatible AI
Mahmud, Saaduddin; Nashed, Samer B.; Goldman, Claudia V.; Zilberstein, Shlomo. Estimating Causal Responsibility for Explaining Autonomous Behavior. Book Section In: Calvaresi, Davide (Ed.): International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS), pp. 78–94, Springer, 2023. There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lacks deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values.
Parr, Shane; Khatri, Ishan; Svegliato, Justin; Zilberstein, Shlomo. Agent-Aware State Estimation in Autonomous Vehicles. Conference: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimates of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios.
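As a toy rendering of the core idea (ours, not the paper's model or parameters), the sketch below maintains a Bayes-filter belief over an occluded traffic light and updates it from the observed stop/go behavior of nearby vehicles; every probability in it is an invented placeholder.

```python
import numpy as np

STATES = ["red", "green"]
TRANSITION = np.array([[0.9, 0.1],   # P(next light | current light): lights persist
                       [0.1, 0.9]])
# P(vehicle behavior | light state): traffic facing a red light tends to stop.
BEHAVIOR_LIKELIHOOD = {"stopped": np.array([0.8, 0.2]),
                       "moving":  np.array([0.2, 0.8])}

def update_belief(belief, observed_behaviors):
    """One predict-then-correct step given the behaviors of several vehicles."""
    belief = TRANSITION.T @ belief                    # predict the light's evolution
    for behavior in observed_behaviors:               # fold in each vehicle's evidence
        belief = belief * BEHAVIOR_LIKELIHOOD[behavior]
    return belief / belief.sum()

belief = np.array([0.5, 0.5])                         # light fully occluded: flat prior
belief = update_belief(belief, ["stopped", "stopped", "moving"])
print(dict(zip(STATES, belief.round(3))))             # evidence favors "red"
```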
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R. Multi-Agent Planning with High-Level Human Guidance. Conference: Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors are extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance.
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R. Stochastic Multi-agent Planning with Partial State Models. Conference: Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to represent fully using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches.
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping. Privacy-Preserving Policy Iteration for Decentralized POMDPs. Conference: Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach.
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo. Integrated Cooperation and Competition in Multi-Agent Decision-Making. Conference: Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots.
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping. Multi-Agent Planning with Baseline Regret Minimization. Conference: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which performs only the iterations needed to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches.
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo. Dual Formulations for Optimizing Dec-POMDP Controllers. Conference: Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions.
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc. Probabilistic Inference Techniques for Scalable Multiagent Decision Making. Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques.
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie. Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs. Conference: Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-armed bandit DCOP algorithm on dynamic DCOPs.
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo. Qualitative Planning under Partial Observability in Multi-Agent Domains. Conference: Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms.
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R. Monte-Carlo Expectation Maximization for Decentralized POMDPs. Conference: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents.
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo. Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs. Conference: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application.
Durfee, Edmund; Zilberstein, Shlomo. Multiagent Planning, Control, and Execution. Book Section In: Weiss, G. (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013.
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping. Online Planning for Multi-Agent Systems with Bounded Communication. Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings.
Kumar, Akshat; Zilberstein, Shlomo. Message-Passing Algorithms for Large Structured Decentralized POMDPs. Conference: Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the "right" time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems.
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc. Scalable Multiagent Planning Using Probabilistic Inference. Conference: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability.
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping. Online Planning for Ad Hoc Autonomous Agent Teams. Conference: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains.
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo. Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs. Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010.
Kumar, Akshat; Zilberstein, Shlomo. Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms. Conference: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.
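For intuition about the operation whose multi-agent version the paper proves NP-complete, here is the standard single-agent point-based backup at a belief point b, written as a compact sketch; the model arrays T, Z, R and the alpha-vector set GAMMA are assumed inputs, not anything from the paper.

```python
import numpy as np

def point_based_backup(b, GAMMA, T, Z, R, discount=0.95):
    """Best alpha-vector at belief b, given the current set GAMMA.

    Assumed shapes: T[a] is S x S with T[a][s, s'] = P(s' | s, a);
    Z[a] is S x O with Z[a][s', o] = P(o | s', a); R is S x A.
    """
    S, A = R.shape
    O = Z[0].shape[1]
    best_value, best_alpha = -np.inf, None
    for a in range(A):
        alpha_a = R[:, a].astype(float).copy()
        for o in range(O):
            # Project every alpha-vector back through action a, observation o.
            projections = np.array([T[a] @ (Z[a][:, o] * alpha) for alpha in GAMMA])
            alpha_a += discount * projections[np.argmax(projections @ b)]
        if alpha_a @ b > best_value:
            best_value, best_alpha = alpha_a @ b, alpha_a
    return best_alpha
```

In the decentralized setting, the inner maximization ranges over joint choices for all agents, which is the source of the NP-completeness the paper establishes.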
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping. Point-Based Policy Generation for Decentralized POMDPs. Conference: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gain can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality.
Kumar, Akshat; Zilberstein, Shlomo. Anytime Planning for Decentralized POMDPs using Expectation Maximization. Conference: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers.
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping. Rollout Sampling Policy Iteration for Decentralized POMDPs. Conference: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach.
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo. Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs. Conference: Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones.
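The Moore/Mealy contrast is easy to state in code. In this toy rendering (ours, not the paper's), a Moore controller's action depends only on its current node, while a Mealy controller conditions the action on the node and the last observation, giving it a richer structure at the same controller size.

```python
class MooreController:
    """Action is a function of the node only: act(q)."""
    def __init__(self, action_of, next_node):
        self.action_of = action_of        # node -> action
        self.next_node = next_node        # (node, observation) -> node

    def act(self, node):
        return self.action_of[node]

    def step(self, node, observation):
        return self.next_node[(node, observation)]


class MealyController:
    """Action is a function of the node and last observation: act(q, o)."""
    def __init__(self, action_of, next_node):
        self.action_of = action_of        # (node, observation) -> action
        self.next_node = next_node        # (node, observation) -> node

    def act(self, node, observation):
        return self.action_of[(node, observation)]

    def step(self, node, observation):
        return self.next_node[(node, observation)]
```

With the same number of nodes, a Mealy controller can realize one action choice per (node, observation) pair where a Moore controller realizes only one per node, which is the added expressiveness the paper exploits.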
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping. Trial-Based Dynamic Programming for Multi-Agent Planning. Conference: Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms.
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo. Policy Iteration for Decentralized Control of Markov Decision Processes. Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.
Petrik, Marek; Zilberstein, Shlomo. A Bilinear Programming Approach for Multiagent Planning. Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs.
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo. Analyzing Myopic Approaches for Multi-Agent Communication. Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems.
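The myopic test at the heart of this analysis fits in a few lines: communicate only when the estimated value of synchronizing information now exceeds the cost of doing so. In the sketch below, value_if_sync and value_if_silent are assumed placeholder estimators (in the paper they would be derived from the decentralized decision model).

```python
def myopic_comm_decision(local_history, value_if_sync, value_if_silent, cost):
    # Myopic assumption: communication is only possible right now, so compare
    # the expected value of syncing immediately against staying silent forever.
    value_of_communication = value_if_sync(local_history) - value_if_silent(local_history)
    return value_of_communication > cost
```

The paper's contribution is precisely to identify situations where this greedy estimate performs poorly and to demonstrate a relaxation that improves it.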
Amato, Christopher; Zilberstein, Shlomo. Achieving Goals in Decentralized POMDPs. Conference: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions.
Kumar, Akshat; Zilberstein, Shlomo. Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions. Conference: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders-of-magnitude speedup in the policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings.
Kumar, Akshat; Zilberstein, Shlomo. Dynamic Programming Approximations for Partially Observable Stochastic Games. Conference: Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies.
Kumar, Akshat; Zilberstein, Shlomo. Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation. Conference: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems.
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo. Incremental Policy Generation for Finite-Horizon DEC-POMDPs. Conference: Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009.
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping. Multi-Agent Online Planning with Communication. Conference: Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases.
Allen, Martin; Zilberstein, Shlomo. Complexity of Decentralized Control: Special Cases. Conference: Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case.
Goldman, Claudia V; Zilberstein, Shlomo. Communication-Based Decomposition Mechanisms for Decentralized MDPs. Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions.
Seuken, Sven; Zilberstein, Shlomo. Formal Models and Algorithms for Decentralized Decision Making under Uncertainty. Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008.
Petrik, Marek; Zilberstein, Shlomo. A Successive Approximation Algorithm for Coordination Problems. Conference: Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems.
Carlin, Alan; Zilberstein, Shlomo. Value-Based Observation Compression for DEC-POMDPs. Conference: Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm.
Carlin, Alan; Zilberstein, Shlomo. Observation Compression in DEC-POMDP Policy Trees. Conference: AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008 (Best Paper Award). Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm.
Amato, Christopher; Zilberstein, Shlomo. What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs. Conference: ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability.
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo. Learning to Communicate in a Decentralized Environment. Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity.
Seuken, Sven; Zilberstein, Shlomo. Memory-Bounded Dynamic Programming for DEC-POMDPs. Conference: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques.
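Schematically, one MBDP iteration backs up the current policy trees by one step, evaluates the candidates at a handful of heuristically sampled belief points, and retains only a fixed number of trees, which is what keeps memory bounded in the horizon. The sketch below is a heavily simplified, single-agent view with assumed helper functions, not the paper's implementation.

```python
import itertools

def mbdp_step(trees, actions, observations, sample_beliefs, evaluate, max_trees):
    # Exhaustive backup: a new depth-(t+1) tree pairs a root action with one
    # existing depth-t subtree per observation.
    candidates = [(a, dict(zip(observations, subtrees)))
                  for a in actions
                  for subtrees in itertools.product(trees, repeat=len(observations))]
    # Keep only the best candidate at each heuristically sampled belief point,
    # so at most max_trees trees survive to the next iteration.
    kept = []
    for belief in sample_beliefs(max_trees):
        best = max(candidates, key=lambda tree: evaluate(tree, belief))
        if best not in kept:
            kept.append(best)
    return kept
```

Because the number of retained trees is constant across iterations, total time and space grow only linearly with the horizon, which is the property the paper highlights.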
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo. Optimizing Memory-Bounded Controllers for Decentralized POMDPs. Conference: Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty.
Seuken, Sven; Zilberstein, Shlomo. Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs. Conference: Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well.
Allen, Martin; Zilberstein, Shlomo. Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving. Conference: Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems.
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
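For intuition, the per-node linear program behind bounded policy iteration can be sketched as follows. This is a schematic form in our notation: we hold the other agents' controllers fixed and marginalize them into aggregate functions $\bar{R}$ and $\bar{P}$, and we omit the correlation device that the paper additionally conditions on. Improving node $q_i$ of agent $i$ then means finding new parameters $c(a_i)$ and $c(a_i,o_i,q_i')$ that maximize $\epsilon$ subject to:

```latex
\begin{align*}
V(q_i, \vec{q}_{-i}, s) + \epsilon \;\le\;
 & \sum_{a_i} c(a_i)\, \bar{R}(s, \vec{q}_{-i}, a_i)
   + \gamma \sum_{a_i, o_i, q_i'} c(a_i, o_i, q_i')
   \sum_{s',\, \vec{q}^{\,\prime}_{-i}}
   \bar{P}(s', \vec{q}^{\,\prime}_{-i}, o_i \mid s, \vec{q}_{-i}, a_i)\,
   V(q_i', \vec{q}^{\,\prime}_{-i}, s')
   \quad \forall\, s, \vec{q}_{-i} \\
 & \sum_{a_i} c(a_i) = 1, \qquad
   \sum_{q_i'} c(a_i, o_i, q_i') = c(a_i), \qquad c \ge 0 .
\end{align*}
```

Because the constraints must hold for every state and every configuration of the other agents' nodes, any feasible $\epsilon > 0$ improves the controller for all initial state distributions, which is the monotonicity property claimed in the abstract.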
Anytime Algorithms
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It guarantees to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
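The planning-as-inference reduction at the heart of the preceding entry can be stated compactly. This is a hedged sketch in our notation (the paper's construction may differ in details): rewards are rescaled into $[0,1]$ and treated as the probability of a binary reward event $r$, and the horizon $T$ is mixed geometrically:

```latex
\[
\hat{R}(s,\vec{a}) = \frac{R(s,\vec{a}) - R_{\min}}{R_{\max} - R_{\min}},
\qquad
P(r=1 \mid s_T, \vec{a}_T) = \hat{R}(s_T, \vec{a}_T),
\qquad
P(T) = (1-\gamma)\,\gamma^{T}
\]
\[
P(r=1;\theta)
= \sum_{T=0}^{\infty} P(T)\, \mathbb{E}_{\theta}\big[\hat{R}(s_T,\vec{a}_T)\big]
= \frac{1-\gamma}{R_{\max}-R_{\min}}\,
  \mathbb{E}_{\theta}\Big[\sum_{t=0}^{\infty}\gamma^{t} R(s_t,\vec{a}_t)\Big]
  + \mathrm{const}.
\]
```

Maximizing the likelihood of the reward event in the mixture of DBNs is therefore, up to an affine transformation, the same as maximizing expected discounted reward, and EM on the mixture performs policy improvement.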
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-arm bandit DCOP algorithm on dynamic DCOPs. |
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
Carlin, Alan; Zilberstein, Shlomo Decentralized Monitoring of Distributed Anytime Algorithms Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:CZaamas11, Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the "right" time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
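As a concrete illustration of why the backup step is the bottleneck, the sketch below performs one point-based backup by brute force on an invented two-agent toy model: it enumerates every combination of root actions and observation-to-subpolicy maps at a single belief point. The model, names, and sizes are all hypothetical, and the paper replaces exactly this exhaustive search with a weighted-CSP formulation.

```python
# Brute-force point-based backup at one belief point for a toy 2-agent
# problem. Enumerating obs->subpolicy maps is exponential in |O|, which
# is the NP-complete step the paper attacks with weighted CSPs.
import itertools
import random

random.seed(2)
S, A, O, Q = [0, 1], [0, 1], [0, 1], [0, 1, 2]  # states, actions, obs, subpolicies
GAMMA = 0.95

def dist(keys):
    w = [random.random() for _ in keys]
    z = sum(w)
    return {k: x / z for k, x in zip(keys, w)}

JA = list(itertools.product(A, A))
T = {s: {ja: dist(S) for ja in JA} for s in S}              # P(s' | s, ja)
Z = {sp: dist(list(itertools.product(O, O))) for sp in S}   # P(o1,o2 | s'), action-free here
R = {s: {ja: random.uniform(-1, 1) for ja in JA} for s in S}
V = {(sp, q1, q2): random.uniform(0, 1)                     # values of next-step subpolicies
     for sp in S for q1 in Q for q2 in Q}
b = {0: 0.6, 1: 0.4}                                        # the belief point to back up

def backed_up_value(a1, a2, m1, m2):
    """Value at b of root actions (a1, a2) with obs->subpolicy maps m1, m2."""
    ja = (a1, a2)
    v = sum(p * R[s][ja] for s, p in b.items())
    for s, p in b.items():
        for sp in S:
            for o1, o2 in itertools.product(O, O):
                w = p * T[s][ja][sp] * Z[sp][(o1, o2)]
                v += GAMMA * w * V[(sp, m1[o1], m2[o2])]
    return v

candidates = [(a1, a2, m1, m2)
              for a1 in A for a2 in A
              for m1 in itertools.product(Q, repeat=len(O))
              for m2 in itertools.product(Q, repeat=len(O))]
best = max(candidates, key=lambda c: backed_up_value(*c))
print(len(candidates), "candidates; best value:",
      round(backed_up_value(*best), 3))
```

Even in this tiny instance there are $|A|^2 \cdot |Q|^{2|O|} = 324$ candidates; the count grows exponentially with the number of observations, matching the hardness result in the abstract.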
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
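For reference, a separable bilinear program of the kind the preceding abstract describes has the following shape, with $x$ and $y$ standing for the two agents' policy variables; the notation is ours, not necessarily the paper's:

```latex
\[
\max_{x,\,y} \quad r_1^{\top} x \;+\; x^{\top} C\, y \;+\; r_2^{\top} y
\qquad \text{s.t.} \quad A_1 x = b_1, \quad A_2 y = b_2, \quad x \ge 0, \quad y \ge 0.
\]
```

The key structural fact is that fixing $y$ turns the objective into a linear function of $x$ (and vice versa), so a successive approximation scheme can alternate between two ordinary linear programs rather than solving the nonconvex problem directly.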
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
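The myopic test in this paper can be illustrated with a deliberately small example: one agent holds private information, and communication is evaluated as if it were only possible right now. The two-state payoff model below is invented for illustration; the paper applies the same comparison to decentralized MDP policies.

```python
# Minimal sketch of the myopic value-of-communication test: communicate
# now iff the expected gain from synchronizing exceeds the message cost.
# The tiny two-agent payoff model is invented for illustration.
STATES = ["a", "b"]                # agent 1's private local state
PRIOR = {"a": 0.7, "b": 0.3}       # agent 2's belief about that state
ACTIONS = ["x", "y"]
# Joint payoff given agent 1's local state and agent 2's action.
U = {("a", "x"): 10, ("a", "y"): 0, ("b", "x"): -5, ("b", "y"): 6}
COST = 1.0                         # cost of sending one message

def best_response(belief):
    return max(ACTIONS, key=lambda a: sum(p * U[(s, a)] for s, p in belief.items()))

# No communication: agent 2 acts on its prior regardless of the true state.
a_silent = best_response(PRIOR)
v_silent = sum(PRIOR[s] * U[(s, a_silent)] for s in STATES)

# Myopic communication: agent 2 learns the true state, then best-responds.
v_comm = sum(PRIOR[s] * U[(s, best_response({s: 1.0}))] for s in STATES)

voc = v_comm - v_silent
print(f"value of communication = {voc:.2f}; communicate: {voc > COST}")
```

The paper's point is that this "communicate only at the present time" assumption can misfire, for instance when waiting one step would reveal the same information for free, and it develops a relaxation that avoids such failure cases.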
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:AZaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude speedup in policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high for both the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
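For reference, a separable bilinear program of the general kind the abstract alludes to keeps the two agents' variables in separate linear constraint sets and couples them only through the objective (our notation):

```latex
\[
\max_{x,\,y}\;\; r_1^{\top} x \;+\; x^{\top} C\, y \;+\; r_2^{\top} y
\qquad \text{s.t.} \quad A_1 x = b_1, \quad A_2 y = b_2, \quad x, y \ge 0 .
\]
```

Fixing either x or y reduces the problem to a linear program, which is why alternating and successive-approximation schemes are natural for this class.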
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
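The A* connection can be summarized in one evaluation function. Writing q^t for a partial joint policy of depth t and T for the horizon, a search node is scored by its exact value so far plus an optimistic completion estimate (notation ours):

```latex
\[
F^{T}(q^{t}) \;=\; V(q^{t}) \;+\; H^{\,T-t}(q^{t}),
\qquad
H^{\,T-t}(q^{t}) \;\ge\; \text{value of the best depth-}(T{-}t)\text{ completion of } q^{t},
\]
```

Any admissible H, for instance the value of the underlying fully observable MDP, preserves the completeness and optimality of the search.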
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
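A correlation device can be pictured as a small Markov chain whose signal c is broadcast to all agents at execution time; the joint controller then factorizes as below (our notation), so agents share randomness without exchanging any state or observation information:

```latex
\[
P(\vec{a}, \vec{q}\,' \mid \vec{q}, \vec{o}, c)
  \;=\; \prod_{i} P(a_i, q_i' \mid q_i, o_i, c),
\qquad c' \sim P(c' \mid c) .
\]
```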
Models of Bounded Rationality
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to represent fully using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
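The abstract specifies only that the cryptosystem is public-key and homomorphic. As a small illustration of the additive property such message passing relies on, here is a sketch using the third-party phe (python-paillier) package; the library choice and the values are assumptions for demonstration, not the authors' implementation:

```python
# Minimal demonstration of additively homomorphic encryption (Paillier).
# Requires the third-party phe package: pip install phe
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Two agents encrypt private local quantities (illustrative values) ...
enc_a = public_key.encrypt(0.42)
enc_b = public_key.encrypt(0.17)

# ... and a third party holding only the public key can sum them while
# they remain encrypted; only the private-key holder can read the result.
enc_sum = enc_a + enc_b
assert abs(private_key.decrypt(enc_sum) - 0.59) < 1e-9
```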
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm that efficiently minimizes the baseline regret, performing only the iterations needed to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
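For context, the dual linear program for a discounted MDP, on which these connections build, optimizes over occupancy measures x_{s,a} rather than values (standard form, our symbols):

```latex
\[
\max_{x \ge 0}\;\; \sum_{s,a} r(s,a)\, x_{s,a}
\qquad \text{s.t.} \quad
\sum_{a'} x_{s',a'} \;=\; b_0(s') \;+\; \gamma \sum_{s,a} P(s' \mid s,a)\, x_{s,a}
\quad \forall s' .
\]
```

A deterministic policy can be read off by taking, in each state, an action with positive occupancy; the paper's contribution is to impose additional constraints so that the recovered policy is executable by decentralized, partially informed agents.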
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
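The reduction itself is compact enough to state. Rewards are rescaled into [0,1] and treated as the emission probability of a binary variable R; the likelihood of R = 1 in a mixture of DBNs indexed by the time horizon is then an affine function of the expected discounted reward (schematic form, our notation):

```latex
\[
\hat{r}(s,\vec{a}) \;=\; \frac{r(s,\vec{a}) - r_{\min}}{r_{\max} - r_{\min}},
\qquad
L(\theta) \;=\; \sum_{T=0}^{\infty} (1-\gamma)\,\gamma^{T}\;
P(R{=}1 \mid T;\, \theta) ,
\]
```

so any joint-policy parameter vector θ that increases the likelihood also increases expected discounted reward, and standard EM machinery for DBNs applies directly.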
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-armed bandit DCOP algorithm on dynamic DCOPs. |
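As a concrete reference point for the second of the two update rules named above, here is a self-contained, single-agent R-learning toy. It illustrates only the average-reward update and is not the paper's distributed algorithm; the dynamics and constants are invented for demonstration:

```python
import random

# Textbook R-learning (average-reward Q-learning) on a toy 2-state MDP.
N_ACTIONS = 2
ALPHA, BETA, EPSILON = 0.1, 0.01, 0.1

def step(s, a):
    """Hypothetical dynamics: action 1 usually switches state; state 1 pays."""
    s_next = 1 - s if (a == 1 or random.random() < 0.1) else s
    return s_next, (1.0 if s_next == 1 else 0.0)

Q = [[0.0] * N_ACTIONS for _ in range(2)]
rho = 0.0  # running estimate of the average reward (gain)
s = 0
for _ in range(50_000):
    greedy = max(range(N_ACTIONS), key=lambda a: Q[s][a])
    a = random.randrange(N_ACTIONS) if random.random() < EPSILON else greedy
    s_next, r = step(s, a)
    # Relative-value TD error: the reward is measured against the gain rho.
    Q[s][a] += ALPHA * (r - rho + max(Q[s_next]) - Q[s][a])
    if a == greedy:  # the gain estimate is updated only on greedy steps
        rho += BETA * (r - rho + max(Q[s_next]) - max(Q[s]))
    s = s_next

print(f"estimated average reward: {rho:.3f}")
```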
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains, and have attracted a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDP solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
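The communication trigger is simple to state in code: agents simulate the same conditional plan from common knowledge, and an agent speaks only when its actual local history is inconsistent with every history the shared plan still considers possible. The sketch below is an illustrative simplification with hypothetical names, not the paper's algorithm:

```python
# Communicate-on-inconsistency trigger (illustrative toy).

def must_communicate(predicted_histories, local_history):
    """True iff no jointly predicted history extends the actual local one."""
    return not any(
        h[: len(local_history)] == local_history for h in predicted_histories
    )

shared_plan = [("o1", "o2"), ("o1", "o3")]      # histories the common plan allows
print(must_communicate(shared_plan, ("o1",)))   # False: still consistent, stay silent
print(must_communicate(shared_plan, ("o4",)))   # True: inconsistent, sync histories
```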
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
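The UCT estimates mentioned above rest on UCB1 action selection inside the search tree. A generic, runnable form of that rule (textbook version, not the authors' code; the statistics dictionary is a hypothetical interface):

```python
import math

def uct_select(stats, c=1.414):
    """stats: {action: (visit_count, total_return)}.
    Returns the action maximizing mean return plus a UCB1 exploration bonus."""
    total_visits = sum(n for n, _ in stats.values())
    def score(item):
        n, total_return = item[1]
        if n == 0:
            return float("inf")  # try every untried action at least once
        return total_return / n + c * math.sqrt(math.log(total_visits) / n)
    return max(stats.items(), key=score)[0]

print(uct_select({"a": (10, 7.0), "b": (2, 1.9)}))  # "b": large bonus for few visits
```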
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
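The distinction between the two automata is easy to show concretely: a Moore node fixes its action by node alone, while a Mealy controller picks the action from the (node, last observation) pair, letting it condition on information one step earlier. The toy controllers below are illustrative, not taken from the paper:

```python
# Moore controller: node -> (action, {observation: next_node})
moore = {
    "n0": ("listen", {"left": "n1", "right": "n0"}),
    "n1": ("act",    {"left": "n1", "right": "n0"}),
}

# Mealy controller: (node, observation) -> (action, next_node)
mealy = {
    ("n0", "left"):  ("act-left",  "n0"),
    ("n0", "right"): ("act-right", "n0"),
}

def run_moore(ctrl, node, observations):
    actions = []
    for obs in observations:
        action, transitions = ctrl[node]
        actions.append(action)       # action depends on the node only
        node = transitions[obs]
    return actions

def run_mealy(ctrl, node, observations):
    actions = []
    for obs in observations:
        action, node = ctrl[(node, obs)]
        actions.append(action)       # action depends on node AND observation
    return actions

obs = ["left", "right"]
print(run_moore(moore, "n0", obs))   # ['listen', 'act']
print(run_mealy(mealy, "n0", obs))   # ['act-left', 'act-right']
```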
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
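The myopic rule can be written as a one-line comparison: communicate exactly when the expected gain from synchronizing now outweighs the cost, under the assumption that no later opportunity to communicate exists (our notation):

```latex
\[
\mathit{VoC}(b) \;=\; \mathbb{E}\!\left[ V_{\mathrm{comm}}(b) \right]
  \;-\; V_{\mathrm{silent}}(b) \;-\; c_{\mathrm{comm}},
\qquad \text{communicate iff } \mathit{VoC}(b) > 0 .
\]
```

The paper examines situations where this assumption misprices the decision and shows that relaxing it improves performance.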
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:AZaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude of speedup in policy computation and generates better-quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
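The structure of the algorithm is easy to convey in a few lines. Here is a schematic sketch in which, for simplicity, policies are treated as joint policy trees (the real algorithm keeps per-agent trees); backup, heuristic_beliefs, and evaluate are assumed helpers, not the authors' code.

```python
def mbdp(horizon, max_trees, depth1_policies, backup,
         heuristic_beliefs, evaluate, initial_belief):
    policies = depth1_policies
    for t in range(2, horizon + 1):
        candidates = backup(policies)  # every one-step extension
        # Heuristics pick belief points relevant at this depth; keeping only
        # the best candidate at each point bounds memory by max_trees.
        beliefs = heuristic_beliefs(horizon - t, max_trees)
        policies = [max(candidates, key=lambda p: evaluate(p, b))
                    for b in beliefs]
    return max(policies, key=lambda p: evaluate(p, initial_belief))
```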
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite-state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
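To give a flavor of the formulation, the program below is a sketch in our own notation for the two-agent case, with action parameters x_i, controller-transition parameters y_i, and value variables z; consult the paper for the exact constraints.

```latex
\begin{align*}
\max_{x,\,y,\,z}\quad & \sum_{s} b_0(s)\, z(q_1^0, q_2^0, s)\\
\text{s.t.}\quad z(q_1,q_2,s) \;=\; & \sum_{a_1,a_2} x_1(a_1 \mid q_1)\, x_2(a_2 \mid q_2)
\Big[ R(s,a_1,a_2) + \gamma \sum_{s'} P(s' \mid s,a_1,a_2) \\
& \quad \sum_{o_1,o_2} O(o_1,o_2 \mid s',a_1,a_2)
\sum_{q_1',q_2'} y_1(q_1' \mid q_1,a_1,o_1)\, y_2(q_2' \mid q_2,a_2,o_2)\,
z(q_1',q_2',s') \Big]
\end{align*}
```

Each $x_i(\cdot \mid q_i)$ and $y_i(\cdot \mid q_i, a_i, o_i)$ is constrained to be a probability distribution; the products of controller parameters are what make the program nonlinear.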
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
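In our notation, a separable bilinear program of the kind this reformulation targets has the form below, where x and y collect the two agents' decision variables and the matrix C captures their reward interaction:

```latex
\max_{x \ge 0,\; y \ge 0} \quad r_1^{\top} x \;+\; x^{\top} C\, y \;+\; r_2^{\top} y
\qquad \text{s.t.} \quad A_1 x = b_1, \quad A_2 y = b_2 .
```

Fixing either x or y leaves a linear program in the other block of variables, which is what makes anytime successive approximation natural for this class.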
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
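The top-level control flow of such a search is a standard best-first loop over partially specified joint policies. A schematic skeleton, assuming hypothetical expand, f_bound (an admissible upper bound), is_complete, and evaluate helpers:

```python
import heapq

def maa_star(root, expand, f_bound, is_complete, evaluate):
    """Best-first search over partial joint policies. With an admissible
    (upper-bound) heuristic, the incumbent is optimal once no open node's
    bound exceeds its value."""
    frontier = [(-f_bound(root), 0, root)]  # max-heap via negated bounds
    tie, best = 1, None
    while frontier:
        neg_bound, _, node = heapq.heappop(frontier)
        if best is not None and -neg_bound <= best[0]:
            return best  # no open node can beat the incumbent
        if is_complete(node):
            value = evaluate(node)
            if best is None or value > best[0]:
                best = (value, node)
            continue
        for child in expand(node):  # extend the policy by one horizon step
            heapq.heappush(frontier, (-f_bound(child), tie, child))
            tie += 1
    return best
```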
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
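As a point of reference, the single-agent linear program to which the method reduces (Poupart and Boutilier's bounded policy iteration) improves one controller node q at a time; a sketch in our notation, with variables $\epsilon$, $c_a$, and $c_{a,o,q'}$:

```latex
\begin{align*}
\max\quad & \epsilon\\
\text{s.t.}\quad V(q,s) + \epsilon \;\le\; & \sum_{a} c_a\, R(s,a)
 + \gamma \sum_{a,s',o,q'} c_{a,o,q'}\; P(s' \mid s,a)\, O(o \mid s',a)\, V(q',s')
 \quad \forall s,\\
& \sum_{a} c_a = 1, \qquad \sum_{q'} c_{a,o,q'} = c_a \;\;\forall a,o, \qquad c \ge 0
\end{align*}
```

If $\epsilon > 0$, the node's parameters are replaced and its value improves for every state; the decentralized version must additionally keep each agent's controller independent of the other agents' observations.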
Scalable Algorithms for Probabilistic Reasoning
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to represent fully using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
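The key cryptographic ingredient is additive homomorphism: sums can be computed over encrypted values without decrypting them. A toy demonstration using the third-party python-paillier package (the phe import below is an assumption about the reader's environment; the paper's protocol involves far more than this):

```python
from phe import paillier  # third-party python-paillier package

# Generate a keypair; agents share the public key, while a trusted party
# (or a threshold scheme) holds the private key.
public_key, private_key = paillier.generate_paillier_keypair()

a = public_key.encrypt(3.5)    # one agent's private quantity
b = public_key.encrypt(1.25)   # another agent's private quantity
total = a + b                  # addition performed on ciphertexts only

assert abs(private_key.decrypt(total) - 4.75) < 1e-9
```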
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which requires only as many iterations as necessary to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
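The reduction rests on treating reward as the probability of a binary event in a mixture of finite-length DBNs; a sketch of the standard construction, in our notation:

```latex
\begin{align*}
P(\hat r = 1 \mid s, \vec a) \;=\; \frac{R(s,\vec a) - R_{\min}}{R_{\max} - R_{\min}},
\qquad P(T) \;=\; (1-\gamma)\,\gamma^{T},\\[4pt]
L(\theta) \;=\; P(\hat r = 1;\, \theta)
\;=\; \sum_{T=0}^{\infty} P(T)\; \mathbb{E}_{\theta}\big[\, P(\hat r = 1 \mid s_T, \vec a_T) \,\big]
\end{align*}
```

Maximizing the likelihood $L(\theta)$ over policy parameters $\theta$ is therefore, up to an affine transformation, the same as maximizing expected discounted reward, and EM applies directly.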
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-armed bandit DCOP algorithm on dynamic DCOPs. |
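For concreteness, the tabular R-learning update that each agent could run locally looks as follows; the Q table, greedy flag, and step sizes are assumptions, and the paper's distributed variants add coordination machinery not shown here.

```python
def r_learning_update(Q, rho, s, a, r, s_next, greedy, alpha=0.1, beta=0.01):
    """One R-learning step (average-reward RL): relative action values plus
    a running estimate rho of the average reward per step."""
    best_next = max(Q[s_next].values())
    best_here = max(Q[s].values())
    Q[s][a] += alpha * (r - rho + best_next - Q[s][a])
    if greedy:
        # Standard rule: refresh the average-reward estimate only on
        # greedy (non-exploratory) actions.
        rho += beta * (r - rho + best_next - best_here)
    return rho
```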
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains, with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDP solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
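The UCT selection rule that drives the utility estimation is compact enough to show; a sketch assuming per-action visit counts N and mean payoffs Q, with every action tried at least once:

```python
import math

def uct_select(actions, Q, N, n_total, c=1.4):
    """Pick the action maximizing mean payoff plus an exploration bonus
    that shrinks as an action is visited more often (UCB1)."""
    return max(actions,
               key=lambda a: Q[a] + c * math.sqrt(math.log(n_total) / N[a]))
```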
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
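To see what a point-based backup computes, here is the centralized version at a single belief point; the paper's negative result concerns the decentralized case, where the subtree choices must decompose per agent, a constraint this sketch deliberately ignores. R, tau, subtrees, and value are assumed helpers.

```python
def point_backup(b, joint_actions, joint_obs, R, tau, subtrees, value):
    """Return the best (value, joint action) at belief b, choosing the best
    successor subtree independently for each joint observation."""
    best = None
    for a in joint_actions:
        v = R(b, a)  # expected immediate reward at b
        for o in joint_obs:
            b_next, p = tau(b, a, o)  # posterior belief and its probability
            if p > 0:
                v += p * max(value(t, b_next) for t in subtrees)
        if best is None or v > best[0]:
            best = (v, a)
    return best
```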
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
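The distinction is easy to state in code. A toy sketch with hypothetical dictionaries act and nxt holding a controller's parameters:

```python
def moore_step(node, obs, act, nxt):
    """Moore machine: the action depends only on the current node."""
    return act[node], nxt[(node, obs)]

def mealy_step(node, obs, act, nxt):
    """Mealy machine: the action may also depend on the last observation,
    so the same number of nodes can encode a richer policy."""
    return act[(node, obs)], nxt[(node, obs)]
```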
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
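Because fixing either block of variables leaves a linear program in the other, a naive coordinate-ascent baseline is a few lines of SciPy; this is only a local-optimum sketch for intuition, not the paper's successive approximation algorithm with its online bound, and it assumes the feasible regions are nonempty and bounded.

```python
import numpy as np
from scipy.optimize import linprog

def alternate(r1, r2, C, A1, b1, A2, b2, iters=50):
    """Coordinate ascent on max r1.x + x.C.y + r2.y
    s.t. A1 x = b1, A2 y = b2, x >= 0, y >= 0."""
    # Start from any feasible y (zero objective = pure feasibility LP).
    y = linprog(np.zeros(len(r2)), A_eq=A2, b_eq=b2).x
    for _ in range(iters):
        # linprog minimizes, so negate each block's linear objective.
        x = linprog(-(r1 + C @ y), A_eq=A1, b_eq=b1).x
        y = linprog(-(r2 + C.T @ x), A_eq=A2, b_eq=b2).x
    return x, y
```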
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
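The myopic test itself is one line; V_comm and V_silent below are assumed evaluators of expected value with and without communicating now.

```python
def should_communicate(belief, V_comm, V_silent, cost):
    # Myopic assumption: treat this step as the only chance to communicate.
    # The paper shows this can misjudge value when future chances exist.
    return V_comm(belief) - cost > V_silent(belief)
```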
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:AZaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders-of-magnitude speedups in policy computation and generates better-quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high for both the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
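The MBDP recipe above lends itself to a compact illustration: at each backup step, only a bounded number of policy trees survives, chosen by scoring candidate joint pairs at heuristically sampled belief points. The Python fragment below is a minimal sketch of that selection step only; the function names and the flat enumeration are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def mbdp_select(cand_i, cand_j, beliefs, joint_value, max_trees):
    """Keep at most max_trees joint policy-tree pairs, one per sampled belief."""
    kept = []
    for b in beliefs[:max_trees]:
        # Score candidate pairs at this belief (sketch only; the real
        # algorithm avoids this flat enumeration).
        best = max(product(cand_i, cand_j),
                   key=lambda pair: joint_value(b, pair[0], pair[1]))
        if best not in kept:
            kept.append(best)
    return kept
```

Because the number of retained trees is fixed, memory use per horizon step stays constant, which is what makes the linear time and space complexity in the horizon possible.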
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
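For readers unfamiliar with the formulation, the nonlinear program has a simple shape. The sketch below shows the single-agent (POMDP) analogue for readability -- the decentralized version optimizes one such controller per agent -- with x(q,a) and x(q,a,o,q') standing for the controller's action and node-transition probabilities; the notation is ours, not the paper's.

```latex
% Value of a stochastic finite-state controller as a nonlinear program
% (single-agent analogue; notation is illustrative).
\begin{align*}
\max_{x,\,V}\quad & \sum_{s} b_0(s)\, V(q_0, s) \\
\text{s.t.}\quad
& V(q,s) = \sum_{a} x(q,a) \Big[ R(s,a)
  + \gamma \sum_{s'} P(s' \mid s,a) \sum_{o} O(o \mid s',a)
    \sum_{q'} x(q,a,o,q')\, V(q',s') \Big] \quad \forall q, s, \\
& \textstyle\sum_a x(q,a) = 1, \qquad \sum_{q'} x(q,a,o,q') = 1, \qquad x \ge 0 .
\end{align*}
```

The constraints are simply the evaluation equations of the controller, so any feasible point is a valid controller with its exact value; the products of variables are what make the program nonlinear and motivate the use of general nonlinear solvers.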
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
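The search itself follows the familiar A* pattern, applied to partial joint policies: repeatedly expand the frontier node with the highest admissible upper bound, where a node's bound combines the exact value of its completed prefix with a heuristic estimate for the remaining steps. A minimal sketch under those assumptions, with all callables hypothetical rather than taken from the paper:

```python
import heapq

def maa_star(root, expand, f_value, depth, horizon):
    """Best-first search over partial joint policies.

    f_value(node) must be an admissible upper bound on the value of any
    completion of node (exact prefix value + heuristic for the tail).
    """
    tie = 0                                    # tiebreaker: nodes are not comparable
    frontier = [(-f_value(root), tie, root)]
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if depth(node) == horizon:
            return node                        # first complete node popped is optimal
        for child in expand(node):             # fill in one more decision layer
            tie += 1
            heapq.heappush(frontier, (-f_value(child), tie, child))
    return None
```

As with classical A*, admissibility of the bound guarantees that the first fully specified joint policy removed from the queue is optimal.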
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
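Since the abstract notes that the algorithm reduces to Poupart and Boutilier's bounded policy iteration in the single-agent case, the core step is worth spelling out: each iteration solves a small linear program that tries to raise a node's value at every state simultaneously. A sketch of that LP, in notation of our choosing, where the variables c_a and c_{a,o,q'} jointly encode the node's stochastic action and transition parameters:

```latex
% Node-improvement LP for bounded policy iteration (single-agent sketch).
\begin{align*}
\max_{\varepsilon,\, c}\quad & \varepsilon \\
\text{s.t.}\quad
& V(q,s) + \varepsilon \;\le\; \sum_{a} c_a R(s,a)
  + \gamma \sum_{a,o,q'} c_{a,o,q'} \sum_{s'} P(s' \mid s,a)\, O(o \mid s',a)\, V(q',s')
  \quad \forall s, \\
& \textstyle\sum_a c_a = 1, \qquad \sum_{q'} c_{a,o,q'} = c_a \;\; \forall a,o, \qquad c \ge 0 .
\end{align*}
```

If the optimum has a positive epsilon, the node's parameters are replaced and the controller's value cannot decrease for any initial state distribution, which is exactly the monotone-improvement property the abstract describes.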
Belief-Space Planning and POMDPs
Mahmud, Saaduddin; Nashed, Samer B.; Goldman, Claudia V.; Zilberstein, Shlomo Estimating Causal Responsibility for Explaining Autonomous Behavior Book Section In: Calvaresi, Davide (Ed.): International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS), pp. 78–94, Springer, 2023. @incollection{SZ:MNGZextraamas23, There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values. |
Parr, Shane; Khatri, Ishan; Svegliato, Justin; Zilberstein, Shlomo Agent-Aware State Estimation in Autonomous Vehicles Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:PKSZiros21, Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
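As background on the cryptographic building block: an additively homomorphic cryptosystem lets untrusted parties aggregate encrypted values without learning them. The snippet below is a minimal sketch using the python-paillier library (`phe`); it illustrates encrypted aggregation only and is not the paper's protocol.

```python
from phe import paillier  # pip install phe

public_key, private_key = paillier.generate_paillier_keypair()

# Each agent encrypts a local quantity under the shared public key ...
local_values = [0.25, -1.5, 3.0]
ciphertexts = [public_key.encrypt(v) for v in local_values]

# ... any party can sum the ciphertexts without seeing the plaintexts ...
encrypted_sum = sum(ciphertexts[1:], ciphertexts[0])

# ... and only the private-key holder recovers the aggregate.
assert abs(private_key.decrypt(encrypted_sum) - sum(local_values)) < 1e-9
```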
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which performs only the iterations needed to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
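The reformulation at the heart of this planning-as-inference approach can be stated in one line: rescale rewards into [0,1], interpret them as the probability of a binary event r = 1, and the discounted value of policy parameters theta becomes a likelihood over a mixture of finite-length DBNs that EM can maximize. A sketch of that identity, with notation assumed rather than copied from the paper:

```latex
% Value as likelihood under a mixture over horizons T (sketch).
\[
V(\theta) \;\propto\; P(r = 1 ;\, \theta)
\;=\; \sum_{T=0}^{\infty} P(T)\, P(r = 1 \mid T ;\, \theta),
\qquad P(T) = (1-\gamma)\,\gamma^{T}.
\]
```

Maximizing the left side over policy parameters is then exactly likelihood maximization on the right, which is what lets standard DBN inference machinery drive the policy updates.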
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs. |
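For context, the classical single-agent R-learning update (Schwartz, 1993) that the Distributed R-learning algorithm builds on maintains an action-value table together with an average-reward estimate rho. A minimal tabular sketch follows; ordering conventions for the rho update vary across presentations, and the variable names are ours.

```python
def r_learning_update(Q, rho, s, a, r, s2, actions, alpha=0.1, beta=0.01):
    """One tabular R-learning step; Q is a dict keyed by (state, action)."""
    max_next = max(Q[(s2, b)] for b in actions)
    max_cur = max(Q[(s, b)] for b in actions)
    Q[(s, a)] += alpha * (r - rho + max_next - Q[(s, a)])
    # Update the average-reward (gain) estimate only on greedy actions.
    if Q[(s, a)] >= max(Q[(s, b)] for b in actions):
        rho += beta * (r - rho + max_next - max_cur)
    return rho
```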
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
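The communicate-on-inconsistency idea above can be caricatured in a few lines: agents plan against a pool of joint histories implied by common knowledge and speak up only when their actual local history falls outside that pool. Everything below -- matches, rebuild_from, replan, best_action -- is a hypothetical stand-in, not the paper's code:

```python
def online_step(local_history, shared_plan, pool, channel,
                matches, rebuild_from, replan, best_action):
    """One decision step: communicate only when history inconsistency is detected."""
    if not any(matches(local_history, joint) for joint in pool):
        channel.broadcast(local_history)        # inconsistency detected: sync
        pool = rebuild_from(channel.collect())  # common knowledge restored
        shared_plan = replan(pool)              # all agents replan identically
    return best_action(shared_plan, local_history), shared_plan, pool
```

Because every agent runs the same deterministic planner on the same common knowledge, the conditional plans stay identical without any messages until an inconsistency actually occurs.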
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the "right" time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
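The UCT procedure used to estimate each stage game's utilities relies on the standard UCB1 rule applied at tree nodes. A small self-contained sketch, where the node attributes are assumptions of ours:

```python
import math

def uct_select(children, c=1.41):
    """UCB1 node selection: children need .visits and .total_value."""
    n_parent = sum(child.visits for child in children)

    def ucb(child):
        if child.visits == 0:
            return float("inf")                 # try unvisited children first
        exploit = child.total_value / child.visits
        explore = c * math.sqrt(math.log(n_parent) / child.visits)
        return exploit + explore

    return max(children, key=ucb)
```

The constant c trades off exploration against exploitation; the logarithmic bonus guarantees every action is sampled infinitely often, which underlies UCT's convergence.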
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still only handle a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids searching the entire joint policy space altogether. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of first producing a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
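The structural difference the paper exploits is easy to state: a Moore machine attaches its output (the action) to the node, whereas a Mealy machine attaches it to the transition, so the action can condition on the latest observation. Illustrative minimal classes, not drawn from the paper:

```python
class MooreController:
    """Action attached to the node: it cannot depend on the latest observation."""
    def __init__(self, action_of, next_of):
        self.action_of = action_of   # node -> action
        self.next_of = next_of       # (node, observation) -> next node

    def step(self, node, obs):
        action = self.action_of[node]              # emit first ...
        return action, self.next_of[(node, obs)]   # ... then transition on obs

class MealyController:
    """Action attached to the transition: it conditions on the observation."""
    def __init__(self, output_of, next_of):
        self.output_of = output_of   # (node, observation) -> action
        self.next_of = next_of       # (node, observation) -> next node

    def step(self, node, obs):
        return self.output_of[(node, obs)], self.next_of[(node, obs)]
```

With the same number of nodes, the Mealy controller can express strictly more behaviors, which is the extra structure the adapted algorithms exploit.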
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
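For reference, a separable bilinear program -- the optimization class the paper targets -- has the following generic shape: the objective is linear in each block of variables separately, and the constraints do not couple the blocks. The notation below is illustrative:

```latex
% Generic separable bilinear program (sketch).
\begin{align*}
\max_{x,\,y}\quad & r_1^{\top} x \;+\; x^{\top} C\, y \;+\; r_2^{\top} y \\
\text{s.t.}\quad  & A_1 x = b_1, \qquad A_2 y = b_2, \qquad x, y \ge 0 .
\end{align*}
```

The separability is what makes successive approximation natural: fixing one block of variables reduces the problem to a linear program in the other block.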
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
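The myopic test the paper starts from is simple to state: estimate the value of communicating now under the assumption that no further communication will ever be possible, and communicate only if that gain exceeds the cost. A minimal sketch, with both value estimators assumed, hypothetical callables:

```python
def myopic_communication_decision(belief, v_comm, v_silent, msg_cost):
    """Communicate iff the myopic value of communication exceeds its cost.

    v_comm(belief):   expected value if the agents synchronize now
    v_silent(belief): expected value if they never communicate
    (the myopic assumption: communication is only possible at the present step)
    """
    return v_comm(belief) - v_silent(belief) > msg_cost
```

The paper's analysis identifies situations where exactly this test misfires and shows how relaxing the one-shot assumption improves performance.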
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{ASZ:Zaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude of speedup in policy computation and generates better-quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, for both the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of a communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to optimally solving a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to specific types of local behaviors significantly reduces the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
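The probabilistic updating scheme referenced in this abstract can be pictured as a Bayesian filter over candidate interpretations of a teammate's messages. The sketch below is illustrative only (the function name and likelihood model are assumptions, not the paper's): it shows the core update, reweighting each candidate meaning by how well it explains an observed outcome.

```python
# Illustrative Bayesian update over candidate interpretations of a message.
# All names and the likelihood model are assumptions, not the paper's API.

def update_interpretation(belief, likelihoods):
    """belief: dict meaning -> prior probability that the message has this
    meaning; likelihoods: dict meaning -> P(observed outcome | meaning)."""
    posterior = {m: belief[m] * likelihoods[m] for m in belief}
    total = sum(posterior.values())
    if total == 0:                     # outcome ruled out every candidate
        return belief                  # keep the prior rather than divide by 0
    return {m: p / total for m, p in posterior.items()}

# Two candidate meanings for an incoming message; the observed joint outcome
# was twice as likely if the message meant "go-left".
belief = {"go-left": 0.5, "go-right": 0.5}
belief = update_interpretation(belief, {"go-left": 0.6, "go-right": 0.3})
print(belief)   # {'go-left': 0.666..., 'go-right': 0.333...}
```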
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
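The core loop of memory-bounded dynamic programming can be sketched in a few lines. The version below is a simplification under stated assumptions: policies are flattened to action sequences rather than observation-branching trees, belief points are sampled uniformly rather than by the paper's heuristics, and `evaluate` is a caller-supplied stub standing in for exact policy evaluation.

```python
import random

# Simplified skeleton of the memory-bounded dynamic programming loop: at
# each horizon step at most `max_trees` policies survive, each chosen as
# the best candidate at a heuristically generated belief point.

def mbdp_skeleton(actions, num_states, horizon, max_trees, evaluate):
    """evaluate(policy, belief) -> expected value; supplied by the caller."""
    policies = [(a,) for a in actions]                # horizon-1 policies
    for _ in range(horizon - 1):
        # exhaustive backup: extend every surviving policy by every action
        candidates = [(a,) + p for a in actions for p in policies]
        survivors = []
        for _ in range(max_trees):
            raw = [random.random() for _ in range(num_states)]
            belief = [x / sum(raw) for x in raw]      # stand-in belief point
            survivors.append(max(candidates, key=lambda p: evaluate(p, belief)))
        policies = survivors                          # memory stays bounded
    return policies

# Toy stand-in evaluation: action 'a' pays off in state 0, 'b' in state 1.
toy = lambda policy, belief: sum(b * int(a == ("a", "b")[s])
                                 for s, b in enumerate(belief) for a in policy)
print(mbdp_skeleton(["a", "b"], 2, 3, 2, toy))
```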
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
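MAA* is classical A* applied to partial joint policies: a node's priority is the value accrued so far plus an admissible (optimistic) heuristic on the remainder. The toy below illustrates only that search skeleton, replacing policy trees with joint-action prefixes in a tiny fully observable stand-in problem; the heuristic assumes maximal reward on every remaining step, which keeps it admissible and consistent.

```python
import heapq

# Toy illustration of the A*-style search in MAA*: real MAA* expands partial
# joint policy trees; here, for brevity, nodes are joint-action prefixes in
# a tiny two-agent, fully observable stand-in problem.

JOINT_ACTIONS = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
REWARD = {("a", "a"): 1.0, ("a", "b"): 0.0, ("b", "a"): 0.0, ("b", "b"): 1.0}
HORIZON = 3
R_MAX = max(REWARD.values())

def heuristic(depth):
    # optimistic: assume maximal reward on every remaining step (admissible)
    return (HORIZON - depth) * R_MAX

def maa_star_toy():
    counter = 0                                    # unique tie-breaker for heapq
    frontier = [(-heuristic(0), counter, 0.0, ())]  # (-f, tie, g, prefix)
    while frontier:
        neg_f, _, g, prefix = heapq.heappop(frontier)
        if len(prefix) == HORIZON:   # consistent heuristic: first goal is optimal
            return prefix, g
        for ja in JOINT_ACTIONS:
            counter += 1
            g2 = g + REWARD[ja]
            f = g2 + heuristic(len(prefix) + 1)
            heapq.heappush(frontier, (-f, counter, g2, prefix + (ja,)))

plan, value = maa_star_toy()
print(plan, value)   # a coordinated plan achieving the optimal value 3.0
```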
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
Multiagent Planning and DEC-POMDPs
Mahmud, Saaduddin; Nashed, Samer B.; Goldman, Claudia V.; Zilberstein, Shlomo Estimating Causal Responsibility for Explaining Autonomous Behavior Book Section In: Calvaresi, Davide (Ed.): International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS), pp. 78–94, Springer, 2023. @incollection{SZ:MNGZextraamas23, There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values. |
Parr, Shane; Khatri, Ishan; Svegliato, Justin; Zilberstein, Shlomo Agent-Aware State Estimation in Autonomous Vehicles Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:PKSZiros21, Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to represent fully using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
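The role of the public-key homomorphic cryptosystem can be illustrated with a toy Paillier instance: multiplying ciphertexts yields an encryption of the sum of the plaintexts, so an aggregate can be computed without revealing any agent's private term. The parameters below are tiny and insecure, and this is a generic demonstration of additive homomorphism, not the paper's actual protocol.

```python
import math, random

# Toy Paillier cryptosystem (tiny, insecure parameters; demonstration only).
# The point: multiplying ciphertexts encrypts the SUM of the plaintexts, the
# kind of additive homomorphism a privacy-preserving exchange can rely on.

p, q = 47, 59
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                   # valid because we use g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# Each agent contributes an encrypted private value; anyone can multiply the
# ciphertexts, but only the key holder can decrypt the total.
private_values = [12, 30, 7]
aggregate = 1
for v in private_values:
    aggregate = (aggregate * encrypt(v)) % n2
print(decrypt(aggregate))              # 49 == 12 + 30 + 7
```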
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It guarantees to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
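The reduction behind this planning-as-inference approach rests on a standard construction: rescale rewards into [0, 1], treat the result as the probability of a binary "reward event", and mix DBNs of every horizon length so that likelihood is proportional to expected discounted reward. A sketch of that construction, with illustrative notation rather than the paper's own:

```latex
% Rescale rewards to [0,1] and treat them as P(r = 1 | s, a):
\hat{R}(r=1 \mid s,a) = \frac{R(s,a) - R_{\min}}{R_{\max} - R_{\min}}
% Mix DBNs over horizons T with P(T) = (1-\gamma)\gamma^T; the likelihood of
% the reward event under joint policy parameters \theta is then
L(\theta) = \sum_{T=0}^{\infty} (1-\gamma)\,\gamma^{T}\,
            \mathbb{E}_{\theta}\!\left[\hat{R}(r=1 \mid s_T, a_T)\right]
% which is an affine transform of the expected discounted reward, so
% EM steps that increase L(\theta) also improve the joint policy.
```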
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs. |
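The Distributed R-learning algorithm builds on Schwartz's tabular R-learning rule for average-reward problems. A minimal single-agent form of that update is sketched below; learning rates and structure are illustrative, and the distributed variant coordinates such updates across agents.

```python
from collections import defaultdict

# Minimal single-agent sketch of Schwartz's R-learning update, the rule the
# distributed variant builds on. Names and rates are illustrative.

R = defaultdict(float)        # relative action values R(s, a)
rho = 0.0                     # running estimate of the average reward
alpha, beta = 0.1, 0.01       # value and average-reward learning rates

def r_learning_update(s, a, reward, s_next, actions):
    global rho
    best_here = max(R[(s, b)] for b in actions)
    best_next = max(R[(s_next, b)] for b in actions)
    was_greedy = R[(s, a)] == best_here
    R[(s, a)] += alpha * (reward - rho + best_next - R[(s, a)])
    if was_greedy:            # track average reward along greedy actions only
        rho += beta * (reward - rho + best_next - best_here)

r_learning_update("s0", "go", 1.0, "s1", ["go", "stay"])
print(dict(R), rho)
```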
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains, with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
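The communication trigger described here can be pictured simply: every agent runs the same planner on common knowledge, and an agent requests synchronization only when its true local history falls outside the histories the shared plan still considers possible. A minimal, hypothetical sketch of that consistency test:

```python
# Minimal, hypothetical sketch of the communication trigger: an agent asks
# to synchronize only when its true local history is inconsistent with every
# history the shared (commonly known) plan still considers possible.

def should_sync(local_history, anticipated_histories):
    t = len(local_history)
    return not any(h[:t] == tuple(local_history) for h in anticipated_histories)

anticipated = {("clear", "clear"), ("clear", "blocked")}
print(should_sync(["clear"], anticipated))     # False: plan anticipated this
print(should_sync(["blocked"], anticipated))   # True: inconsistency detected
```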
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
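The utility estimates in each stage game come from UCT, whose heart is the UCB1 selection rule: prefer the action with the best mean value plus an exploration bonus that shrinks with visits. A minimal sketch (the function name and data layout are assumptions):

```python
import math

# The UCB1 rule at the core of UCT: pick the action with the best mean value
# plus an exploration bonus that shrinks with visits. Names are illustrative.

def uct_select(stats, c=1.414):
    """stats: dict action -> (total_value, visit_count)."""
    total_visits = sum(n for _, n in stats.values())
    def score(item):
        _, (value_sum, n) = item
        if n == 0:
            return float("inf")        # try every action at least once
        return value_sum / n + c * math.sqrt(math.log(total_visits) / n)
    return max(stats.items(), key=score)[0]

print(uct_select({"left": (3.0, 5), "right": (1.0, 1), "wait": (0.0, 0)}))  # wait
```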
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
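The distinction the paper exploits is easy to state in code: a Moore controller's action depends only on its current node, whereas a Mealy controller's action may also condition on the latest observation, so a Mealy machine of the same size encodes strictly richer behavior. A toy deterministic illustration (the paper's controllers are stochastic; this only shows the structural point):

```python
# Toy deterministic contrast between the two controller types.

class MooreController:
    def __init__(self, action_of, next_node):
        self.action_of = action_of     # node -> action
        self.next_node = next_node     # (node, observation) -> node

    def step(self, node, obs):
        node = self.next_node[(node, obs)]
        return self.action_of[node], node

class MealyController:
    def __init__(self, action_of, next_node):
        self.action_of = action_of     # (node, observation) -> action
        self.next_node = next_node     # (node, observation) -> node

    def step(self, node, obs):
        return self.action_of[(node, obs)], self.next_node[(node, obs)]

# A one-node Mealy controller already reacts to the latest observation,
# which no one-node Moore controller can do.
mealy = MealyController({(0, "hot"): "cool", (0, "cold"): "heat"},
                        {(0, "hot"): 0, (0, "cold"): 0})
print(mealy.step(0, "hot"))            # ('cool', 0)
```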
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
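The successive approximation idea exploits the key property of bilinear programs: with one agent's strategy fixed, the objective is linear in the other's, so improvement steps reduce to linear programs that can be alternated until convergence to a local optimum. A toy sketch of that alternation over two probability simplices (the paper's algorithm adds error bounds and dimensionality reduction on top of this core):

```python
import numpy as np
from scipy.optimize import linprog

# Toy alternation for a bilinear program max x^T A y over two probability
# simplices: fixing y makes the objective linear in x (and vice versa), so
# each half-step is a linear program. Converges to a local optimum.

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def best_response(scores):
    # maximize scores^T z subject to z lying in the probability simplex
    dim = len(scores)
    res = linprog(-scores, A_eq=np.ones((1, dim)), b_eq=[1.0],
                  bounds=[(0, 1)] * dim)
    return res.x

x = np.full(2, 0.5)
y = np.full(2, 0.5)
for _ in range(20):
    x = best_response(A @ y)       # linear in x once y is fixed
    y = best_response(A.T @ x)     # linear in y once x is fixed
print(x, y, x @ A @ y)             # here: x = y = (0, 1) with value 2.0
```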
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
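The myopic test can be written down directly: communicate now exactly when the expected gain from synchronizing, taken under the agent's local belief and net of the message cost, is positive, while pretending no later opportunity to communicate exists. A minimal illustrative sketch (names and values are assumptions):

```python
# Illustrative form of the myopic communication test: communicate now exactly
# when the expected gain from synchronizing, under the agent's local belief
# and net of the message cost, is positive -- assuming (myopically) that no
# later opportunity to communicate exists.

def should_communicate(beliefs, value_if_sync, value_if_silent, cost):
    """beliefs: dict world -> probability under the agent's local view."""
    value_of_communication = sum(
        p * (value_if_sync[w] - value_if_silent[w]) for w, p in beliefs.items())
    return value_of_communication > cost

beliefs = {"teammate-on-task": 0.7, "teammate-idle": 0.3}
sync = {"teammate-on-task": 10.0, "teammate-idle": 8.0}
silent = {"teammate-on-task": 10.0, "teammate-idle": 2.0}
print(should_communicate(beliefs, sync, silent, cost=1.0))   # True: VoC = 1.8
```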
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{ASZ:Zaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders-of-magnitude speedup in policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
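The paper's matroid-based 2-approximation belongs to the family of greedy methods for maximizing coverage-style objectives subject to a partition matroid (each agent commits to at most one action), for which greedy famously guarantees at least half the optimum. A generic, hypothetical sketch of that greedy step, not the paper's exact algorithm:

```python
# Generic greedy selection under a partition matroid (each sensor agent
# commits to at most one action), the standard construction behind many
# 2-approximation guarantees for coverage objectives. Hypothetical sketch.

def greedy_partition_matroid(agents, actions_of, coverage_of):
    """coverage_of[(agent, action)] -> set of targets that choice detects."""
    covered, plan = set(), {}
    candidates = [(ag, ac) for ag in agents for ac in actions_of[ag]]
    while candidates:
        ag, ac = max(candidates, key=lambda c: len(coverage_of[c] - covered))
        if not coverage_of[(ag, ac)] - covered:
            break                       # no remaining marginal gain
        plan[ag] = ac
        covered |= coverage_of[(ag, ac)]
        candidates = [c for c in candidates if c[0] != ag]
    return plan, covered

coverage = {("s1", "north"): {"t1", "t2"}, ("s1", "south"): {"t3"},
            ("s2", "north"): {"t2"}, ("s2", "south"): {"t3", "t4"}}
plan, covered = greedy_partition_matroid(
    ["s1", "s2"], {"s1": ["north", "south"], "s2": ["north", "south"]}, coverage)
print(plan, covered)   # {'s1': 'north', 's2': 'south'}, all four targets
```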
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
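To make the pruning idea in the two entries above concrete, here is a minimal Python sketch of greedy value-based merging. It is illustrative only: `values` stands in for estimated per-observation subtree values, and the accumulated `bound` is a crude additive surrogate for the paper's error bound, which is computed against actual policy-tree values.

    import numpy as np

    def merge_observations(values, budget):
        # Greedily merge the two observation branches whose estimated
        # subtree values are closest, until at most `budget` branches
        # remain; `bound` accumulates the merged value gaps.
        groups = [[o] for o in range(len(values))]
        vals = [float(v) for v in values]
        bound = 0.0
        while len(groups) > budget:
            order = np.argsort(vals)
            gaps = np.diff(np.asarray(vals)[order])
            k = int(np.argmin(gaps))
            i, j = int(order[k]), int(order[k + 1])   # vals[i] <= vals[j]
            bound += vals[j] - vals[i]
            groups[i].extend(groups[j])
            vals[i] = 0.5 * (vals[i] + vals[j])       # merged branch value
            groups.pop(j)
            vals.pop(j)
        return groups, bound

    print(merge_observations([0.10, 0.12, 0.90, 1.00], budget=2))
    # ([[0, 1], [2, 3]], 0.12): similar-valued observations share a branch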
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
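The core loop of memory-bounded dynamic programming is easy to sketch. The Python skeleton below is illustrative only: `evaluate` and `sample_beliefs` are hypothetical placeholders for the paper's policy evaluation and belief-point heuristics, and the real algorithm selects joint policies for all agents rather than a single tree set.

    import itertools, random

    def mbdp_skeleton(actions, observations, evaluate, sample_beliefs,
                      horizon, max_trees):
        # Depth-1 policies are bare actions; every later step performs an
        # exhaustive backup (root action plus one subtree per observation)
        # and then keeps at most `max_trees` trees, one per heuristically
        # sampled belief point, so memory stays constant in the horizon.
        trees = [("leaf", a) for a in actions]
        for t in range(1, horizon):
            backup = [("node", a, dict(zip(observations, subs)))
                      for a in actions
                      for subs in itertools.product(trees, repeat=len(observations))]
            trees = [max(backup, key=lambda tree: evaluate(tree, b))
                     for b in sample_beliefs(t, max_trees)]
        return trees

    # smoke test with dummy placeholders standing in for real evaluation
    random.seed(0)
    print(len(mbdp_skeleton(actions="ab", observations="xy",
                            evaluate=lambda tree, b: random.random(),
                            sample_beliefs=lambda t, k: range(k),
                            horizon=3, max_trees=2)))   # 2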
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
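The constraint structure optimized in the entry above can be seen in miniature by evaluating a fixed stochastic controller: with the controller parameters held constant, the Bellman system over (node, state) pairs is linear, and the paper's nonlinear program arises from treating those parameters as variables. A hedged Python sketch with toy tensor conventions, not the paper's code:

    import numpy as np

    def controller_value(T, O, R, psi, eta, gamma):
        # Value of a fixed stochastic finite-state controller on a POMDP,
        # from the linear Bellman system over (node, state) pairs.
        # Toy shape conventions: T[s,a,s2], O[s2,a,o], R[s,a],
        # psi[q,a] = P(action a | node q), eta[q,a,o,q2] = P(next node q2).
        Q, A = psi.shape
        S = R.shape[0]
        n = Q * S
        M = np.zeros((n, n))
        b = np.zeros(n)
        for q in range(Q):
            for s in range(S):
                i = q * S + s
                M[i, i] = 1.0
                for a in range(A):
                    b[i] += psi[q, a] * R[s, a]
                    for s2 in range(S):
                        for o in range(O.shape[2]):
                            w = gamma * psi[q, a] * T[s, a, s2] * O[s2, a, o]
                            for q2 in range(Q):
                                M[i, q2 * S + s2] -= w * eta[q, a, o, q2]
        return np.linalg.solve(M, b).reshape(Q, S)

    # smoke test: 1 node, 1 action, 2 states, 1 observation, gamma = 0.9
    T = np.full((2, 1, 2), 0.5)
    Obs = np.ones((2, 1, 1))
    Rew = np.array([[1.0], [0.0]])
    print(controller_value(T, Obs, Rew, np.ones((1, 1)),
                           np.ones((1, 1, 1, 1)), 0.9))   # [[5.5 4.5]]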
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
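Stripped of the Dec-POMDP specifics, MAA* is best-first search over partial joint policies with a heuristic that upper-bounds achievable value. A generic Python skeleton of that search, with all callables as hypothetical placeholders rather than the paper's interface:

    import heapq

    def maa_star_skeleton(root, expand, value, heuristic, is_complete):
        # Best-first search: pop the node with the largest value + heuristic.
        # If `heuristic` never underestimates the value of completing a
        # partial policy, the first complete node popped is optimal.
        frontier = [(-(value(root) + heuristic(root)), 0, root)]
        tie = 1
        while frontier:
            f, _, node = heapq.heappop(frontier)
            if is_complete(node):
                return node, -f
            for child in expand(node):
                heapq.heappush(frontier,
                               (-(value(child) + heuristic(child)), tie, child))
                tie += 1
        return None, float("-inf")

    # toy use: grow bit-strings to length 3, maximizing the number of 1s
    best, val = maa_star_skeleton(
        root="", expand=lambda s: [s + "0", s + "1"],
        value=lambda s: s.count("1"), heuristic=lambda s: 3 - len(s),
        is_complete=lambda s: len(s) == 3)
    print(best, val)   # 111 3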
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
Generalized Planning
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to represent fully using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
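The cryptographic ingredient above is an additively homomorphic public-key scheme. Textbook Paillier has exactly this property; the toy Python below (insecure parameters, and a sketch of the primitive only, not the paper's message-passing protocol) shows a sum computed under encryption:

    from math import gcd
    import random

    def paillier_keygen(p=10007, q=10009):
        # Textbook Paillier with toy primes (insecure; real keys are large).
        n = p * q
        lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
        mu = pow(lam, -1, n)                           # valid since g = n + 1
        return (n, n + 1), (lam, mu, n)

    def encrypt(pk, m):
        n, g = pk
        r = random.randrange(2, n)
        while gcd(r, n) != 1:
            r = random.randrange(2, n)
        return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

    def decrypt(sk, c):
        lam, mu, n = sk
        return (pow(c, lam, n * n) - 1) // n * mu % n

    pk, sk = paillier_keygen()
    c = encrypt(pk, 20) * encrypt(pk, 22) % (pk[0] ** 2)
    print(decrypt(sk, c))   # 42: multiplying ciphertexts adds plaintexts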
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It guarantees to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
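The planning-as-inference reduction can be demonstrated in one dimension: scale rewards into [0, 1], treat them as the likelihood of a binary "success" event, and run EM on the policy. A minimal Python illustration of that mechanism (a one-step toy with positive rewards, not the paper's DBN mixture):

    import numpy as np

    def em_policy_update(pi, rewards, iters=20):
        # E-step: posterior over actions given the "success" event, whose
        # likelihood is the normalized reward; M-step: adopt the posterior
        # as the new policy. Iterating concentrates mass on the best action.
        r = np.asarray(rewards, dtype=float)
        r = r / r.max()                      # positive rewards rescaled to [0, 1]
        pi = np.asarray(pi, dtype=float)
        for _ in range(iters):
            w = pi * r
            pi = w / w.sum()
        return pi

    print(em_policy_update([0.25, 0.25, 0.25, 0.25], [1.0, 2.0, 8.0, 4.0]))
    # -> nearly all probability on the third action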
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs. |
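For reference, the single-agent update underlying the distributed RVI Q-learning variant above is the textbook average-reward rule. A hedged Python sketch (the environment `step` and the toy chain are hypothetical, and the paper's contribution is the distributed, DCOP-structured version, not shown here):

    import random

    def rvi_q_learning(step, states, actions, s0, steps=50_000, alpha=0.05):
        # Tabular RVI Q-learning for average-reward MDPs: the running value
        # of a fixed reference state is subtracted in the target so that
        # Q-values stay bounded without a discount factor.
        Q = {(s, a): 0.0 for s in states for a in actions}
        ref = states[0]
        s = s0
        for _ in range(steps):
            a = (random.choice(actions) if random.random() < 0.1
                 else max(actions, key=lambda x: Q[s, x]))
            r, s2 = step(s, a)
            target = (r + max(Q[s2, x] for x in actions)
                        - max(Q[ref, x] for x in actions))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
        return Q

    # toy two-state chain: action 1 pays off in state 0 and toggles the state
    def chain(s, a):
        return (1.0 if (s == 0 and a == 1) else 0.0), (1 - s if a == 1 else s)

    random.seed(0)
    print(rvi_q_learning(chain, states=[0, 1], actions=[0, 1], s0=0))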
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
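The communication trigger described above can be shown in a few lines: each agent prunes the joint histories its plan considers possible against its own observations, and an empty survivor set is the inconsistency signal. A toy Python rendering, with data structures as illustrative stand-ins for the paper's:

    def prune_joint_histories(possible_joint, agent_id, local_obs):
        # Keep only joint observation histories consistent with what this
        # agent just observed; an empty survivor set signals history
        # inconsistency, the event that triggers communication.
        survivors = [h for h in possible_joint if h[agent_id] == local_obs]
        return survivors, not survivors

    # joint histories as tuples of per-agent observations
    joint = [("hot", "left"), ("hot", "right"), ("cold", "left")]
    print(prune_joint_histories(joint, agent_id=0, local_obs="hot"))
    # -> ([('hot', 'left'), ('hot', 'right')], False): no need to communicate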
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
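The utility estimates in each stage game above come from UCT, whose tree policy is the UCB1 rule. A minimal Python version of just that rule (the stage-game construction and biased adaptive play are not shown):

    import math

    def ucb1(counts, values, c=1.4):
        # UCB1: mean return plus an exploration bonus that shrinks as an
        # action is tried more often; untried actions are taken first.
        if 0 in counts:
            return counts.index(0)
        total = sum(counts)
        return max(range(len(counts)),
                   key=lambda a: values[a] + c * math.sqrt(math.log(total) / counts[a]))

    print(ucb1([3, 1, 5], [0.4, 0.9, 0.3]))   # 1: high mean, rarely tried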
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gain can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of first producing a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
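One simple way to render the Moore/Mealy distinction in code: a Moore controller attaches actions to nodes, while a Mealy controller attaches them to (node, observation) edges, which is why a Mealy machine can often match a larger Moore machine. A toy Python sketch (the tiger-style labels are illustrative, not from the paper):

    def act_moore(ctrl, node, obs):
        # Moore machine: transition on the observation, then emit the
        # action attached to the node that was reached.
        node = ctrl["next"][node, obs]
        return ctrl["action"][node], node

    def act_mealy(ctrl, node, obs):
        # Mealy machine: the action is attached to the (node, observation)
        # edge itself, so it reacts to the latest observation directly.
        return ctrl["action"][node, obs], ctrl["next"][node, obs]

    moore = {"action": {0: "listen", 1: "open"},
             "next": {(0, "growl"): 0, (0, "quiet"): 1,
                      (1, "growl"): 0, (1, "quiet"): 1}}
    mealy = {"action": {(0, "growl"): "listen", (0, "quiet"): "open"},
             "next": {(0, "growl"): 0, (0, "quiet"): 0}}
    print(act_moore(moore, 0, "quiet"), act_mealy(mealy, 0, "quiet"))
    # the one-node Mealy controller matches the two-node Moore controller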
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
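The flavor of the successive-approximation scheme can be seen on the simplest separable case, maximizing x^T C y over two probability simplices, where each best response is a linear program whose optimum is a simplex vertex. A toy Python sketch of alternating best response only; the paper's error bounds and dimensionality reduction are not shown:

    import numpy as np

    def alternating_best_response(C, iters=100, seed=0):
        # Local search for max x^T C y over two probability simplices:
        # with simplex feasible sets each best response is a linear
        # program whose optimum is a vertex, i.e. a single coordinate.
        rng = np.random.default_rng(seed)
        m, n = C.shape
        y = rng.dirichlet(np.ones(n))
        value = -np.inf
        for _ in range(iters):
            x = np.eye(m)[np.argmax(C @ y)]      # agent 1 best response
            y = np.eye(n)[np.argmax(x @ C)]      # agent 2 best response
            new_value = float(x @ C @ y)
            if new_value <= value:               # reached a local optimum
                break
            value = new_value
        return x, y, value

    C = np.array([[4.0, 0.0], [0.0, 3.0]])       # two coordination options
    print(alternating_best_response(C))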
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
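The myopic rule analyzed above is one line: communicate now if the expected gain from coordinating on shared information, under the current belief, exceeds the message cost. A toy Python rendering (the value vectors are hypothetical; the paper's point is precisely that this estimate can mislead):

    import numpy as np

    def myopic_voc_test(belief, value_if_shared, value_local, cost):
        # Communicate now if the expected gain of coordinated action over
        # acting on local information alone exceeds the message cost.
        gain = float(np.dot(belief,
                            np.asarray(value_if_shared) - np.asarray(value_local)))
        return gain > cost

    print(myopic_voc_test([0.7, 0.3], [10.0, 2.0], [6.0, 2.0], cost=2.0))  # True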
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{ASZ:Zaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides magnitudes of speedup in the policy computation and generates better quality solution for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high for both the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
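A structural sketch of the memory-bounded recursion described above may be helpful. The stubs exhaustive_backup, heuristic_belief, and evaluate are problem-specific placeholders assumed for this illustration, not the paper's API:

def mbdp(horizon, max_trees, heuristics, depth_one_policies,
         exhaustive_backup, heuristic_belief, evaluate):
    # Bottom-up dynamic programming over policy trees, keeping at most
    # max_trees joint policies per step so memory stays bounded.
    policies = depth_one_policies
    for t in range(1, horizon):
        candidates = exhaustive_backup(policies)      # all one-step-deeper trees
        selected = []
        for h in heuristics:                          # e.g., MDP-based or random
            b = heuristic_belief(h, horizon - t)      # a belief deemed reachable
            best = max(candidates, key=lambda joint: evaluate(joint, b))
            if best not in selected:
                selected.append(best)
            if len(selected) == max_trees:
                break
        policies = selected
    return policies

The linear time and space complexity in the horizon follows from the fact that each iteration starts from a pool of at most max_trees policies rather than the full set of backed-up trees.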
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
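For a single agent's stochastic controller with nodes q, action probabilities x(a | q), and node-transition probabilities y(q' | q, a, o), the kind of nonlinear program involved can be sketched as follows in standard POMDP notation (the decentralized case couples one such program per agent; the decomposition of the parameters here is illustrative):

\[
\max_{x,\, y,\, V} \;\; \sum_{s} b_0(s)\, V(q_0, s)
\]
subject to, for all nodes q and states s,
\[
V(q,s) = \sum_{a} x(a \mid q) \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \sum_{o} O(o \mid s',a) \sum_{q'} y(q' \mid q,a,o)\, V(q',s') \Big],
\]
with x(\cdot \mid q) and y(\cdot \mid q,a,o) constrained to be probability distributions. The products between the controller parameters and the value variables V are what make the program nonlinear, and why off-the-shelf solvers only guarantee local optima.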
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
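A compact sketch of such a best-first search over partial joint policies follows; children, depth, value, and upper_bound are problem-specific stand-ins assumed for this illustration (an admissible upper_bound could come, for instance, from the underlying fully observable MDP):

import heapq

def maa_star(root, horizon, children, depth, value, upper_bound):
    # Best-first search over partial joint policies. value(p) is the exact
    # expected value of a full-horizon joint policy; upper_bound(p) is an
    # admissible (optimistic) estimate for a partial one.
    frontier = [(-upper_bound(root), 0, root)]
    tie = 1
    best_v, best_p = float("-inf"), None
    while frontier:
        neg_f, _, node = heapq.heappop(frontier)
        if -neg_f <= best_v:
            break                      # no open node can beat the incumbent
        for child in children(node):   # all one-step-deeper completions
            if depth(child) == horizon:
                v = value(child)
                if v > best_v:
                    best_v, best_p = v, child
            else:
                heapq.heappush(frontier, (-upper_bound(child), tie, child))
                tie += 1
    return best_v, best_p

Because the bound is admissible, the search can stop as soon as the best open node's bound no longer exceeds the incumbent's exact value, which is what makes the algorithm complete and optimal.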
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
Introspective Autonomy
Mahmud, Saaduddin; Nashed, Samer B.; Goldman, Claudia V.; Zilberstein, Shlomo Estimating Causal Responsibility for Explaining Autonomous Behavior Book Section In: Calvaresi, Davide (Ed.): International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS), pp. 78–94, Springer, 2023. @incollection{SZ:MNGZextraamas23, There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values. |
Parr, Shane; Khatri, Ishan; Svegliato, Justin; Zilberstein, Shlomo Agent-Aware State Estimation in Autonomous Vehicles Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:PKSZiros21, Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to represent fully using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
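The additive property that such public-key homomorphic cryptosystems provide can be illustrated with a toy Paillier instance. This is purely an illustration of the homomorphism, not the paper's protocol, and the tiny parameters below must never be used in practice:

import random
from math import gcd

p, q = 293, 433                      # toy primes; real use needs large random primes
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
g = n + 1                            # standard simplified generator

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse; Python 3.8+

def enc(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Multiplying ciphertexts adds plaintexts: the operation that lets agents
# aggregate private quantities without revealing them to one another.
a, b = 17, 25
assert dec((enc(a) * enc(b)) % n2) == a + b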
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm that minimizes the baseline regret efficiently, requiring only as many iterations as needed to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
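In its standard single-agent form (notation here is illustrative), the dual linear program for a discounted MDP optimizes occupancy measures x(s, a) rather than values:

\[
\max_{x \ge 0} \; \sum_{s,a} R(s,a)\, x(s,a)
\quad \text{s.t.} \quad
\sum_{a'} x(s',a') = b_0(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, x(s,a) \quad \forall s'.
\]

A policy is recovered by normalization, \pi(a \mid s) \propto x(s,a). The paper's contribution lies in adapting this occupancy-measure view to decentralized finite-state controllers, which this single-agent sketch does not capture.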
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-armed bandit DCOP algorithm on dynamic DCOPs. |
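For reference, standard single-agent R-learning (Schwartz, 1993), on which the distributed variant builds, maintains an average-reward estimate \rho alongside relative action values:

\[
Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r - \rho + \max_{a'} Q(s',a') - Q(s,a) \big],
\]
\[
\rho \leftarrow \rho + \beta \big[ r + \max_{a'} Q(s',a') - \max_{a''} Q(s,a'') - \rho \big] \quad \text{(updated on greedy steps)}.
\]

Subtracting \rho is what lets the method optimize long-run average reward rather than a discounted sum, the natural criterion for nonterminating coordination problems of this kind.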
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
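The UCT selection rule used to estimate those stage-game utilities is, in its standard form (c is an exploration constant, N(h) and N(h, a) are visit counts for a history and a history-action pair, and \bar{Q} is the average return observed so far):

\[
a^{*} = \arg\max_{a} \left[ \bar{Q}(h,a) + c \sqrt{\frac{\ln N(h)}{N(h,a)}} \right].
\]

The square-root bonus shrinks as an action is tried more often, so the search provably balances exploring under-sampled actions against exploiting ones that currently look best.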
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
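The reduction behind such mixture-of-DBNs formulations is commonly set up as follows (a standard construction, sketched with illustrative notation rather than the paper's exact one): introduce a binary reward variable \hat{r} with P(\hat{r} = 1 \mid s, a) proportional to the normalized reward, and a horizon mixture P(T = t) \propto \gamma^{t}; then

\[
P(\hat{r} = 1; \theta) \;=\; \sum_{T} P(T)\, \mathbb{E}_{\theta} \big[ P(\hat{r} = 1 \mid s_T, a_T) \big] \;\propto\; \mathbb{E}_{\theta} \Big[ \sum_{t} \gamma^{t} R(s_t, a_t) \Big] + \text{const},
\]

so maximizing the likelihood of \hat{r} = 1 over the policy parameters \theta, for example by EM, maximizes expected discounted reward.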
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
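The structural difference between the two automata is small but consequential. In a hypothetical Python rendering (types and field names here are illustrative, not from the paper):

from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class MooreController:
    action: Dict[int, str]              # output attached to the node
    delta: Dict[Tuple[int, str], int]   # (node, observation) -> next node

    def act(self, node: int) -> str:
        return self.action[node]        # action depends on the node alone

    def update(self, node: int, obs: str) -> int:
        return self.delta[(node, obs)]

@dataclass
class MealyController:
    delta: Dict[Tuple[int, str], Tuple[int, str]]  # (node, obs) -> (next node, action)

    def step(self, node: int, obs: str) -> Tuple[int, str]:
        return self.delta[(node, obs)]  # action depends on node and observation

With the same number of nodes, a Mealy controller conditions its action on strictly more information, which is one way to see why it can represent policies that an equal-sized Moore controller cannot.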
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
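A separable bilinear program of the kind referred to here has the following general form, where x and y are the two agents' policy variables and C couples their rewards (notation illustrative):

\[
\max_{x,\, y} \;\; r_1^{\top} x + x^{\top} C y + r_2^{\top} y
\quad \text{s.t.} \quad
A_1 x = b_1,\; x \ge 0; \qquad A_2 y = b_2,\; y \ge 0.
\]

Fixing either x or y reduces the problem to an ordinary linear program, which is what a successive approximation scheme alternates over while tracking a bound on the remaining gap.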
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
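The myopic decision rule discussed above can be summarized as follows (symbols illustrative): letting V_comm(b) be the expected value after communicating at belief b at cost c, and V_silent(b) the value of staying silent computed as if no later communication were possible,

\[
VoC(b) = \big[ V_{\text{comm}}(b) - c \big] - V_{\text{silent}}(b), \qquad \text{communicate iff } VoC(b) > 0.
\]

The paper's analysis targets exactly the cases where this myopic V_silent(b) underestimates the value of remaining silent now and communicating later, causing agents to communicate too eagerly.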
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:AZaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude of speedup in policy computation and generates better-quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results significantly extend the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state space and the number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
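The memory-bounded idea lends itself to a compact illustration. The Python sketch below (all helper names are hypothetical stand-ins, not the paper's code) shows the core of one backup step: candidate policy trees are scored at heuristically sampled belief points, and only a fixed number of trees is retained, which is what keeps time and space linear in the horizon.

```python
def mbdp_step(candidate_trees, belief_points, evaluate, max_trees):
    """One memory-bounded backup step (illustrative sketch, not the paper's code).

    candidate_trees: joint policy trees produced by an exhaustive backup
    belief_points:   belief states sampled by a portfolio of heuristics
    evaluate:        callable (tree, belief) -> expected value of tree at belief
    max_trees:       fixed number of trees retained per horizon step
    """
    kept = []
    for b in belief_points:
        # For each sampled belief, keep the candidate tree that is best there;
        # capping len(kept) at max_trees is what bounds memory use.
        best = max(candidate_trees, key=lambda t: evaluate(t, b))
        if best not in kept:
            kept.append(best)
        if len(kept) >= max_trees:
            break
    return kept

# Toy usage: "trees" are labels, beliefs are numbers, value is an arbitrary score.
trees = ["a", "bb", "ccc", "dddd"]
beliefs = [0.2, 0.9, 0.5]
score = lambda tree, belief: len(tree) * (1.0 - abs(belief - 0.5))
print(mbdp_step(trees, beliefs, score, max_trees=2))
```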
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
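To make the search structure concrete, here is a heavily simplified best-first skeleton in Python (the interfaces -- expand, exact_value, heuristic, node.depth -- are hypothetical stand-ins, not the authors' API). Nodes are partially specified joint policies; an admissible upper-bound heuristic lets the search prune prefixes that cannot beat the incumbent solution.

```python
import heapq

def maa_star(root, expand, exact_value, heuristic, horizon):
    """Simplified best-first search over joint policy prefixes (sketch).

    expand(node)        -> joint policies one step deeper than node
    exact_value(policy) -> true expected value of a depth-`horizon` policy
    heuristic(node)     -> admissible upper bound on any completion of node
    """
    frontier = [(-heuristic(root), 0, root)]  # max-heap via negated keys
    tie = 1                                   # tiebreaker keeps heap comparisons numeric
    best, best_value = None, float("-inf")
    while frontier:
        neg_f, _, node = heapq.heappop(frontier)
        if -neg_f <= best_value:
            break  # no open prefix can beat the incumbent: it is optimal
        for child in expand(node):
            if child.depth == horizon:
                value = exact_value(child)
                if value > best_value:
                    best, best_value = child, value
            else:
                heapq.heappush(frontier, (-heuristic(child), tie, child))
                tie += 1
    return best, best_value
```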
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
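As a rough illustration of the controller representation, the sketch below evaluates a stochastic finite-state controller by iterating Bellman-style equations. It covers a single controller with fully parameterized arrays; the decentralized and correlated cases add further indices. The array names, and the simplification that the controller conditions its node transition on the successor state rather than a separate observation, are assumptions of this sketch, not the paper's formulation.

```python
import numpy as np

def evaluate_fsc(T, R, psi, eta, gamma, iters=500):
    """Iteratively evaluate a stochastic finite-state controller (sketch).

    T[s, a, s']       : state transition probabilities
    R[s, a]           : rewards
    psi[q, a]         : P(action a | controller node q)
    eta[q, a, s', q'] : P(next node q' | node q, action a, successor s')
                        (simplification: the "observation" is s' itself)
    Returns V[q, s], the value of running the controller from node q in state s.
    A production evaluator would solve the linear system directly instead of
    iterating a fixed number of times.
    """
    Q, A = psi.shape
    S = T.shape[0]
    V = np.zeros((Q, S))
    for _ in range(iters):
        newV = np.zeros_like(V)
        for q in range(Q):
            for s in range(S):
                v = 0.0
                for a in range(A):
                    future = 0.0
                    for s2 in range(S):
                        # Expected next value, marginalizing over node transitions.
                        future += T[s, a, s2] * (eta[q, a, s2] @ V[:, s2])
                    v += psi[q, a] * (R[s, a] + gamma * future)
                newV[q, s] = v
        V = newV
    return V
```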
Building Safe AI Systems
Mahmud, Saaduddin; Nashed, Samer B.; Goldman, Claudia V.; Zilberstein, Shlomo Estimating Causal Responsibility for Explaining Autonomous Behavior Book Section In: Calvaresi, Davide (Ed.): International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS), pp. 78–94, Springer, 2023. @incollection{SZ:MNGZextraamas23, There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values. |
Parr, Shane; Khatri, Ishan; Svegliato, Justin; Zilberstein, Shlomo Agent-Aware State Estimation in Autonomous Vehicles Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:PKSZiros21, Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
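To give a feel for the cryptographic building block, here is a minimal sketch using the open-source python-paillier library (the `phe` package). It shows only additively homomorphic aggregation of private per-agent numbers, a far simpler operation than the encrypted policy optimization in the paper.

```python
# Illustrative use of an additively homomorphic cryptosystem (python-paillier)
# to aggregate private per-agent values -- a simplified stand-in for the
# encrypted message passing described in the paper.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

local_values = [4.2, -1.3, 7.0]          # each agent's private quantity
ciphertexts = [public_key.encrypt(v) for v in local_values]

# A party holding only the public key can sum the encrypted values
# without learning any individual agent's contribution.
encrypted_total = sum(ciphertexts[1:], ciphertexts[0])

print(private_key.decrypt(encrypted_total))  # 9.9, up to float rounding
```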
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It guarantees to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs. |
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
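The UCT component is easy to illustrate. Below is the standard UCB1 selection rule used inside Monte-Carlo tree search, in Python. This is a generic textbook sketch with hypothetical inputs, not the paper's biased adaptive play machinery.

```python
import math

def ucb1_select(node_stats, c=1.4):
    """UCB1 action selection as used inside UCT (illustrative sketch).

    node_stats: dict mapping action -> (visit_count, total_return)
    Returns the action maximizing mean return plus an exploration bonus.
    """
    total_visits = sum(n for n, _ in node_stats.values())

    def score(item):
        action, (n, total_return) = item
        if n == 0:
            return float("inf")  # always try untested actions first
        return total_return / n + c * math.sqrt(math.log(total_visits) / n)

    return max(node_stats.items(), key=score)[0]

stats = {"left": (10, 6.0), "right": (3, 2.4), "wait": (0, 0.0)}
print(ucb1_select(stats))  # "wait": unvisited actions get priority
```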
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
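To see why even a single backup step is combinatorial, the brute-force version is sketched below (all interfaces are hypothetical): each agent's backed-up policy pairs an action with a retained subpolicy for every observation, so the joint search space is exponential in the number of observations and agents -- exactly the blow-up the paper's weighted-CSP formulation is designed to tame.

```python
import itertools

def exhaustive_point_backup(belief, actions, observations, subpolicies, joint_value):
    """Brute-force point-based backup at one belief for two agents (sketch).

    A backed-up policy for an agent = an action plus one retained subpolicy
    per observation. Enumerating all joint choices is exponential, which is
    the bottleneck the paper's weighted-CSP method addresses.
    """
    def agent_choices():
        obs_maps = list(itertools.product(subpolicies, repeat=len(observations)))
        return [(a, m) for a in actions for m in obs_maps]

    best, best_value = None, float("-inf")
    for p1, p2 in itertools.product(agent_choices(), repeat=2):
        value = joint_value(belief, p1, p2)
        if value > best_value:
            best, best_value = (p1, p2), value
    return best, best_value
```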
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
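The distinction between the two automata types is small but consequential, and a few lines of Python make it concrete (a minimal sketch with arbitrary node and observation labels): a Moore controller emits actions from nodes alone, while a Mealy controller conditions the action on both the node and the latest observation, giving richer per-node behavior.

```python
class MooreController:
    """Action depends only on the current node."""
    def __init__(self, action_of, next_node):
        self.action_of = action_of    # node -> action
        self.next_node = next_node    # (node, observation) -> node

    def step(self, node, obs):
        node = self.next_node[(node, obs)]
        return node, self.action_of[node]

class MealyController:
    """Action depends on the current node AND the observation."""
    def __init__(self, transition):
        self.transition = transition  # (node, observation) -> (node, action)

    def step(self, node, obs):
        return self.transition[(node, obs)]

# Toy two-node controllers over observations {"hot", "cold"}.
moore = MooreController({0: "open", 1: "close"},
                        {(0, "hot"): 1, (0, "cold"): 0,
                         (1, "hot"): 1, (1, "cold"): 0})
mealy = MealyController({(0, "hot"): (1, "close"), (0, "cold"): (0, "open"),
                         (1, "hot"): (1, "close"), (1, "cold"): (0, "open")})
print(moore.step(0, "hot"), mealy.step(0, "hot"))
```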
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
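A bare-bones view of why bilinear structure helps: fixing either block of variables makes the objective linear, so a coordinate-ascent scheme can alternate between two LPs. The sketch below, using scipy, is a local-search simplification under assumed matrix names, not the paper's successive approximation algorithm with its global bounds.

```python
import numpy as np
from scipy.optimize import linprog

def alternating_bilinear(A, B1, c1, B2, c2, y0, iters=50):
    """Coordinate ascent for max x^T A y s.t. B1 x <= c1, B2 y <= c2 (sketch).

    Fixing one block of variables turns the bilinear objective into a linear
    one, so each half-step is an LP. This only reaches a local optimum and
    assumes both feasible regions are nonempty and bounded; variables are
    nonnegative by linprog's default bounds.
    """
    y = y0
    for _ in range(iters):
        # linprog minimizes, so negate the (now linear) objective coefficients.
        x = linprog(-(A @ y), A_ub=B1, b_ub=c1).x
        y = linprog(-(A.T @ x), A_ub=B2, b_ub=c2).x
    return x, y
```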
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
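The myopic estimate itself is a one-liner in expectation form: compare the expected value of acting after synchronizing against the expected value of continuing silently, under the assumption that communication is possible only now. A small Python sketch with hypothetical numbers:

```python
def myopic_value_of_communication(probs, outcomes, value_with, value_without, cost):
    """Myopic value-of-communication estimate (sketch): communicate now or never.

    probs:         probability of each possible joint situation
    value_with:    expected value after synchronizing in that situation
    value_without: expected value of silently continuing
    Communicate iff the expected gain exceeds the communication cost.
    """
    gain = sum(p * (value_with[o] - value_without[o])
               for p, o in zip(probs, outcomes))
    return gain - cost

# Toy usage with two possible situations of the other agent.
outcomes = ["consistent", "diverged"]
probs = [0.8, 0.2]
v_sync = {"consistent": 10.0, "diverged": 9.0}
v_silent = {"consistent": 10.0, "diverged": 4.0}
print(myopic_value_of_communication(probs, outcomes, v_sync, v_silent, cost=0.5))
# 0.5 -> positive, so the myopic rule says: communicate.
```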
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:AZaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude of speedup in policy computation and generates better-quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results significantly extend the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
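The 2-approximation rests on a standard building block: greedy maximization subject to a matroid constraint, which guarantees at least half the optimal value for monotone submodular objectives. Here is a generic sketch of that greedy skeleton; the element set, independence test, and gain function are hypothetical stand-ins for the paper's sensor-assignment construction.

def greedy_matroid(elements, independent, gain):
    """Greedy selection under a matroid constraint (the classic
    1/2-approximation skeleton for monotone submodular objectives).

    elements:    candidate (sensor, target) assignments
    independent: predicate testing whether a set stays independent
    gain:        marginal gain of adding an element to the chosen set
    """
    chosen = set()
    remaining = set(elements)
    while remaining:
        best = max(remaining, key=lambda e: gain(chosen, e))
        remaining.discard(best)
        if gain(chosen, best) > 0 and independent(chosen | {best}):
            chosen.add(best)
    return chosen

# Toy partition matroid: each sensor may take at most one assignment.
elems = [("s1", "t1"), ("s1", "t2"), ("s2", "t1")]
indep = lambda S: all(sum(1 for x, _ in S if x == s) <= 1 for s in {"s1", "s2"})
vals = {("s1", "t1"): 3.0, ("s1", "t2"): 2.0, ("s2", "t1"): 1.0}
gain = lambda S, e: vals[e]  # modular objective, for illustration only
print(greedy_matroid(elems, indep, gain))  # {('s1', 't1'), ('s2', 't1')}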
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
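The trigger condition in this algorithm can be sketched in a few lines: each agent acts silently as long as its local history is one the shared plan anticipated, and broadcasts when it is not. The interface below is our illustration, not the paper's code.

def act_or_communicate(local_history, anticipated_histories, broadcast):
    """Communicate only when the local history falls outside the set of
    histories the shared joint plan anticipated (an illustrative rule)."""
    if tuple(local_history) not in anticipated_histories:
        broadcast(local_history)   # resync; teammates replan from common knowledge
        return "communicated"
    return "silent"                # consistent so far; act without communicating

# Toy usage: the plan anticipated observation "a" or "b", but "c" arrived.
anticipated = {("a",), ("b",)}
print(act_or_communicate(["c"], anticipated, broadcast=lambda h: None))  # communicated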
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high for both the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
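Abstracting away from the solver, the value-based compression in the two entries above amounts to repeatedly folding the cheapest observation branch into a surviving one while accumulating a bound on the value lost. A toy sketch, assuming per-branch values and weights are given and using a deliberately crude merge rule:

def compress_observations(branch_value, weight, budget):
    """Greedily merge observation branches while tracking a loss bound.

    branch_value: dict obs -> value of the subtree reached on that observation
    weight:       dict obs -> probability of seeing that observation
    budget:       maximum accumulated (weighted) value loss allowed
    Returns (obs -> surviving obs mapping, accumulated loss bound).
    """
    merged = {o: o for o in branch_value}
    loss = 0.0
    while len(set(merged.values())) > 1:
        alive = sorted(set(merged.values()), key=lambda o: branch_value[o])
        weak, strong = alive[0], alive[-1]   # fold weakest branch into strongest
        cost = weight[weak] * abs(branch_value[strong] - branch_value[weak])
        if loss + cost > budget:
            break
        for o, target in merged.items():
            if target == weak:
                merged[o] = strong
        loss += cost
    return merged, loss

# Toy usage: a tight budget merges only the cheapest branch (o3 into o1).
vals = {"o1": 5.0, "o2": 4.8, "o3": 1.0}
wts = {"o1": 0.5, "o2": 0.4, "o3": 0.1}
print(compress_observations(vals, wts, budget=0.45))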
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
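The retention step at the core of MBDP is worth isolating: after an exhaustive backup, only a fixed number of policy trees survive, selected by their value at heuristically sampled belief points. A simplified sketch of that step (the candidate, belief, and value interfaces are assumed; the real algorithm selects per-agent trees rather than joint labels):

def mbdp_retain(candidates, belief_points, value, max_trees):
    """Keep at most max_trees joint policy trees: for each sampled belief,
    retain the best not-yet-kept candidate.

    candidates:    joint policy trees produced by exhaustive backup
    belief_points: heuristically sampled joint beliefs
    value:         value(tree, belief) -> expected value (assumed given)
    """
    kept = []
    for b in belief_points:
        ranked = sorted(candidates, key=lambda t: value(t, b), reverse=True)
        for t in ranked:
            if t not in kept:
                kept.append(t)
                break
        if len(kept) == max_trees:
            break
    return kept

# Toy usage: trees are labels, values are table lookups.
V = {("T1", "b1"): 3, ("T2", "b1"): 5, ("T1", "b2"): 4, ("T2", "b2"): 1}
print(mbdp_retain(["T1", "T2"], ["b1", "b2"], lambda t, b: V[(t, b)], 2))
# ['T2', 'T1']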
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
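For reference, the nonlinear program in question has roughly the following shape, written here for joint controller nodes q⃗, action parameters P(a_i | q_i), and node-transition parameters P(q_i' | q_i, a_i, o_i); this is a paraphrase of the standard formulation rather than a verbatim excerpt:

\max \;\; \sum_{s} b_0(s)\, V(\vec{q}_0, s)

\text{subject to, for every joint node } \vec{q} \text{ and state } s:

V(\vec{q}, s) = \sum_{\vec{a}} \prod_i P(a_i \mid q_i) \Big[ R(s, \vec{a})
  + \gamma \sum_{s'} P(s' \mid s, \vec{a}) \sum_{\vec{o}} O(\vec{o} \mid s', \vec{a})
    \sum_{\vec{q}\,'} \prod_i P(q_i' \mid q_i, a_i, o_i)\, V(\vec{q}\,', s') \Big],

\text{with every } P(\cdot \mid \cdot) \ge 0 \text{ and normalized to sum to } 1.

The Bellman-style constraint forces V to equal the exact value of whatever controller the parameters describe, so any feasible point a local NLP solver returns comes with its true value.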
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
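The control loop of MAA* is ordinary best-first search over partial joint policies, where a node's evaluation combines the exact value of the completed prefix with an optimistic estimate of the remaining steps. A compressed sketch, with expansion and evaluation supplied as assumed inputs:

import heapq, itertools

def maa_star(root, expand, f_value, is_complete):
    """Best-first search over partial joint policies (MAA*-style skeleton).

    root:        the empty joint policy (depth 0)
    expand:      node -> one-step-deeper partial joint policies (assumed given)
    f_value:     node -> exact value of the filled-in prefix plus an
                 optimistic (admissible) estimate for the remaining steps
    is_complete: node -> True once the policy covers the full horizon
    """
    tie = itertools.count()                 # tie-breaker so nodes never compare
    frontier = [(-f_value(root), next(tie), root)]
    while frontier:
        neg_f, _, node = heapq.heappop(frontier)
        if is_complete(node):
            return node, -neg_f             # first complete pop is optimal
        for child in expand(node):
            heapq.heappush(frontier, (-f_value(child), next(tie), child))
    return None, float("-inf")

As in standard A*, the first complete node popped is optimal provided the heuristic never underestimates the value achievable by completing the prefix.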
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
Plan and Activity Recognition
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It guarantees to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs. |
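The Distributed R-learning algorithm mentioned above builds on Schwartz's average-reward R-learning rule, which each agent can apply locally. A minimal single-agent version of that update (the paper's distributed coordination details are omitted):

def r_learning_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One Schwartz R-learning step for average-reward problems.

    Q:   dict state -> dict action -> value estimate
    rho: current estimate of the average reward per time step
    """
    was_greedy = Q[s][a] == max(Q[s].values())
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r - rho + best_next - Q[s][a])
    if was_greedy:
        # The average-reward estimate is updated only on greedy steps.
        rho += beta * (r + best_next - max(Q[s].values()) - rho)
    return rho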
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
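The Monte-Carlo tree search inside each stage game relies on the standard UCB1 selection rule, which trades off an action's estimated value against how rarely it has been tried. A minimal version:

import math

def ucb1_select(stats, c=1.414):
    """Pick the action maximizing mean value + exploration bonus.

    stats: dict action -> (visit_count, mean_value); every action tried once
    c:     exploration constant (sqrt(2) is the classic choice)
    """
    total = sum(n for n, _ in stats.values())
    def score(a):
        n, q = stats[a]
        return q + c * math.sqrt(math.log(total) / n)
    return max(stats, key=score)

# Toy usage: a rarely tried arm can win despite a lower mean value.
print(ucb1_select({"left": (10, 1.0), "right": (2, 0.8)}))  # right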
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
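The distinction driving this paper is compact: a Moore controller's action depends only on its current node, whereas a Mealy controller's action may also depend on the observation just received, so a same-sized Mealy machine encodes a richer policy space. A toy illustration (the interfaces and the tiger-style labels are ours):

class MooreController:
    """Action is a function of the node alone."""
    def __init__(self, action_of, next_node):
        self.action_of, self.next_node = action_of, next_node
    def step(self, node, obs):
        return self.action_of[node], self.next_node[(node, obs)]

class MealyController:
    """Action is a function of the node AND the last observation."""
    def __init__(self, action_of, next_node):
        self.action_of, self.next_node = action_of, next_node
    def step(self, node, obs):
        return self.action_of[(node, obs)], self.next_node[(node, obs)]

# Even with a single node, a Mealy machine can still react to observations:
mealy = MealyController({(0, "hear-left"): "open-right",
                         (0, "hear-right"): "open-left"},
                        {(0, "hear-left"): 0, (0, "hear-right"): 0})
print(mealy.step(0, "hear-left"))  # ('open-right', 0)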
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
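The canonical form of the separable bilinear program used here is worth recording: each agent owns one block of variables, the agents' constraint sets are independent linear systems, and interaction enters only through a single bilinear term (notation ours):

\max_{x,\, y} \;\; r_1^{\top} x \;+\; x^{\top} C\, y \;+\; r_2^{\top} y
\qquad \text{s.t.} \quad A_1 x = b_1,\; x \ge 0, \qquad A_2 y = b_2,\; y \ge 0 .

Fixing x (or y) reduces the objective to a linear program in the other block, which is why successive-approximation schemes that alternate between the two blocks are natural for this class.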
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{ASZ:Zaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude of speedup in policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the size of the state space and the number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high for both the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
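The memory-bounding idea is easy to convey in code. Below is a minimal, self-contained sketch of the selection step: after an exhaustive one-step backup, only the trees that participate in the best joint policy at some sampled belief point are retained, capping memory per iteration. All names (candidates, beliefs, value_at) are illustrative assumptions, not the authors' implementation.

```python
import itertools

def select_best(candidates, beliefs, value_at, max_trees):
    """Retain at most max_trees policy trees per agent.

    candidates: one list of backed-up policy trees per agent
    beliefs:    belief points sampled by the heuristic portfolio
    value_at:   value_at(joint, b) evaluates a joint policy (tuple of
                trees, one per agent) at belief point b
    """
    kept = [[] for _ in candidates]
    for b in beliefs:
        # The joint policy that is best at this belief point...
        best = max(itertools.product(*candidates),
                   key=lambda joint: value_at(joint, b))
        # ...contributes its component trees, up to the memory bound.
        for agent, tree in enumerate(best):
            if tree not in kept[agent] and len(kept[agent]) < max_trees:
                kept[agent].append(tree)
    return kept
```

Because the number of retained trees stays constant across iterations, the overall computation grows only linearly with the horizon, the property highlighted in the abstract.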
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
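For concreteness, the single-agent version of this nonlinear program can be sketched as follows (the decentralized case has one controller per agent, with the value constraint written over joint nodes); the notation is schematic rather than quoted from the paper:

\[
\max_{\psi,\,\eta,\,V} \;\; \sum_{s} b_0(s)\, V(q_0, s)
\]
subject, for every controller node q and state s, to
\[
V(q,s) \;=\; \sum_{a} \psi(a \mid q) \Big[ R(s,a) + \gamma \sum_{s'} T(s' \mid s,a) \sum_{o} O(o \mid s',a) \sum_{q'} \eta(q' \mid q,a,o)\, V(q', s') \Big],
\]
with the action distribution \(\psi\) and node-transition distribution \(\eta\) constrained to be valid probabilities. A solver that handles bilinear constraints can then search over controller parameters directly, which is what makes off-the-shelf nonlinear optimization applicable here.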
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
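The synthesis with A* amounts to scoring each partial solution optimistically. A depth-t vector of policy trees q^t is evaluated as (schematic notation, assumed for illustration):

\[
F(q^{t}) \;=\; G(q^{t}) \;+\; H(q^{t}),
\]

where G(q^t) is the exact expected reward of executing the partial joint policy for the first t steps from the initial state distribution, and H(q^t) is an optimistic (admissible) estimate of the reward obtainable in the remaining steps, for example the value of the underlying fully observable MDP. Always expanding the leaf with the highest F preserves completeness and optimality, exactly as in classical A*.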
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
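To make the policy representation concrete, here is a minimal, self-contained sketch of executing one agent's stochastic controller alongside a shared correlation device; the data layout and names are assumptions for illustration, not the authors' code.

```python
import random

def sample(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r <= acc:
            return outcome
    return outcome  # guard against floating-point rounding

class StochasticController:
    """One agent's local policy, conditioned on the shared device signal c."""
    def __init__(self, action_dist, node_dist):
        self.action_dist = action_dist  # action_dist[(c, q)] -> {action: prob}
        self.node_dist = node_dist      # node_dist[(c, q, a, o)] -> {node: prob}

    def act(self, c, q):
        return sample(self.action_dist[(c, q)])

    def update(self, c, q, a, o):
        return sample(self.node_dist[(c, q, a, o)])

def next_signal(device_dist, c):
    """The correlation device is itself a small Markov chain over signals,
    observed by every agent but carrying no information about the state."""
    return sample(device_dist[c])
```

Because every agent conditions on the same signal c, their randomization is correlated without any execution-time message passing, which is the source of the improved performance noted in the abstract.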
Stochastic Network Design and Optimization
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as a DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is at least as good as a given baseline policy. We also propose an iterative belief generation algorithm that efficiently minimizes the baseline regret, requiring only as many iterations as necessary to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
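Schematically, the reformulation treats suitably normalized reward as the probability of a binary event r̂ = 1 and mixes DBNs of different time lengths T, so that policy value is proportional to a likelihood (notation assumed for illustration):

\[
V(\theta) \;\propto\; P(\hat{r}=1;\theta) \;=\; \sum_{T} P(T)\, P(\hat{r}=1 \mid T; \theta),
\qquad P(T) = (1-\gamma)\,\gamma^{T} \text{ in the discounted case}.
\]

Maximizing value thus becomes maximum-likelihood estimation of the policy parameters θ, with EM alternating between inferring trajectory posteriors (E-step) and re-estimating θ (M-step); the decomposition result means each E-step computation typically touches only a small subset of agents.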
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs. |
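As background, the single-agent R-learning update that the distributed variant builds on maintains a gain estimate ρ alongside the Q-table; the standard form is shown here for orientation rather than quoted from the paper:

\[
Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r - \rho + \max_{a'} Q(s',a') - Q(s,a) \big],
\qquad
\rho \leftarrow \rho + \beta \big[ r + \max_{a'} Q(s',a') - \max_{a} Q(s,a) - \rho \big],
\]

with the ρ update applied on greedy steps. The distributed algorithms adapt such average-reward updates to the MD-DCOP setting, where value assignments in one time step shape the DCOP faced in the next.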
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
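The action-selection rule inside the tree search is the standard UCT rule (shown for orientation; c is the exploration constant):

\[
a^{*} \;=\; \arg\max_{a}\; \widehat{Q}(h,a) \;+\; c \sqrt{\frac{\ln N(h)}{N(h,a)}},
\]

where N(h) counts visits to history h and N(h,a) counts how often action a was tried there. The biased adaptive play layer then uses these Monte-Carlo utility estimates as the payoffs of each stage game.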
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two Efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
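For readers unfamiliar with the formulation, the separable bilinear programs in question have the following general shape (notation ours; the constraint matrices encode each agent's individual policy constraints, and the bilinear term couples the two agents):

    \max_{x,\, y} \quad r_1^{\top} x \;+\; x^{\top} C y \;+\; r_2^{\top} y
    \quad \text{subject to} \quad A_1 x = b_1, \quad A_2 y = b_2, \quad x \ge 0, \quad y \ge 0.

The objective is linear in x for any fixed y and vice versa; iterative solution methods exploit exactly this structure, and the online performance bound mentioned above measures how far the current iterate can be from the bilinear optimum.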
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
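The myopic rule described above reduces to a simple threshold test. A hypothetical sketch in Python (the value estimators are placeholders, not part of the paper):

    # Myopic value-of-communication test: pretend this is the only chance to
    # communicate, and compare the expected value after synchronizing against
    # the expected value of staying silent. Both estimators are hypothetical.
    def should_communicate(belief, v_sync, v_silent, cost):
        return v_sync(belief) - v_silent(belief) > cost

    # Toy usage with made-up linear value functions and a unit message cost.
    print(should_communicate(0.7, lambda b: 10 * b, lambda b: 8 * b, 1.0))  # True

The weakness the paper examines stems from the assumption itself: ignoring future opportunities to communicate biases the estimated value of communicating now, and the proposed alternative relaxes the assumption to correct for this.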
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:AZaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude of speedup in policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the size of the state space and the number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
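The 2-approximation guarantee echoes a classical result: greedy selection under a matroid constraint is within a factor of two of optimal for monotone submodular objectives. The paper's algorithm is a specialization to the sensor-target setting; a generic greedy sketch (ours, with all callbacks left as placeholders) looks like this:

    # Generic greedy under a matroid independence test: repeatedly add the
    # feasible element with the largest marginal gain. For monotone submodular
    # objectives this is the classical factor-1/2 approximation.
    def greedy_matroid(elements, is_independent, marginal_gain):
        chosen, remaining = [], set(elements)
        while True:
            best, best_gain = None, 0.0
            for e in remaining:
                if is_independent(chosen + [e]):
                    g = marginal_gain(chosen, e)
                    if g > best_gain:
                        best, best_gain = e, g
            if best is None:
                break
            chosen.append(best)
            remaining.discard(best)
        return chosen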
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, for both the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Using the resulting compact policies, we obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Using the resulting compact policies, we obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
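The probabilistic updating schemes referenced above are, at their core, Bayesian updates over candidate interpretations of the other agent's messages. A toy sketch (ours; the hypothesis space and likelihood model are invented for illustration):

    # Bayesian update over candidate interpretations of a received message:
    # after observing how the interaction unfolds, reweight each hypothesis
    # about what the message meant. 'likelihood' is a hypothetical placeholder.
    def update(prior, likelihood, message, outcome):
        posterior = {h: p * likelihood(h, message, outcome) for h, p in prior.items()}
        z = sum(posterior.values())
        return {h: p / z for h, p in posterior.items()} if z > 0 else prior

Under the sufficiency conditions identified in the paper, repeated updates of this kind let agents that start with different languages converge on coordinated interpretation over time.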
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
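Schematically, the MBDP recursion replaces the exhaustive policy sets of standard dynamic programming with a fixed-size set reselected at each horizon. A sketch under our own simplifications (every helper is a placeholder, and selection is shown over joint policies rather than per agent, unlike the actual algorithm):

    # Memory-bounded DP: bottom-up over horizons, but after each one-step
    # backup keep only the policies that are best at a handful of
    # heuristically sampled belief points (at most 'max_trees' of them).
    def mbdp(horizon, max_trees, initial_trees, backup, sample_beliefs, value):
        trees = initial_trees()                      # horizon-1 policy trees
        for t in range(2, horizon + 1):
            candidates = backup(trees)               # exhaustive one-step backup
            beliefs = sample_beliefs(t, max_trees)   # heuristic belief points
            trees = [max(candidates, key=lambda tr: value(tr, b)) for b in beliefs]
        return trees

Because the retained set never grows with the horizon, each backup costs roughly the same, which is the source of the linear time and space complexity in horizon length noted in the abstract.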
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
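To give the flavor of the nonlinear program, here is the single-controller version (notation ours; the decentralized formulation has one parameter set per agent, with the product of the agents' controllers appearing in the constraint):

    \max_{x} \; \sum_{s} b_0(s)\, V(q_0, s)
    \quad \text{s.t.} \quad
    V(q, s) = \sum_{a} x(a \mid q) \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)
              \sum_{o} O(o \mid s', a) \sum_{q'} x(q' \mid q, a, o)\, V(q', s') \Big],

where the variables are the controller parameters x and the values V, and the x terms are constrained to form proper probability distributions. The products of variables on the right-hand side are what make the program nonlinear, hence the reliance on nonlinear optimization solvers with only local optimality guarantees.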
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
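A skeleton of such a search, in the spirit of the description above (ours, with all problem-specific callbacks left abstract): nodes are partially specified joint policies, g is the exact value of the specified part, and h must be an upper bound on the value of any completion for the search to remain optimal.

    import heapq

    def maa_star(root, expand, g, h, is_complete):
        # Best-first search maximizing f = g + h; heapq is a min-heap, so we
        # negate f. The counter breaks ties without comparing nodes directly.
        frontier, tie = [(-(g(root) + h(root)), 0, root)], 1
        best, best_val = None, float("-inf")
        while frontier:
            neg_f, _, node = heapq.heappop(frontier)
            if -neg_f <= best_val:
                break                      # no remaining node can do better
            if is_complete(node):
                if g(node) > best_val:
                    best, best_val = node, g(node)
                continue
            for child in expand(node):
                heapq.heappush(frontier, (-(g(child) + h(child)), tie, child))
                tie += 1
        return best

An anytime variant of this skeleton simply reports 'best' whenever it improves, rather than waiting for the bound test to terminate the search.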
Allen, Martin; Goldman, Claudia V; Zilberstein, Shlomo Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05, We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Bernstein, Daniel S; Hansen, Eric A; Zilberstein, Shlomo Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05, We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |