Research
We study a wide range of problems in artificial intelligence: automated planning and learning, autonomous systems, reasoning under uncertainty, multi-agent systems, and resource-bounded reasoning. We are particularly interested in the implications of uncertainty and limited computational resources for the design of autonomous agents. In most practical settings, it is neither feasible nor desirable to find the optimal action, making it necessary to resort to some form of approximate reasoning. This raises a fundamental question: what does it mean for an agent to be “rational” when it does not have enough knowledge or computational power to derive the best course of action? Our overall approach to this problem involves meta-level control mechanisms that reason explicitly about the cost of decision-making and can optimize the amount of deliberation (or “thinking”) an agent does before taking action. We have also developed new planning techniques for situations involving multiple decision makers operating in either collaborative or adversarial domains.
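The core trade-off described above, optimizing how much deliberation an agent performs before acting, can be made concrete with a small sketch. The function below is purely illustrative (the performance-profile values and the linear time cost are invented for the example): given a predicted quality-over-time profile and a cost per unit of deliberation time, it selects the stopping point that maximizes time-dependent utility.

```python
# Illustrative sketch of meta-level control of deliberation time:
# given a predicted performance profile q(t) (solution quality as a
# function of deliberation time) and a per-step time cost c, choose
# the stopping time maximizing the utility U(t) = q(t) - c * t.

def optimal_deliberation_time(profile, time_cost):
    """Return (best step, best utility) for a quality profile.

    profile   -- list of quality values, one per time step
    time_cost -- utility lost per unit of deliberation time
    """
    best_t, best_u = 0, float("-inf")
    for t, quality in enumerate(profile):
        utility = quality - time_cost * t
        if utility > best_u:
            best_t, best_u = t, utility
    return best_t, best_u

# A typical diminishing-returns profile: quality rises quickly, then
# flattens, so extra deliberation eventually costs more than it gains.
profile = [0.0, 0.5, 0.75, 0.88, 0.94, 0.97, 0.98]
t, u = optimal_deliberation_time(profile, time_cost=0.05)
```

With these invented numbers the marginal gain drops below the time cost after step 4, so further "thinking" is no longer worthwhile.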
Human Compatible AI
Miura, Shuwa; Zilberstein, Shlomo Observer-Aware Planning with Implicit and Explicit Communication Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024. This paper presents a computational model designed for planning both implicit and explicit communication of intentions, goals, and desires. Building upon previous research focused on implicit communication of intention via actions, our model seeks to strategically influence an observer’s belief using both the agent’s actions and explicit messages. We show that our proposed model can be considered a special case of general multi-agent problems with explicit communication under certain assumptions. Since the mental state of the observer depends on histories, computing a policy for the proposed model amounts to optimizing a non-Markovian objective, which we show to be intractable in the worst case. To mitigate this challenge, we propose a technique based on splitting domain and communication actions during planning. We conclude with experimental evaluations of the proposed approach that illustrate its effectiveness.
Mahmud, Saaduddin; Vazquez-Chanlatte, Marcell; Witwicki, Stefan; Zilberstein, Shlomo Explaining the Behavior of POMDP-based Agents Through the Impact of Counterfactual Information Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024. In this work, we consider AI agents operating in Partially Observable Markov Decision Processes (POMDPs), a widely used framework for sequential decision making with incomplete state information. Agents operating with partial information take actions not only to advance their underlying goals but also to seek information and reduce uncertainty. Despite rapid progress in explainable AI, research on separating information-driven vs. goal-driven behaviors remains sparse. To address this gap, we introduce a novel explanation generation framework called Sequential Information Probing (SIP), to investigate the direct impact of state information, or its absence, on agent behavior. To quantify the impact, we also propose two metrics under this SIP framework called Value of Information (VoI) and Influence of Information (IoI). We then theoretically derive several properties of these metrics. Finally, we present several experiments, including a case study on an autonomous vehicle, that illustrate the efficacy of our method.
Choudhury, Moumita; Saisubramanian, Sandhya; Zhang, Hao; Zilberstein, Shlomo Minimizing Negative Side Effects in Cooperative Multi-Agent Systems Using Distributed Coordination Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024. Autonomous agents in real-world environments may encounter undesirable outcomes or negative side effects (NSEs) when working collaboratively alongside other agents. We frame the challenge of minimizing NSEs in a multi-agent setting as a lexicographic decentralized Markov decision process in which we assume independence of rewards and transitions with respect to the primary assigned tasks, but allow negative side effects to create a form of dependence among the agents. We present a lexicographic Q-learning approach to mitigate the NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks, up to some given slack. Our empirical evaluation across two domains demonstrates that our collaborative approach effectively mitigates NSEs, outperforming non-collaborative methods.
Choudhury, Moumita; Saisubramanian, Sandhya; Zhang, Hao; Zilberstein, Shlomo Minimizing Negative Side Effects in Cooperative Multi-Agent Systems Using Distributed Coordination Conference Proceedings of the 37th International FLAIRS Conference, Miramar Beach, Florida, 2024. Autonomous agents operating in real-world environments frequently encounter undesirable outcomes or negative side effects (NSEs) when working collaboratively alongside other agents. Even when agents can execute their primary task optimally when operating in isolation, their training may not account for potential negative interactions that arise in the presence of other agents. We frame the challenge of minimizing NSEs as a lexicographic decentralized Markov decision process in which we assume independence of rewards and transitions with respect to the primary assigned tasks, but recognize that addressing negative side effects creates a form of dependence among the agents. We present a lexicographic Q-learning approach to mitigate the NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks, up to some given slack. Our empirical evaluation across two domains demonstrates that our collaborative approach effectively mitigates NSEs, outperforming non-collaborative methods.
Mahmud, Saaduddin; Nashed, Samer B.; Goldman, Claudia V.; Zilberstein, Shlomo Estimating Causal Responsibility for Explaining Autonomous Behavior Book Section In: Calvaresi, Davide (Ed.): International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS), pp. 78–94, Springer, 2023. There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values.
Saisubramanian, Sandhya; Zilberstein, Shlomo; Kamar, Ece Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems Journal Article In: AI Magazine, vol. 42, no. 4, pp. 62–71, 2022. Autonomous agents acting in the real-world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model – handcrafted or machine acquired – is inevitable due to practical limitations of any modeling technique for complex real-world settings. Due to the limited fidelity of its model, an agent’s actions may have unexpected, undesirable consequences during execution. Learning to recognize and avoid such negative side effects (NSEs) of an agent’s actions is critical to improve the safety and reliability of autonomous systems. Mitigating NSEs is an emerging research topic that is attracting increased attention due to the rapid growth in the deployment of AI systems and their broad societal impacts. This article provides a comprehensive overview of different forms of NSEs and the recent research efforts to address them. We identify key characteristics of NSEs, highlight the challenges in avoiding NSEs, and discuss recently developed approaches, contrasting their benefits and limitations. The article concludes with a discussion of open questions and suggestions for future research directions.
Saisubramanian, Sandhya; Zilberstein, Shlomo; Kamar, Ece Avoiding Negative Side Effects of Autonomous Systems in the Open World Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 74, pp. 143–177, 2022. Autonomous systems that operate in the open world often use incomplete models of their environment. Model incompleteness is inevitable due to the practical limitations in precise model specification and data collection about open-world environments. Due to the limited fidelity of the model, agent actions may produce negative side effects (NSEs) when deployed. Negative side effects are undesirable, unmodeled effects of agent actions on the environment. NSEs are inherently challenging to identify at design time and may affect the reliability, usability and safety of the system. We present two complementary approaches to mitigate the NSE via: (1) learning from feedback, and (2) environment shaping. The solution approaches target settings with different assumptions and agent responsibilities. In learning from feedback, the agent learns a penalty function associated with a NSE. We investigate the efficiency of different feedback mechanisms, including human feedback and autonomous exploration. The problem is formulated as a multi-objective Markov decision process such that optimizing the agent’s assigned task is prioritized over mitigating NSE. A slack parameter denotes the maximum allowed deviation from the optimal expected reward for the agent’s task in order to mitigate NSE. In environment shaping, we examine how a human can assist an agent, beyond providing feedback, and utilize their broader scope of knowledge to mitigate the impacts of NSE. We formulate the problem as a human-agent collaboration with decoupled objectives. The agent optimizes its assigned task and may produce NSE during its operation.
The human assists the agent by performing modest reconfigurations of the environment so as to mitigate the impacts of NSE, without affecting the agent’s ability to complete its assigned task. We present an algorithm for shaping and analyze its properties. Empirical evaluations demonstrate the trade-offs in the performance of different approaches in mitigating NSE in different settings.
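The slack-based prioritization described in the entries above can be illustrated with a minimal sketch. Everything here is hypothetical (the action names, task values, and side-effect penalties are invented): the agent first keeps only the actions whose primary-task value is within a given slack of optimal, then selects among them the action with the lowest expected side-effect penalty.

```python
# Hypothetical sketch of lexicographic action selection with slack:
# the assigned task is strictly prioritized, and side effects are
# minimized only among near-optimal actions for the task.

def lexicographic_choice(q_task, q_nse, slack):
    """q_task: action -> task value; q_nse: action -> NSE penalty."""
    best = max(q_task.values())
    # Keep actions near-optimal for the assigned task (within slack).
    admissible = [a for a, v in q_task.items() if v >= best - slack]
    # Among those, minimize the negative-side-effect penalty.
    return min(admissible, key=lambda a: q_nse[a])

# Invented example: a slightly slower route avoids most side effects.
q_task = {"fast_route": 10.0, "careful_route": 9.5, "detour": 6.0}
q_nse = {"fast_route": 4.0, "careful_route": 0.5, "detour": 0.0}
choice = lexicographic_choice(q_task, q_nse, slack=1.0)
```

With zero slack the agent ignores side effects entirely; as the slack grows, it trades more task value for fewer side effects, which is the tunable deviation the abstracts describe.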
Miura, Shuwa; Wray, Kyle Hollins; Zilberstein, Shlomo Heuristic Search for SSPs with Lexicographic Preferences over Multiple Costs Conference Proceedings of the 15th Annual Symposium on Combinatorial Search (SOCS), Vienna, Austria, 2022. Real-world decision problems often involve multiple competing objectives. The Stochastic Shortest Path (SSP) with lexicographic preferences over multiple costs offers an expressive formulation for many practical problems. However, the existing solution methods either lack optimality guarantees or require costly computations over the entire state space. We propose the first heuristic search algorithm for this problem, based on the heuristic algorithm for Constrained SSPs. Our experiments show that our heuristic search algorithm can compute optimal policies while avoiding a large portion of the state space. We also analyze the theoretical properties of the problem, establishing the conditions under which SSPs with lexicographic preferences have a proper optimal policy.
Svegliato, Justin; Nashed, Samer B; Zilberstein, Shlomo Ethically Compliant Sequential Decision Making Conference Proceedings of the 35th Conference on Artificial Intelligence (AAAI), 2021, (Distinguished Paper Award). Enabling autonomous systems to comply with an ethical theory is critical given their accelerating deployment in domains that impact society. While many ethical theories have been studied extensively in moral philosophy, they are still challenging to implement by developers who build autonomous systems. This paper proposes a novel approach for building ethically compliant autonomous systems that optimize completing a task while following an ethical framework. First, we introduce a definition of an ethically compliant autonomous system and its properties. Next, we offer a range of ethical frameworks for divine command theory, prima facie duties, and virtue ethics. Finally, we demonstrate the accuracy and usability of our approach in a set of autonomous driving simulations and a user study of planning and robotics experts.
Miura, Shuwa; Cohen, Andrew L; Zilberstein, Shlomo Maximizing Legibility in Stochastic Environments Conference Proceedings of the 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), Vancouver, BC, Canada, 2021. Making an agent's intentions clear from its observed behavior is crucial for seamless human-agent interaction and for increased transparency and trust in AI systems. Existing methods that address this challenge and maximize legibility of behaviors are limited to deterministic domains. We develop a technique for maximizing legibility in stochastic environments and illustrate that using legibility as an objective improves interpretability of agent behavior in several scenarios. We provide initial empirical evidence that human subjects can better interpret legible behavior.
Miura, Shuwa; Zilberstein, Shlomo A Unifying Framework for Observer-Aware Planning and its Complexity Conference Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI), Virtual Event, 2021. Being aware of observers and the inferences they make about an agent's behavior is crucial for successful multi-agent interaction. Existing works on observer-aware planning use different assumptions and techniques to produce observer-aware behaviors. We argue that observer-aware planning, in its most general form, can be modeled as an Interactive POMDP (I-POMDP), which requires complex modeling and is hard to solve. Hence, we introduce a less complex framework for producing observer-aware behaviors called Observer-Aware MDP (OAMDP) and analyze its relationship to I-POMDP. We establish the complexity of OAMDPs and show that they can improve interpretability of agent behaviors in several scenarios.
Rabiee, Sadegh; Basich, Connor; Wray, Kyle Hollins; Zilberstein, Shlomo; Biswas, Joydeep Competence-Aware Path Planning via Introspective Perception Journal Article In: CoRR, vol. abs/2109.13974, 2021. Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and to proactively take actions to avoid future failures. Existing approaches for competence-aware planning are either model-based, requiring explicit enumeration of known failure modes, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception, without requiring a priori enumeration of failure modes or location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Experiments in simulation show that the proposed CPIP approach outperforms the frequentist baseline in multiple mobile robot tasks, and is further validated via real robot experiments in an environment with perceptually challenging obstacles and terrain.
Nashed, Samer B; Svegliato, Justin; Zilberstein, Shlomo Ethically Compliant Planning within Moral Communities Conference Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021. Ethically compliant autonomous systems (ECAS) are the state-of-the-art for solving sequential decision-making problems under uncertainty while respecting constraints that encode ethical considerations. This paper defines a novel concept in the context of ECAS drawn from moral philosophy, the moral community, which leads to a nuanced taxonomy of explicit ethical agents. We then propose new ethical frameworks that extend the applicability of ECAS to domains where a moral community is required. Next, we provide a formal analysis of the proposed ethical frameworks and conduct experiments that illustrate their differences. Finally, we discuss the implications of explicit moral communities that could shape research on standards and guidelines for ethical agents in order to better understand and predict common errors in their design and communicate their capabilities.
Galhotra, Sainyam; Saisubramanian, Sandhya; Zilberstein, Shlomo Learning to Generate Fair Clusters from Demonstrations Conference Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021. Fair clustering is the process of grouping similar entities together, while satisfying a mathematically well-defined fairness metric as a constraint. Due to the practical challenges in precise model specification, the prescribed fairness constraints are often incomplete and act as proxies to the intended fairness requirement. Clustering with proxies may lead to biased outcomes when the system is deployed. We examine how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert. Each demonstration is a clustering over a subset of the data. We present an algorithm to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques, and analyze its theoretical properties. To extend our approach to novel fairness metrics for which clustering algorithms do not currently exist, we present a greedy method for clustering. Additionally, we investigate how to generate interpretable solutions using our approach. Empirical evaluation on three real-world datasets demonstrates the effectiveness of our approach in quickly identifying the underlying fairness and interpretability constraints, which are then used to generate fair and interpretable clusters.
Galhotra, Sainyam; Saisubramanian, Sandhya; Zilberstein, Shlomo Learning to Generate Fair Clusters from Demonstrations Journal Article In: CoRR, vol. abs/2102.03977, 2021. Fair clustering is the process of grouping similar entities together, while satisfying a mathematically well-defined fairness metric as a constraint. Due to the practical challenges in precise model specification, the prescribed fairness constraints are often incomplete and act as proxies to the intended fairness requirement, leading to biased outcomes when the system is deployed. We examine how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert. Each demonstration is a clustering over a subset of the data. We present an algorithm to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques, and analyze its theoretical properties. To extend our approach to novel fairness metrics for which clustering algorithms do not currently exist, we present a greedy method for clustering. Additionally, we investigate how to generate interpretable solutions using our approach. Empirical evaluation on three real-world datasets demonstrates the effectiveness of our approach in quickly identifying the underlying fairness and interpretability constraints, which are then used to generate fair and interpretable clusters.
Saisubramanian, Sandhya; Zilberstein, Shlomo Mitigating Negative Side Effects via Environment Shaping Journal Article In: CoRR, vol. abs/2102.07017, 2021. Agents operating in unstructured environments often produce negative side effects (NSE), which are difficult to identify at design time. While the agent can learn to mitigate the side effects from human feedback, such feedback is often expensive and the rate of learning is sensitive to the agent's state representation. We examine how humans can assist an agent, beyond providing feedback, and exploit their broader scope of knowledge to mitigate the impacts of NSE. We formulate this problem as a human-agent team with decoupled objectives. The agent optimizes its assigned task, during which its actions may produce NSE. The human shapes the environment through minor reconfiguration actions so as to mitigate the impacts of the agent's side effects, without affecting the agent's ability to complete its assigned task. We present an algorithm to solve this problem and analyze its theoretical properties. Through experiments with human subjects, we assess the willingness of users to perform minor environment modifications to mitigate the impacts of NSE. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE, without affecting the agent's ability to complete its assigned task.
Saisubramanian, Sandhya; Zilberstein, Shlomo Mitigating Negative Side Effects via Environment Shaping (Extended Abstract) Conference Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2021. Agents operating in the open world often produce negative side effects (NSE), which are difficult to identify at design time. We examine how a human can assist an agent, beyond providing feedback, and exploit their broader scope of knowledge to mitigate the impacts of NSE. We formulate this problem as a human-agent team with decoupled objectives. The agent optimizes its assigned task, during which its actions may produce NSE. The human shapes the environment through minor reconfiguration actions so as to mitigate the impacts of the agent's side effects, without significantly degrading agent performance. We present an algorithm to solve this problem. Empirical evaluation shows that the proposed framework can successfully mitigate NSE, without affecting the agent’s ability to complete its assigned task.
Saisubramanian, Sandhya; Roberts, Shannon C; Zilberstein, Shlomo Understanding User Attitudes Towards Negative Side Effects of AI Systems Conference CHI Conference on Human Factors in Computing Systems, Late-Breaking Work, 2021. Artificial Intelligence (AI) systems deployed in the open world may produce negative side effects—which are unanticipated, undesirable outcomes that occur in addition to the intended outcomes of the system’s actions. These negative side effects affect users directly or indirectly, by violating their preferences or altering their environment in an undesirable, potentially harmful, manner. While the existing literature has started to explore techniques to overcome the impacts of negative side effects in deployed systems, there have been no prior efforts to determine how users perceive and respond to negative side effects. We surveyed 183 participants to develop an understanding of user attitudes towards side effects and how side effects impact user trust in the system. The surveys targeted two domains: an autonomous vacuum cleaner and an autonomous vehicle, each with 183 respondents. The results indicate that users are willing to tolerate side effects that are not safety-critical but prefer to minimize them as much as possible. Furthermore, users are willing to assist the system in mitigating negative side effects by providing feedback and reconfiguring the environment. Trust in the system diminishes if it fails to minimize the impacts of negative side effects over time. These results support key fundamental assumptions in existing techniques and facilitate the development of new methods to overcome negative side effects of AI systems.
Woolf, Beverly; Ghosh, Aritra; Lan, Andrew; Zilberstein, Shlomo; Juravich, Tom; Cohen, Andrew; Geho, Olivia AI-Enabled Training in Manufacturing Workforce Development Conference AAAI Spring Symposium on Artificial Intelligence in Manufacturing, Stanford, California, 2020. A highly productive workforce can evolve with the integration of digital devices, such as computer interfaces to operating machines, interconnected smart devices, and robots, in the workplace. However, this potential cannot be realized with the current state-of-the-art systems used to train workers. This problem is acute in manufacturing, where huge skills gaps are evident; most workers lack the necessary skills to operate or collaborate with autonomous systems. We propose to address this problem by using intelligent tutoring systems and worker data analysis. The worker data includes: i) fine-grained on-job performance data, ii) career path data containing the entire career paths of workers, and iii) job posting data over a long period of time indicating the required skills for each job. We will collect and analyze worker data and use it to drive new methods for training and reskilling workers. We detail ideas and tools to be developed by research in intelligent tutoring systems, data science, manufacturing, sociology, labor analysis, education, psychology, and economics. We also describe a convergent approach to developing effective, fair, and scalable software solutions and dynamic intelligent training.
Renski, Henry; Smith-Doerr, Laurel; Wilkerson, Tiamba; Roberts, Shannon C; Zilberstein, Shlomo; Branch, Enobong H Racial Equity and the Future of Work Journal Article In: Technology|Architecture + Design, vol. 4, no. 1, pp. 17–22, 2020.
Anytime Algorithms
Bhatia, Abhinav; Svegliato, Justin; Nashed, Samer B.; Zilberstein, Shlomo Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS), Virtual Conference, 2022. Anytime planning algorithms often have hyperparameters that can be tuned at runtime to optimize their performance. While work on metareasoning has focused on when to interrupt an anytime planner and act on the current plan, the scope of metareasoning can be expanded to tuning the hyperparameters of the anytime planner at runtime. This paper introduces a general, decision-theoretic metareasoning approach that optimizes both the stopping point and hyperparameters of anytime planning. We begin by proposing a generalization of the standard meta-level control problem for anytime algorithms. We then offer a meta-level control technique that monitors and controls an anytime algorithm using deep reinforcement learning. Finally, we show that our approach boosts performance on a common benchmark domain that uses anytime weighted A* to solve a range of heuristic search problems and a mobile robot application that uses RRT* to solve motion planning problems.
Bhatia, Abhinav; Svegliato, Justin; Zilberstein, Shlomo On the Benefits of Randomly Adjusting Anytime Weighted A* Conference Proceedings of the 14th International Symposium on Combinatorial Search (SOCS), 2021. Anytime Weighted A* (AWA*), an anytime heuristic search algorithm that uses a weight to scale the heuristic value of each node in the open list, has proven to be an effective way to manage the trade-off between solution quality and computation time in heuristic search. Finding the best weight, however, is challenging because it depends not only on the characteristics of the domain and the details of the instance at hand, but also on the available computation time. We propose a randomized version of this algorithm, called Randomized Weighted A* (RWA*), that randomly adjusts its weight at runtime, and show a counterintuitive phenomenon: RWA* generally performs as well as or better than AWA* with the best static weight on a range of benchmark problems. The result is a simple algorithm that is easy to implement and performs consistently well without any offline experimentation or parameter tuning.
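The idea behind weight randomization can be sketched as follows. This is a simplified, restart-based illustration rather than the algorithm from the paper, which adjusts the weight within a single anytime search; the toy grid domain and the candidate weight set are invented for the example.

```python
import heapq
import random

def weighted_astar(start, goal, neighbors, h, w):
    """Weighted A* with f(n) = g(n) + w * h(n); returns path cost."""
    frontier = [(w * h(start), 0, start)]
    best_g = {start: 0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        for nxt, step in neighbors(node):
            g2 = g + step
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + w * h(nxt), g2, nxt))
    return None

def randomized_weighted_astar(start, goal, neighbors, h, weights, iters=5):
    """Restart-based sketch: sample a fresh weight each anytime
    iteration and keep the cheapest solution found so far."""
    incumbent = float("inf")
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(iters):
        w = rng.choice(weights)  # re-sample the weight at runtime
        cost = weighted_astar(start, goal, neighbors, h, w)
        if cost is not None and cost < incumbent:
            incumbent = cost
    return incumbent

# Toy 4-connected 5x5 grid with unit step costs and a Manhattan heuristic.
def neighbors(p):
    x, y = p
    return [((x + dx, y + dy), 1)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

h = lambda p: abs(4 - p[0]) + abs(4 - p[1])
cost = randomized_weighted_astar((0, 0), (4, 4), neighbors, h,
                                 weights=[1.0, 1.5, 2.0, 3.0])
```

On this obstacle-free grid every weight finds the optimal cost of 8, so the sketch only shows the mechanics; the paper's benchmarks are the cases where the best weight is hard to guess in advance.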
Svegliato, Justin; Wray, Kyle Hollins; Zilberstein, Shlomo Meta-Level Control of Anytime Algorithms with Online Performance Prediction Conference Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 2018. Anytime algorithms enable intelligent systems to trade computation time with solution quality. To exploit this crucial ability in real-time decision-making, the system must decide when to interrupt the anytime algorithm and act on the current solution. Existing meta-level control techniques, however, address this problem by relying on significant offline work that diminishes their practical utility and accuracy. We formally introduce an online performance prediction framework that enables meta-level control to adapt to each instance of a problem without any preprocessing. Using this framework, we then present a meta-level control technique and two stopping conditions. Finally, we show that our approach outperforms existing techniques that require substantial offline work. The result is efficient nonmyopic meta-level control that reduces the overhead and increases the benefits of using anytime algorithms in intelligent systems.
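A myopic version of the stopping problem described in this entry can be sketched in a few lines. This is an illustration only, not the paper's (nonmyopic) method: a linear projection from the two most recent quality measurements stands in for the online performance predictor, and the quality sequence and time cost are invented.

```python
# Illustrative myopic stopping rule for an anytime algorithm: keep
# computing while the projected quality gain of one more time step
# exceeds its time cost.

def should_stop(quality_history, time_cost):
    """Stop when the projected one-step gain no longer pays for itself."""
    if len(quality_history) < 2:
        return False  # not enough data to project a trend
    projected_gain = quality_history[-1] - quality_history[-2]
    return projected_gain <= time_cost

# Simulated quality measurements from a run with diminishing returns.
history = []
for q in [0.2, 0.5, 0.7, 0.8, 0.85, 0.87, 0.88, 0.885]:
    history.append(q)
    if should_stop(history, time_cost=0.04):
        break  # interrupt the anytime algorithm and act
stopped_at = len(history)
```

The monitor interrupts once improvements flatten out, here after the sixth measurement; a nonmyopic controller like the one in the paper would instead reason about the value of several future steps at once.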
Svegliato, Justin; Zilberstein, Shlomo Adaptive Metareasoning for Bounded Rational Agents Conference IJCAI/ECAI Workshop on Architectures and Evaluation for Generality, Autonomy and Progress in AI (AEGAP), Stockholm, Sweden, 2018. In computational approaches to bounded rationality, metareasoning enables intelligent agents to optimize their own decision-making process in order to produce effective action in a timely manner. While there have been substantial efforts to develop effective meta-level control for anytime algorithms, existing techniques rely on extensive offline work, imposing several critical assumptions that diminish their effectiveness and limit their practical utility in the real world. In order to eliminate these assumptions, adaptive metareasoning enables intelligent agents to adapt to each individual instance of the problem at hand without the need for significant offline preprocessing. Building on our recent work, we first introduce a model-free approach to meta-level control based on reinforcement learning. We then present a meta-level control technique that uses temporal difference learning. Finally, we show empirically that our approach is effective on a common benchmark in meta-level control.
Arnt, Andrew; Zilberstein, Shlomo; Allan, James; Mouaddib, Abdel-Illah Dynamic Composition of Information Retrieval Techniques Journal Article In: Journal of Intelligent Information Systems (JIIS), vol. 23, no. 1, pp. 67–97, 2004. @article{SZ:AZAMjiis04, This paper presents a new approach to information retrieval (IR) based on run-time selection of the best set of techniques to respond to a given query. A technique is selected based on its projected effectiveness with respect to the specific query, the load on the system, and a time-dependent utility function. The paper examines two fundamental questions: (1) can the selection of the best IR techniques be performed at run-time with minimal computational overhead? and (2) is it possible to construct a reliable probabilistic model of the performance of an IR technique that is conditioned on the characteristics of the query? We show that both of these questions can be answered positively. These results suggest a new system design that carries great potential to improve the quality of service of future IR systems. |
Zilberstein, Shlomo; Charpillet, Francois; Chassaing, Philippe Optimal Sequencing of Contract Algorithms Journal Article In: Annals of Mathematics and Artificial Intelligence (AMAI), vol. 39, no. 1-2, pp. 1-18, 2003. @article{SZ:ZCCamai03, We address the problem of building an interruptible real-time system using non-interruptible components. Some artificial intelligence techniques offer a tradeoff between computation time and quality of results, but their run-time must be determined when they are activated. These techniques, called contract algorithms, introduce a complex scheduling problem when there is uncertainty about the amount of time available for problem-solving. We show how to optimally sequence contract algorithms to create the best possible interruptible system with or without stochastic information about the deadline. These results extend the foundation of real-time problem-solving and provide useful guidance for embedding contract algorithms in applications. |
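The single-processor construction behind these results can be simulated directly (an illustrative sketch, not the paper's deadline-aware schedules): run contracts of exponentially increasing length back to back, and the worst-case ratio between interruption time and the largest completed contract, the acceleration ratio, approaches 4 for the doubling schedule, the known optimum without deadline information.

```python
def doubling_ratios(n_contracts=12):
    """Run contracts of lengths 1, 2, 4, ... back to back and measure,
    at the adversarial moment just before each contract finishes, the
    ratio of elapsed time to the largest contract already completed.
    For the doubling schedule this ratio approaches 4."""
    lengths = [2.0 ** i for i in range(n_contracts)]
    elapsed, best, ratios = 0.0, 0.0, []
    for length in lengths:
        if best > 0.0:
            # interrupt just before this contract would complete
            ratios.append((elapsed + length) / best)
        elapsed += length
        best = length
    return ratios
```

The ratios climb monotonically (3, 3.5, 3.75, ...) toward the limit of 4, which is why the doubling schedule is the standard way to make a contract algorithm interruptible.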
Bernstein, Daniel S; Finkelstein, Lev; Zilberstein, Shlomo Contract Algorithms and Robots on Rays: Unifying Two Scheduling Problems Conference Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 2003. @conference{SZ:BFZijcai03, We study two apparently different, but formally similar, scheduling problems. The first problem involves contract algorithms, which can trade off run time for solution quality, as long as the amount of available run time is known in advance. The problem is to schedule contract algorithms to run on parallel processors, under the condition that an interruption can occur at any time, and upon interruption a solution to any one of a number of problems can be requested. Schedules are compared in terms of acceleration ratio, which is a worst-case measure of efficiency. We provide a schedule and prove its optimality among a particular class of schedules. Our second problem involves multiple robots searching for a goal on one of multiple rays. Search strategies are compared in terms of time-competitive ratio, the ratio of the total search time to the time it would take for one robot to traverse directly to the goal. We demonstrate that search strategies and contract schedules are formally equivalent. In addition, for our class of schedules, we derive a formula relating the acceleration ratio of a schedule to the time-competitive ratio of the corresponding search strategy. |
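The correspondence can be checked in the classic single-robot, two-ray instance (an illustrative simulation with made-up helper names): doubling turn points yield a time-competitive ratio approaching 9, the search-theoretic counterpart of an acceleration ratio.

```python
def search_time(d, ray, turn_points):
    """Distance traveled by one robot that explores two rays from the
    origin, turning around at the given depths (even indices on ray 0,
    odd on ray 1), until it reaches depth d on the goal ray."""
    traveled = 0.0
    for i, depth in enumerate(turn_points):
        if i % 2 == ray and depth >= d:
            return traveled + d  # goal reached on this excursion
        traveled += 2 * depth    # out to the turn point and back
    raise ValueError("goal lies beyond the last turn point")

def worst_ratio(k_max=20):
    """Worst observed ratio of search time to goal distance when the
    adversary places the goal just past a turn point."""
    turn_points = [2.0 ** i for i in range(k_max + 4)]
    return max(
        search_time(2.0 ** k * 1.000001, k % 2, turn_points)
        / (2.0 ** k * 1.000001)
        for k in range(1, k_max)
    )
```

An adversarial goal placed just beyond a turn point forces two extra full excursions, giving the familiar competitive ratio of 9 in the limit.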
Bernstein, Daniel S; Perkins, Theodore J; Zilberstein, Shlomo; Finkelstein, Lev Scheduling Contract Algorithms on Multiple Processors Conference Proceedings of the 18th National Conference on Artificial Intelligence (AAAI), Edmonton, Alberta, 2002. @conference{SZ:BPZFaaai02, Anytime algorithms offer a tradeoff between computation time and the quality of the result returned. They can be divided into two classes: contract algorithms, for which the total run time must be specified in advance, and interruptible algorithms, which can be queried at any time for a solution. An interruptible algorithm can be constructed from a contract algorithm by repeatedly activating the contract algorithm with increasing run times. The acceleration ratio of a run-time schedule is a worst-case measure of how inefficient the constructed interruptible algorithm is compared to the contract algorithm. The smallest acceleration ratio achievable on a single processor is known. Using multiple processors, smaller acceleration ratios are possible. In this paper, we provide a schedule for m processors and prove that it is optimal for all m. Our results provide general guidelines for the use of parallel processors in the design of real-time systems. |
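As a hedged illustration of the multi-processor result, the simulation below assigns contract i a length of b**i with b = (m+1)**(1/m) and runs it on processor i % m (our reading of the form the optimal schedule takes), then measures the empirical acceleration ratio under adversarial interruptions; with more processors the ratio drops below the single-processor optimum of 4.

```python
def acceleration_ratio(m, n_contracts=40):
    """Empirical worst-case acceleration ratio of an exponential
    round-robin schedule on m processors. The adversary interrupts
    just before each contract finishes, when the best completed
    contract is still the previous (shorter) one."""
    b = (m + 1) ** (1.0 / m)
    lengths = [b ** i for i in range(n_contracts)]
    busy = [0.0] * m             # accumulated run time per processor
    finish = []                  # completion time of contract i
    for i, length in enumerate(lengths):
        busy[i % m] += length
        finish.append(busy[i % m])
    return max(finish[i] / lengths[i - 1] for i in range(1, n_contracts))
```

For m = 1 this reduces to the doubling schedule with ratio approaching 4; for m = 2 the empirical ratio settles near 2.6, confirming that parallelism buys a smaller acceleration ratio.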
Hansen, Eric A; Zilberstein, Shlomo Monitoring and Control of Anytime Algorithms: A Dynamic Programming Approach Journal Article In: Artificial Intelligence (AIJ), vol. 126, no. 1-2, pp. 139–157, 2001. @article{SZ:HZaij01a, Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in solving time-critical problems such as planning and scheduling, belief network evaluation, and information gathering. To exploit this tradeoff, a system must be able to decide when to stop deliberation and act on the currently available solution. This paper analyzes the characteristics of existing techniques for meta-level control of anytime algorithms and develops a new framework for monitoring and control. The new framework handles effectively the uncertainty associated with the algorithm's performance profile, the uncertainty associated with the domain of operation, and the cost of monitoring progress. The result is an efficient non-myopic solution to the meta-level control problem for anytime algorithms. |
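A toy rendition of the dynamic-programming view (illustrative names and numbers; the paper's framework also models monitoring cost and domain uncertainty, which this sketch omits): discretize quality, treat the performance profile as a Markov chain over quality levels, and back up stop-versus-continue values over time.

```python
def solve_stopping_policy(p_next, value, time_cost, horizon):
    """Backward induction over (quality, time) states. p_next[q] maps
    next quality -> probability for one more step of computation; the
    time-dependent utility of stopping is value[q] - time_cost * t.
    Returns the stop/continue policy and the state values."""
    V, policy = {}, {}
    for t in range(horizon, -1, -1):
        for q in p_next:
            stop_value = value[q] - time_cost * t
            if t == horizon:
                V[(q, t)], policy[(q, t)] = stop_value, 'stop'
                continue
            cont_value = sum(p * V[(q2, t + 1)]
                             for q2, p in p_next[q].items())
            if cont_value > stop_value:
                V[(q, t)], policy[(q, t)] = cont_value, 'continue'
            else:
                V[(q, t)], policy[(q, t)] = stop_value, 'stop'
    return policy, V
```

With three quality levels and a unit time cost, the induced policy continues deliberating from low quality but stops as soon as the expected improvement no longer covers the cost of further computation.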
Grass, Joshua; Zilberstein, Shlomo A Value-Driven System for Autonomous Information Gathering Journal Article In: Journal of Intelligent Information Systems (JIIS), vol. 14, no. 1, pp. 5–27, 2000. @article{SZ:GZjiis00, This paper presents a system for autonomous information gathering in an information rich domain under time and monetary resource restrictions. The system gathers information using an explicit representation of the user's decision model and a database of information sources. Information gathering actions (queries) are scheduled myopically by selecting the query with the highest marginal value. This value is determined by the value of the information with respect to the decision being made, the responsiveness of the information source, and a given resource cost function. Finally, we compare the value-driven approach to several base-line techniques and show that the overhead of the meta-level control is made up for by the increased decision quality. |
Zilberstein, Shlomo; Charpillet, Francois; Chassaing, Philippe Real-Time Problem-Solving with Contract Algorithms Conference Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 1999. @conference{SZ:ZCCijcai99, This paper addresses the problem of building an interruptible real-time system using contract algorithms. Contract algorithms offer a tradeoff between computation time and quality of results, but their run-time must be determined when they are activated. Many AI techniques provide useful contract algorithms that are not interruptible. We show how to optimally sequence contract algorithms to create the best interruptible system with or without stochastic information about the deadline. These results extend the foundation of real-time problem-solving and provide useful guidance for embedding contract algorithms in applications. |
Marengoni, Mauricio; Hanson, Allen; Zilberstein, Shlomo; Riseman, Edward Control in a 3D Reconstruction System Using Selective Perception Conference Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV), Kerkyra, Greece, 1999. @conference{SZ:MHZRiccv99, This paper presents a control structure for general purpose image understanding that addresses both the high level of uncertainty in local hypotheses and the computational complexity of image interpretation. The control of vision algorithms is performed by an independent subsystem that uses Bayesian networks and utility theory to compute the marginal value of information provided by alternative operators and selects the ones with the highest value. We have implemented and tested this control structure with several aerial image datasets. The results show that the knowledge base used by the system can be acquired using standard learning techniques and that the value-driven approach to the selection of vision algorithms leads to performance gains. Moreover, the modular system architecture simplifies the addition of both control knowledge and new vision algorithms. |
Hansen, Eric A; Zilberstein, Shlomo; Danilchenko, Victor A Anytime Heuristic Search: First Results Technical Report Computer Science Department, University of Massachusetts Amherst, no. 97-50, 1997. @techreport{SZ:HZDtr9750, We describe a simple technique for converting heuristic search algorithms into anytime algorithms that offer a tradeoff between search time and solution quality. The technique is related to work on the use of non-admissible evaluation functions that make it possible to find good, but possibly sub-optimal, solutions more quickly than it takes to find an optimal solution. Instead of stopping the search after the first solution is found, however, we continue the search in order to find a sequence of improved solutions that eventually converges to an optimal solution. The performance of anytime heuristic search depends on the non-admissible evaluation function that guides the search. We discuss how to design a search heuristic that "optimizes" the rate at which the currently available solution improves. |
Zilberstein, Shlomo; Russell, Stuart J Optimal Composition of Real-Time Systems Journal Article In: Artificial Intelligence (AIJ), vol. 82, no. 1-2, pp. 181–213, 1996. @article{SZ:ZRaij96, Real-time systems are designed for environments in which the utility of actions is strongly time-dependent. Recent work by Dean, Horvitz and others has shown that anytime algorithms are a useful tool for real-time system design, since they allow computation time to be traded for decision quality. In order to construct complex systems, however, we need to be able to compose larger systems from smaller, reusable anytime modules. This paper addresses two basic problems associated with composition: how to ensure the interruptibility of the composed system; and how to allocate computation time optimally among the components. The first problem is solved by a simple and general construction that incurs only a small, constant penalty. The second is solved by an off-line compilation process. We show that the general compilation problem is NP-complete. However, efficient local compilation techniques, working on a single program structure at a time, yield globally optimal allocations for a large class of programs. We illustrate these results with two simple applications. |
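The time-allocation half of the composition problem can be pictured for the simplest program structure, two modules in series (an illustrative brute-force search over integer splits; the paper's compilation handles general program structures and shows local compilation is globally optimal for a large class):

```python
def compile_series(qa, qb, total_time):
    """Optimal split of total_time between two composed anytime
    modules: qa(t) is the first module's performance profile, and
    qb(q_in, t) is the second module's output quality given input
    quality q_in and computation time t. Returns the best achievable
    final quality and the time allocated to the first module."""
    return max((qb(qa(t), total_time - t), t)
               for t in range(total_time + 1))
```

For example, with the made-up profiles qa(t) = min(t, 5) and qb(q, t) = q * (1 - 0.5 ** t), eight time units are best split as five for the first module and three for the second.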
Zilberstein, Shlomo Using Anytime Algorithms in Intelligent Systems Journal Article In: AI Magazine, vol. 17, no. 3, pp. 73–83, 1996. @article{SZ:Zaimag96, Anytime algorithms give intelligent systems the capability to trade off deliberation time for quality of results. This capability is essential for successful operation in domains such as signal interpretation, real-time diagnosis and repair, and mobile robot control. What characterizes these domains is that it is not feasible (computationally) or desirable (economically) to compute the optimal answer. This paper surveys the main control problems that arise when a system is composed of several anytime algorithms. These problems relate to optimal management of uncertainty and precision. After a brief introduction to anytime computation, the paper outlines a wide range of existing solutions to the meta-level control problem and describes current work that is aimed at increasing the applicability of anytime computation. |
Hansen, Eric A; Zilberstein, Shlomo Monitoring the Progress of Anytime Problem-Solving Conference Proceedings of the 13th National Conference on Artificial Intelligence (AAAI), Portland, Oregon, 1996. @conference{SZ:HZaaai96, Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in applying artificial intelligence techniques to time-critical problems. To exploit this tradeoff, a system must be able to determine the best time to stop deliberation and act on the currently available solution. When the rate of improvement of solution quality is uncertain, monitoring the progress of the algorithm can improve the utility of the system. This paper introduces a technique for run-time monitoring of anytime algorithms that is sensitive to the variance of the algorithm's performance, the time-dependent utility of a solution, the ability of the run-time monitor to estimate the quality of the currently available solution, and the cost of monitoring. The paper examines the conditions under which the technique is optimal and demonstrates its applicability. |
Zilberstein, Shlomo Optimizing Decision Quality with Contract Algorithms Conference Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, 1995. @conference{SZ:Zijcai95, Contract algorithms offer a tradeoff between output quality and computation time, provided that the amount of computation time is determined prior to their activation. Originally, they were introduced as an intermediate step in the composition of interruptible anytime algorithms. However, for many real-time tasks such as information gathering, game playing, and a large class of planning problems, contract algorithms offer an ideal mechanism to optimize decision quality. This paper extends previous results regarding the meta-level control of contract algorithms by handling a more general type of performance description. The output quality of each contract algorithm is described by a probabilistic (rather than deterministic) conditional performance profile. Such profiles map input quality and computation time to a probability distribution of output quality. The composition problem is solved by an efficient off-line compilation technique that simplifies the run-time monitoring task. |
Mouaddib, Abdel-Illah; Zilberstein, Shlomo Knowledge-Based Anytime Computation Conference Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, 1995. @conference{SZ:MZijcai95, This paper describes a real-time decision-making model that combines the expressiveness and flexibility of knowledge-based systems with the real-time advantages of anytime algorithms. Anytime algorithms offer a simple means by which an intelligent system can trade off computation time for quality of results. Previous attempts to develop knowledge-based anytime algorithms failed to produce consistent, predictable improvement of quality over time. Without performance profiles that describe the output quality as a function of time, it is hard to exploit the flexibility of anytime algorithms. The model of progressive reasoning that is presented here is based on a hierarchy of reasoning units that allow for gradual improvement of decision quality in a predictable manner. The result is an important step towards the application of knowledge-based systems in time-critical domains. |
Zilberstein, Shlomo Operational Rationality through Compilation of Anytime Algorithms PhD Thesis Computer Science Division, University of California Berkeley, 1993. @phdthesis{SZ:Zshort93, An important and largely ignored aspect of real-time decision making is the capability of agents to factor the cost of deliberation into the decision making process. I have developed an efficient model that creates this capability. The model uses as basic components anytime algorithms whose quality of results improves gradually as computation time increases. The main contribution of this work is a compilation process that extends the property of gradual improvement from the level of single algorithms to the level of complex systems. In standard algorithms, the fixed quality of the output allows for composition to be implemented by a simple call-return mechanism. However, when algorithms have resource allocation as a degree of freedom, there arises the question of how to construct, for example, the optimal composition of two anytime algorithms, one of which feeds its output to the other. This scheduling problem is solved by an off-line compilation process and a run-time monitoring component that together generate a utility maximizing behavior. The crucial meta-level knowledge is kept in the anytime library in the form of conditional performance profiles. These profiles characterize the performance of each elementary anytime algorithm as a function of run-time and input quality. The compilation process therefore extends the principles of procedural abstraction and modularity to anytime computation. Its efficiency is significantly improved by using local compilation that works on a single program structure at a time. Local compilation is proved to yield global optimality for a large set of program structures. Compilation produces contract algorithms which require the determination of the total run-time when activated. 
Some real-time domains require interruptible algorithms whose total run-time is unknown in advance. An important result of this work is a general method by which an interruptible algorithm can be constructed once a contract algorithm is compiled. Finally, the notion of gradual improvement of quality is extended to sensing and plan execution and the application of the model is demonstrated through a simulated robot navigation system. The result is a modular approach for developing real-time agents that act by performing anytime actions and make decisions using anytime computation. |
Zilberstein, Shlomo; Russell, Stuart J Anytime Sensing, Planning and Action: A Practical Model for Robot Control Conference Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), Chambery, France, 1993. @conference{SZ:ZRijcai93, Anytime algorithms, whose quality of results improves gradually as computation time increases, provide useful performance components for time-critical planning and control of robotic systems. In earlier work, we introduced a compilation scheme for optimal composition of anytime algorithms. In this paper we present an implementation of a navigation system in which an off-line compilation process and a run-time monitoring component guarantee the optimal allocation of time to the anytime modules. The crucial meta-level knowledge is kept in the anytime library in the form of conditional performance profiles. We also extend the notion of gradual improvement to sensing and plan execution. The result is an efficient, flexible control for robotic systems that exploits the tradeoff between time and quality in planning, sensing and plan execution. |
Russell, Stuart J; Zilberstein, Shlomo Composing Real-Time Systems Conference Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI), Sydney, Australia, 1991. @conference{SZ:RZijcai91, We present a method to construct real-time systems using as components anytime algorithms whose quality of results degrades gracefully as computation time decreases. Introducing computation time as a degree of freedom defines a scheduling problem involving the activation and interruption of the anytime components. This scheduling problem is especially complicated when trying to construct interruptible algorithms, whose total run-time is unknown in advance. We introduce a framework to measure the performance of anytime algorithms and solve the problem of constructing interruptible algorithms by a mathematical reduction to the problem of constructing contract algorithms, which require the determination of the total run-time when activated. We show how the composition of anytime algorithms can be mechanized as part of a compiler for a LISP-like programming language for real-time systems. The result is a new approach to the construction of complex real-time systems that separates the arrangement of the performance components from the optimization of their scheduling, and automates the latter task. |
Models of Bounded Rationality
Svegliato, Justin; Basich, Connor; Saisubramanian, Sandhya; Zilberstein, Shlomo Metareasoning for Safe Decision Making in Autonomous Systems Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, Pennsylvania, 2022. @conference{SZ:SBSZicra22, Although experts carefully specify the high-level decision-making models in autonomous systems, it is infeasible to guarantee safety across every scenario during operation. We therefore propose a safety metareasoning system that optimizes the severity of the system's safety concerns and the interference to the system's task: the system executes in parallel a task process that completes a specified task and safety processes that each address a specified safety concern with a conflict resolver for arbitration. This paper offers a formal definition of a safety metareasoning system, a recommendation algorithm for a safety process, an arbitration algorithm for a conflict resolver, an application of our approach to planetary rover exploration, and a demonstration that our approach is effective in simulation. |
Bhatia, Abhinav; Svegliato, Justin; Nashed, Samer B.; Zilberstein, Shlomo Tuning the Hyperparameters of Anytime Planning: A Metareasoning Approach with Deep Reinforcement Learning Conference Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS), Virtual Conference, 2022. @conference{SZ:BSNZicaps22, Anytime planning algorithms often have hyperparameters that can be tuned at runtime to optimize their performance. While work on metareasoning has focused on when to interrupt an anytime planner and act on the current plan, the scope of metareasoning can be expanded to tuning the hyperparameters of the anytime planner at runtime. This paper introduces a general, decision-theoretic metareasoning approach that optimizes both the stopping point and hyperparameters of anytime planning. We begin by proposing a generalization of the standard meta-level control problem for anytime algorithms. We then offer a meta-level control technique that monitors and controls an anytime algorithm using deep reinforcement learning. Finally, we show that our approach boosts performance on a common benchmark domain that uses anytime weighted A* to solve a range of heuristic search problems and a mobile robot application that uses RRT* to solve motion planning problems. |
Svegliato, Justin; Sharma, Prakhar; Zilberstein, Shlomo A Model-Free Approach to Meta-Level Control of Anytime Algorithms Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 2020. @conference{SZ:SSZicra20, Anytime algorithms offer a trade-off between solution quality and computation time that has proven to be useful in autonomous systems for a wide range of real-time planning problems. In order to optimize this trade-off, an autonomous system has to solve a challenging meta-level control problem: it must decide when to interrupt the anytime algorithm and act on the current solution. Prevailing meta-level control techniques, however, make a number of unrealistic assumptions that reduce their effectiveness and usefulness in the real world. Eliminating these assumptions, we first introduce a model-free approach to meta-level control based on reinforcement learning and prove its optimality. We then offer a general meta-level control technique that can use different reinforcement learning methods. Finally, we show that our approach is effective across several common benchmark domains and a mobile robot domain. |
Carlin, Alan; Zilberstein, Shlomo Decentralized Monitoring of Distributed Anytime Algorithms Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:CZaamas11, Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the "right" time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems. |
Zilberstein, Shlomo Metareasoning and Bounded Rationality Book Section In: Cox, M; Raja, A (Ed.): Metareasoning: Thinking about Thinking, pp. 27–40, MIT Press, Cambridge, MA, USA, 2011. @incollection{SZ:Zmetareasoning11, |
Carlin, Alan; Zilberstein, Shlomo Bounded Rationality in Multiagent Systems Using Decentralized Metareasoning Book Section In: Guy, T; Karny, M; Wolpert, D (Ed.): Decision Making with Imperfect Decision Makers, pp. 1–28, Springer, Berlin, Heidelberg, 2011. @incollection{SZ:CZdecisionmaking11, |
Petrik, Marek; Zilberstein, Shlomo Learning Parallel Portfolios of Algorithms Journal Article In: Annals of Mathematics and Artificial Intelligence (AMAI), vol. 48, no. 1-2, pp. 85–106, 2006. @article{SZ:PZamai06, A wide range of combinatorial optimization algorithms have been developed for complex reasoning tasks. Frequently, no single algorithm outperforms all the others. This has raised interest in leveraging the performance of a collection of algorithms to improve performance. We show how to accomplish this using a Parallel Portfolio of Algorithms (PPA). A PPA is a collection of diverse algorithms for solving a single problem, all running concurrently on a single processor until a solution is produced. The performance of the portfolio may be controlled by assigning different shares of processor time to each algorithm. We present an effective method for finding a PPA in which the share of processor time allocated to each algorithm is fixed. Finding the optimal static schedule is shown to be an NP-complete problem for a general class of utility functions. We present bounds on the performance of the PPA over random instances and evaluate the performance empirically on a collection of 23 state-of-the-art SAT algorithms. The results show significant performance gains over the fastest individual algorithm in the collection. |
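The static-share idea can be sketched for two algorithms (an illustrative grid search on toy runtimes; the paper handles general portfolios and shows the underlying optimization is NP-complete):

```python
def portfolio_time(runtimes, shares):
    """Time for a static parallel portfolio to solve one instance:
    each algorithm gets a fixed share of the processor, so algorithm a
    finishes at runtimes[a] / shares[a], and the portfolio finishes
    when the first algorithm does."""
    return min(r / s for r, s in zip(runtimes, shares) if s > 0)

def best_two_algorithm_shares(training, grid=100):
    """Grid search over shares (s, 1 - s) minimizing mean solve time
    on a training set, a toy stand-in for the paper's optimization.
    Returns (mean time, share for algorithm 0)."""
    return min(
        (sum(portfolio_time(r, (s, 1 - s)) for r in training)
         / len(training), s)
        for s in (i / grid for i in range(1, grid))
    )
```

On a training set where each algorithm is fast exactly on the instances the other finds hard, the equal split is optimal and solves every instance in twice its best sequential runtime, mirroring the complementarity the paper exploits.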
Petrik, Marek; Zilberstein, Shlomo Learning Static Parallel Portfolios of Algorithms Conference Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2006. @conference{SZ:PZisaim06, We present an approach for improving the performance of combinatorial optimization algorithms by generating an optimal Parallel Portfolio of Algorithms (PPA). A PPA is a collection of diverse algorithms for solving a single problem, all running concurrently on a single processor until a solution is produced. The performance of the portfolio may be controlled by assigning different shares of processor time to each algorithm. We present a method for finding a static PPA, in which the share of processor time allocated to each algorithm is fixed. The schedule is shown to be optimal with respect to a given training set of instances. We draw bounds on the performance of the PPA over random instances and evaluate the performance empirically on a collection of 23 state-of-the-art SAT algorithms. The results show significant performance gains (up to a factor of 2) over the fastest individual algorithm in a realistic setting. |
Hansen, Eric A; Zilberstein, Shlomo Monitoring and Control of Anytime Algorithms: A Dynamic Programming Approach Journal Article In: Artificial Intelligence (AIJ), vol. 126, no. 1-2, pp. 139–157, 2001. @article{SZ:HZaij01a, Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in solving time-critical problems such as planning and scheduling, belief network evaluation, and information gathering. To exploit this tradeoff, a system must be able to decide when to stop deliberation and act on the currently available solution. This paper analyzes the characteristics of existing techniques for meta-level control of anytime algorithms and develops a new framework for monitoring and control. The new framework handles effectively the uncertainty associated with the algorithm's performance profile, the uncertainty associated with the domain of operation, and the cost of monitoring progress. The result is an efficient non-myopic solution to the meta-level control problem for anytime algorithms. |
Cardon, Stephane; Mouaddib, Abdel-Illah; Zilberstein, Shlomo; Washington, Richard Adaptive Control of Acyclic Progressive Processing Task Structures Conference Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), Seattle, Washington, 2001. @conference{SZ:CMZWijcai01, The progressive processing model allows a system to trade off resource consumption against the quality of the outcome by mapping each activity to a graph of potential solution methods. In the past, only semi-linear graphs have been used. We examine the application of the model to control the operation of an autonomous rover which operates under tight resource constraints. The task structure is generalized to directed acyclic graphs for which the optimal schedule can be computed by solving a corresponding Markov decision problem. We evaluate the complexity of the solution analytically and experimentally and show that it provides a practical approach to building an adaptive controller for this application. |
Zilberstein, Shlomo; Mouaddib, Abdel-Illah Optimal Scheduling of Progressive Processing Tasks Journal Article In: International Journal of Approximate Reasoning (IJAR), vol. 25, no. 3, pp. 169–186, 2000. @article{SZ:ZMijar00, Progressive processing is an approximate reasoning model that allows a system to satisfy a set of requests under time pressure by limiting the amount of processing allocated to each task based on a predefined hierarchical task structure. It is a useful model for a variety of real-time tasks such as information retrieval, automated diagnosis, or real-time image tracking and speech recognition. In performing these tasks it is often necessary to trade off computational resources for quality of results. This paper addresses progressive processing of information retrieval requests that are characterized by high duration uncertainty associated with each computational unit and dynamic operation allowing new requests to be added at run-time. We introduce a new approach to scheduling the processing units by constructing and solving a particular Markov decision problem. The resulting policy is an optimal schedule for the progressive processing problem. Evaluation of the technique shows that it offers a significant improvement over existing heuristic scheduling techniques. Moreover, the framework presented in this paper can be applied to real-time scheduling of a wide variety of task structures other than progressive processing. |
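The MDP formulation can be illustrated for a single progressive processing task (a toy sketch with made-up levels and durations; the paper schedules many dynamic tasks): each level adds quality only if it finishes before the deadline, and backward induction chooses between executing the next level and stopping.

```python
from functools import lru_cache

def progressive_schedule(levels, deadline):
    """Expected quality of the optimal policy for one progressive
    processing task. levels[i] = (quality, durations), where durations
    maps a possible integer duration -> probability; a level's quality
    is earned only if it completes within the remaining time."""
    @lru_cache(maxsize=None)
    def V(i, time_left):
        if i == len(levels):
            return 0.0
        quality, durations = levels[i]
        execute = sum(
            p * (quality + V(i + 1, time_left - d)) if d <= time_left
            else 0.0
            for d, p in durations.items()
        )
        return max(0.0, execute)  # stopping earns nothing further
    return V(0, deadline)
```

With a certain first level and a second level whose duration is uncertain, a tighter deadline makes the second level a gamble that pays off only half the time, which the value function captures directly.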
Zilberstein, Shlomo; Mouaddib, Abdel-Illah Reactive Control of Dynamic Progressive Processing Conference Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 1999. @conference{SZ:ZMijcai99, Progressive processing is a model of computation that allows a system to trade off computational resources against the quality of results. This paper generalizes the existing model to make it suitable for dynamic composition of information retrieval techniques. The new framework addresses effectively the uncertainty associated with the duration and output quality of each component. We show how to construct an optimal meta-level controller for a single task based on solving a corresponding Markov decision problem, and how to extend the solution to the case of multiple and dynamic tasks using the notion of an opportunity cost. |
Mouaddib, Abdel-Illah; Zilberstein, Shlomo Optimal Scheduling of Dynamic Progressive Processing Conference Proceedings of the 13th European Conference on Artificial Intelligence (ECAI), Brighton, UK, 1998, (Best Paper Award). @conference{SZ:MZecai98, Progressive processing allows a system to satisfy a set of requests under time pressure by limiting the amount of processing allocated to each task based on a predefined hierarchical task structure. It is a useful model for a variety of real-time AI tasks such as diagnosis and planning in which it is necessary to trade off computational resources for quality of results. This paper addresses progressive processing of information retrieval requests that are characterized by high duration uncertainty associated with each computational unit and dynamic operation allowing new requests to be added at run-time. We introduce a new approach to scheduling the processing units by constructing and solving a particular Markov decision problem. The resulting policy is an optimal schedule for the progressive processing problem. Finally, we evaluate the technique and show that it offers a significant improvement over existing heuristic scheduling techniques. |
Mouaddib, Abdel-Illah; Zilberstein, Shlomo Handling Duration Uncertainty in Meta-Level Control of Progressive Processing Conference Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI), Nagoya, Japan, 1997. @conference{SZ:MZijcai97, Progressive processing is a resource-bounded reasoning technique that allows a system to incrementally construct a solution to a problem using a hierarchy of processing levels. This paper focuses on the problem of meta-level control of progressive processing in domains characterized by rapid change and high level of duration uncertainty. We show that progressive processing facilitates efficient run-time monitoring and meta-level control. Our solution is based on an incremental scheduler that can handle duration uncertainty by dynamically revising the schedule during execution time based on run-time information. We also show that a probabilistic representation of duration uncertainty reduces the frequency of schedule revisions and thus improves the performance of the system. Finally, an experimental evaluation shows the contributions of this approach and its suitability for a data transmission application. |
Zilberstein, Shlomo Resource-Bounded Sensing and Planning in Autonomous Systems Journal Article In: Autonomous Robots, vol. 3, pp. 31–48, 1996. @article{SZ:Zar96, This paper is concerned with the implications of limited computational resources and uncertainty on the design of autonomous systems. To address this problem, we redefine the principal role of sensor interpretation and planning processes. Following Agre and Chapman's plan-as-communication approach, sensing and planning are treated as computational processes that provide information to an execution architecture and thus improve the overall performance of the system. We argue that autonomous systems must be able to trade off the quality of this information with the computational resources required to produce it. Anytime algorithms, whose quality of results improves gradually as computation time increases, provide useful performance components for time-critical sensing and planning in robotic systems. In our earlier work, we introduced a compilation scheme for optimal composition of anytime algorithms. This paper demonstrates the applicability of the compilation technique to the construction of autonomous systems. The result is a flexible approach to construct systems that can operate robustly in real-time by exploiting the tradeoff between time and quality in planning, sensing and plan execution. |
Hansen, Eric A; Zilberstein, Shlomo Monitoring the Progress of Anytime Problem-Solving Conference Proceedings of the 13th National Conference on Artificial Intelligence (AAAI), Portland, Oregon, 1996. @conference{SZ:HZaaai96, Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in applying artificial intelligence techniques to time-critical problems. To exploit this tradeoff, a system must be able to determine the best time to stop deliberation and act on the currently available solution. When the rate of improvement of solution quality is uncertain, monitoring the progress of the algorithm can improve the utility of the system. This paper introduces a technique for run-time monitoring of anytime algorithms that is sensitive to the variance of the algorithm's performance, the time-dependent utility of a solution, the ability of the run-time monitor to estimate the quality of the currently available solution, and the cost of monitoring. The paper examines the conditions under which the technique is optimal and demonstrates its applicability. |
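The stopping decision at the heart of run-time monitoring can be sketched with a simple myopic rule. Everything below (the utility function, the improvement distributions, and the monitoring cost) is an invented illustration, not the paper's actual model: the agent continues deliberating only if the expected utility after one more step, net of the cost of monitoring, exceeds the utility of acting on the current solution.

```python
# Hedged sketch of a myopic monitoring/stopping rule for an anytime
# algorithm; all quantities here are illustrative assumptions.

def utility(quality, t):
    """Time-dependent utility: solution quality minus a cost of delay."""
    return quality - 0.5 * t

def should_continue(quality, t, improvement_dist, monitor_cost=0.1):
    """Continue iff the expected utility of one more step of computation,
    minus the cost of monitoring, beats acting on the current solution."""
    stop_now = utility(quality, t)
    one_more = sum(p * utility(quality + dq, t + 1)
                   for dq, p in improvement_dist.items())
    return one_more - monitor_cost > stop_now

# Early on, large quality gains are likely: keep deliberating.
early = {2.0: 0.6, 0.5: 0.4}   # quality gain -> probability
# Near convergence, gains are marginal: act on the current solution.
late = {0.1: 0.8, 0.0: 0.2}
```

With a high-variance profile (`early`) the rule keeps computing; once predicted gains shrink below the time and monitoring costs (`late`), it stops, which is the variance-sensitivity the abstract emphasizes.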
Zilberstein, Shlomo Optimizing Decision Quality with Contract Algorithms Conference Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, 1995. @conference{SZ:Zijcai95, Contract algorithms offer a tradeoff between output quality and computation time, provided that the amount of computation time is determined prior to their activation. Originally, they were introduced as an intermediate step in the composition of interruptible anytime algorithms. However, for many real-time tasks such as information gathering, game playing, and a large class of planning problems, contract algorithms offer an ideal mechanism to optimize decision quality. This paper extends previous results regarding the meta-level control of contract algorithms by handling a more general type of performance description. The output quality of each contract algorithm is described by a probabilistic (rather than deterministic) conditional performance profile. Such profiles map input quality and computation time to a probability distribution of output quality. The composition problem is solved by an efficient off-line compilation technique that simplifies the run-time monitoring task. |
Zilberstein, Shlomo Operational Rationality through Compilation of Anytime Algorithms PhD Thesis Computer Science Division, University of California Berkeley, 1993. @phdthesis{SZ:Zshort93, An important and largely ignored aspect of real-time decision making is the capability of agents to factor the cost of deliberation into the decision making process. I have developed an efficient model that creates this capability. The model uses as basic components anytime algorithms whose quality of results improves gradually as computation time increases. The main contribution of this work is a compilation process that extends the property of gradual improvement from the level of single algorithms to the level of complex systems. In standard algorithms, the fixed quality of the output allows for composition to be implemented by a simple call-return mechanism. However, when algorithms have resource allocation as a degree of freedom, there arises the question of how to construct, for example, the optimal composition of two anytime algorithms, one of which feeds its output to the other. This scheduling problem is solved by an off-line compilation process and a run-time monitoring component that together generate a utility maximizing behavior. The crucial meta-level knowledge is kept in the anytime library in the form of conditional performance profiles. These profiles characterize the performance of each elementary anytime algorithm as a function of run-time and input quality. The compilation process therefore extends the principles of procedural abstraction and modularity to anytime computation. Its efficiency is significantly improved by using local compilation that works on a single program structure at a time. Local compilation is proved to yield global optimality for a large set of program structures. Compilation produces contract algorithms which require the determination of the total run-time when activated. 
Some real-time domains require interruptible algorithms whose total run-time is unknown in advance. An important result of this work is a general method by which an interruptible algorithm can be constructed once a contract algorithm is compiled. Finally, the notion of gradual improvement of quality is extended to sensing and plan execution and the application of the model is demonstrated through a simulated robot navigation system. The result is a modular approach for developing real-time agents that act by performing anytime actions and make decisions using anytime computation. |
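The contract-to-interruptible construction mentioned above can be sketched via exponential scheduling: run the compiled contract algorithm repeatedly with doubling contract lengths, and on interruption return the result of the last completed run. The toy model below, in which result quality is simply the allotted contract length, is an assumption for illustration.

```python
# Sketch of building an interruptible algorithm from a contract algorithm
# by running contracts of lengths 1, 2, 4, ... until interrupted.

def contract_schedule(deadline):
    """Simulate the doubling schedule up to an (unknown in advance)
    interruption time, returning the length of the last contract that
    completed; that run's result is what the interruptible algorithm
    would return."""
    elapsed, last_completed = 0, 0
    length = 1
    while elapsed + length <= deadline:
        elapsed += length
        last_completed = length
        length *= 2
    return last_completed
```

Because the completed runs have total length just under twice the last one, the last completed contract always received at least roughly a quarter of the elapsed time, giving the constant-factor slowdown that makes the construction general.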
Zilberstein, Shlomo; Russell, Stuart J Anytime Sensing, Planning and Action: A Practical Model for Robot Control Conference Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), Chambery, France, 1993. @conference{SZ:ZRijcai93, Anytime algorithms, whose quality of results improves gradually as computation time increases, provide useful performance components for time-critical planning and control of robotic systems. In earlier work, we introduced a compilation scheme for optimal composition of anytime algorithms. In this paper we present an implementation of a navigation system in which an off-line compilation process and a run-time monitoring component guarantee the optimal allocation of time to the anytime modules. The crucial meta-level knowledge is kept in the anytime library in the form of conditional performance profiles. We also extend the notion of gradual improvement to sensing and plan execution. The result is an efficient, flexible control for robotic systems that exploits the tradeoff between time and quality in planning, sensing and plan execution. |
Scalable Algorithms for Probabilistic Reasoning
Miura, Shuwa; Zilberstein, Shlomo Observer-Aware Planning with Implicit and Explicit Communication Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024. @conference{SZ:MZaamas24, This paper presents a computational model designed for planning both implicit and explicit communication of intentions, goals, and desires. Building upon previous research focused on implicit communication of intention via actions, our model seeks to strategically influence an observer’s belief using both the agent’s actions and explicit messages. We show that our proposed model can be considered to be a special case of general multi-agent problems with explicit communication under certain assumptions. Since the mental state of the observer depends on histories, computing a policy for the proposed model amounts to optimizing a non-Markovian objective, which we show to be intractable in the worst case. To mitigate this challenge, we propose a technique based on splitting domain and communication actions during planning. We conclude with experimental evaluations of the proposed approach that illustrate its effectiveness. |
Bhatia, Abhinav; Nashed, Samer B.; Zilberstein, Shlomo RL3: Boosting Meta Reinforcement Learning via RL inside RL2 Conference NeurIPS Workshop on Generalized Planning (GenPlan), New Orleans, Louisiana, 2023. @conference{SZ:BNZgenplan23, Meta reinforcement learning (meta-RL) methods such as RL2 have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, these RL algorithms struggle with long-horizon tasks and out-of-distribution tasks since they rely on recurrent neural networks to process the sequence of experiences instead of summarizing them into general RL components such as value functions. Moreover, even transformers have a practical limit to the length of histories they can efficiently reason about before training and inference costs become prohibitive. In contrast, traditional RL algorithms are data-inefficient since they do not leverage domain knowledge, but they do converge to an optimal policy as more data becomes available. In this paper, we propose RL3, a principled hybrid approach that combines traditional RL and meta-RL by incorporating task-specific action-values learned through traditional RL as an input to the meta-RL neural network. We show that RL3 earns greater cumulative reward on long-horizon and out-of-distribution tasks compared to RL2, while maintaining the efficiency of the latter in the short term. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies. |
Mahmud, Saaduddin; Nashed, Samer B.; Goldman, Claudia V.; Zilberstein, Shlomo Estimating Causal Responsibility for Explaining Autonomous Behavior Book Section In: Calvaresi, Davide (Ed.): International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS), pp. 78–94, Springer, 2023. @incollection{SZ:MNGZextraamas23, There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values. |
Nashed, Samer; Zilberstein, Shlomo A Survey of Opponent Modeling in Adversarial Domains Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 73, pp. 277–327, 2022. @article{SZ:NZjair22, Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines. |
Saisubramanian, Sandhya; Zilberstein, Shlomo; Kamar, Ece Avoiding Negative Side Effects of Autonomous Systems in the Open World Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 74, pp. 143–177, 2022. @article{SZ:SZKjair22, Autonomous systems that operate in the open world often use incomplete models of their environment. Model incompleteness is inevitable due to the practical limitations in precise model specification and data collection about open-world environments. Due to the limited fidelity of the model, agent actions may produce negative side effects (NSEs) when deployed. Negative side effects are undesirable, unmodeled effects of agent actions on the environment. NSEs are inherently challenging to identify at design time and may affect the reliability, usability and safety of the system. We present two complementary approaches to mitigate the NSE via: (1) learning from feedback, and (2) environment shaping. The solution approaches target settings with different assumptions and agent responsibilities. In learning from feedback, the agent learns a penalty function associated with a NSE. We investigate the efficiency of different feedback mechanisms, including human feedback and autonomous exploration. The problem is formulated as a multi-objective Markov decision process such that optimizing the agent’s assigned task is prioritized over mitigating NSE. A slack parameter denotes the maximum allowed deviation from the optimal expected reward for the agent’s task in order to mitigate NSE. In environment shaping, we examine how a human can assist an agent, beyond providing feedback, and utilize their broader scope of knowledge to mitigate the impacts of NSE. We formulate the problem as a human-agent collaboration with decoupled objectives. The agent optimizes its assigned task and may produce NSE during its operation. 
The human assists the agent by performing modest reconfigurations of the environment so as to mitigate the impacts of NSE, without affecting the agent’s ability to complete its assigned task. We present an algorithm for shaping and analyze its properties. Empirical evaluations demonstrate the trade-offs in the performance of different approaches in mitigating NSE in different settings. |
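The slack-based prioritization described in the learning-from-feedback formulation can be sketched concretely. The candidate policies and their scores below are invented for illustration: the agent first identifies policies whose expected task reward is within the slack parameter of optimal, and only then minimizes the expected NSE penalty among them.

```python
# Hedged sketch of slack-bounded lexicographic policy selection;
# the policies and their reward/NSE values are illustrative assumptions.
policies = {
    "direct":   {"reward": 10.0, "nse": 5.0},
    "detour":   {"reward": 9.0,  "nse": 1.0},
    "cautious": {"reward": 7.0,  "nse": 0.0},
}

def select_policy(policies, slack):
    """Minimize NSE among policies within `slack` of the optimal reward."""
    best_reward = max(p["reward"] for p in policies.values())
    admissible = {name: p for name, p in policies.items()
                  if p["reward"] >= best_reward - slack}
    return min(admissible, key=lambda name: admissible[name]["nse"])
```

With zero slack the agent ignores side effects entirely; increasing the slack trades a bounded amount of task reward for progressively safer behavior, which is exactly the dial the abstract describes.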
Rabiee, Sadegh; Basich, Connor; Wray, Kyle Hollins; Zilberstein, Shlomo; Biswas, Joydeep Competence-Aware Path Planning Via Introspective Perception Journal Article In: IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3218–3225, 2022. @article{SZ:RBWZBlra22, Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and to proactively take actions to avoid future failures. Existing approaches for competence-aware planning are either model-based, requiring explicit enumeration of known failure sources, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception, without requiring a priori enumeration of failure sources or requiring location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Experiments in a simulation show that the proposed CPIP approach outperforms the frequentist baseline in multiple mobile robot tasks, and is further validated via real robot experiments in environments with perceptually challenging obstacles and terrain. |
Svegliato, Justin; Basich, Connor; Saisubramanian, Sandhya; Zilberstein, Shlomo Metareasoning for Safe Decision Making in Autonomous Systems Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, Pennsylvania, 2022. @conference{SZ:SBSZicra22, Although experts carefully specify the high-level decision-making models in autonomous systems, it is infeasible to guarantee safety across every scenario during operation. We therefore propose a safety metareasoning system that optimizes the severity of the system's safety concerns and the interference to the system's task: the system executes in parallel a task process that completes a specified task and safety processes that each address a specified safety concern with a conflict resolver for arbitration. This paper offers a formal definition of a safety metareasoning system, a recommendation algorithm for a safety process, an arbitration algorithm for a conflict resolver, an application of our approach to planetary rover exploration, and a demonstration that our approach is effective in simulation. |
Miura, Shuwa; Wray, Kyle Hollins; Zilberstein, Shlomo Heuristic Search for SSPs with Lexicographic Preferences over Multiple Costs Conference Proceedings of the 15th Annual Symposium on Combinatorial Search (SOCS), Vienna, Austria, 2022. @conference{SZ:MWZsocs22, Real-world decision problems often involve multiple competing objectives. The Stochastic Shortest Path (SSP) with lexicographic preferences over multiple costs offers an expressive formulation for many practical problems. However, the existing solution methods either lack optimality guarantees or require costly computations over the entire state space. We propose the first heuristic search algorithm for this problem, based on the heuristic algorithm for Constrained SSPs. Our experiments show that our heuristic search algorithm can compute optimal policies while avoiding a large portion of the state space. We also analyze the theoretical properties of the problem, establishing the conditions under which SSPs with lexicographic preferences have a proper optimal policy. |
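The lexicographic preference over multiple costs at the core of this formulation can be sketched with a small comparator. The cost vectors below are invented for illustration: an action is preferred if it is strictly better on the most important cost, with less important costs breaking ties (practical formulations often also allow a small tolerance on earlier costs, approximated here by `eps`).

```python
# Illustrative sketch of lexicographic action selection over cost vectors.

def lex_better(costs_a, costs_b, eps=1e-9):
    """True if cost vector a is lexicographically preferred (lower) to b."""
    for ca, cb in zip(costs_a, costs_b):
        if ca < cb - eps:
            return True
        if ca > cb + eps:
            return False
    return False  # equal vectors: neither is strictly preferred

def lex_argmin(action_costs):
    """Pick the action whose (primary, secondary, ...) costs are lex-minimal."""
    best = None
    for action, costs in action_costs.items():
        if best is None or lex_better(costs, action_costs[best]):
            best = action
    return best
```

For instance, among actions costing (3, 1), (2, 9), and (2, 4), the last one wins: it ties on the primary cost and is cheaper on the secondary one.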
Basich, Connor; Peterson, John; Zilberstein, Shlomo Planning with Intermittent State Observability: Knowing When to Act Blind Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022. @conference{SZ:BPZiros22, Contemporary planning models and methods often rely on constant availability of free state information at each step of execution. However, autonomous systems are increasingly deployed in the open world where state information may be costly or simply unavailable in certain situations. Failing to account for sensor limitations may lead to costly behavior or even catastrophic failure. While the partially observable Markov decision process (POMDP) can be used to model this problem, solving POMDPs is often intractable. We introduce a planning model called a semi-observable Markov decision process (SOMDP) specifically designed for MDPs where state observability may be intermittent. We propose an approach for solving SOMDPs that uses memory states to proactively plan for the potential loss of sensor information while exploiting the unique structure of SOMDPs. Our theoretical analysis and empirical evaluation demonstrate the advantages of SOMDPs relative to existing planning models. |
Nashed, Samer B.; Svegliato, Justin; Bhatia, Abhinav; Russell, Stuart; Zilberstein, Shlomo Selecting the Partial State Abstractions of MDPs: A Metareasoning Approach with Deep Reinforcement Learning Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022. @conference{SZ:NSBRZiros22, Markov decision processes (MDPs) are a common general-purpose model used in robotics for representing sequential decision-making problems. Given the complexity of robotics applications, a popular approach for approximately solving MDPs relies on state aggregation to reduce the size of the state space but at the expense of policy fidelity--offering a trade-off between policy quality and computation time. Naturally, this poses a challenging metareasoning problem: how can an autonomous system dynamically select different state abstractions that optimize this trade-off as it operates online? In this paper, we formalize this metareasoning problem with a notion of time-dependent utility and solve it using deep reinforcement learning. To do this, we develop several general, cheap heuristics that summarize the reward structure and transition topology of the MDP at hand to serve as effective features. Empirically, we demonstrate that our metareasoning approach outperforms several baseline approaches and a strong heuristic approach on a standard benchmark domain. |
Nashed, Samer B; Svegliato, Justin; Brucato, Matteo; Basich, Connor; Grupen, Roderic A; Zilberstein, Shlomo Solving Markov Decision Processes with Partial State Abstractions Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2021. @conference{SZ:NSBBGZicra21, Autonomous systems often use approximate planners that exploit state abstractions to solve large MDPs in real-time decision-making problems. However, these planners can eliminate details needed to produce effective behavior in autonomous systems. We therefore propose a novel model, a partially abstract MDP, with a set of abstract states that each compress a set of ground states to condense irrelevant details and a set of ground states that expand from a set of grounded abstract states to retain relevant details. This paper offers (1) a definition of a partially abstract MDP that (2) generalizes its ground MDP and its abstract MDP and exhibits bounded optimality depending on its abstract MDP along with (3) a lazy algorithm for planning and execution in autonomous systems. The result is a scalable approach that computes near-optimal solutions to large problems in minutes rather than hours. |
Basich, Connor; Svegliato, Justin; Beach, Allyson; Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Improving Competence via Iterative State Space Refinement Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:BSBWWZiros21, Despite considerable efforts by human designers, accounting for every unique situation that an autonomous robotic system deployed in the real world could face is often an infeasible task. As a result, many such deployed systems still rely on human assistance in various capacities to complete certain tasks while staying safe. Competence-aware systems (CAS) is a recently proposed model for reducing such reliance on human assistance while in turn optimizing the system’s global autonomous operation by learning its own competence. However, such systems are limited by a fixed model of their environment and may perform poorly if their a priori planning model does not include certain features that emerge as important over the course of the system’s deployment. In this paper, we propose a method for improving the competence of a CAS over time by identifying important state features missing from the system’s model and incorporating them into its state representation, thereby refining its state space. Our approach exploits information that exists in the standard CAS model and adds no extra work to the human. The result is an agent that better predicts human involvement, improving its competence, reliability, and overall performance. |
Parr, Shane; Khatri, Ishan; Svegliato, Justin; Zilberstein, Shlomo Agent-Aware State Estimation in Autonomous Vehicles Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:PKSZiros21, Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios. |
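The inference idea in the traffic-light example can be sketched as a Bayesian update over the hidden global state from observed agent behavior. The observation model below is invented for illustration (the paper's HMM formulation is richer), and agents are assumed conditionally independent given the global state, in the spirit of the transition-independent class.

```python
# Hedged sketch: infer our (occluded) traffic light from the behavior of
# cross traffic. P(observed behavior | light facing that direction) is an
# illustrative assumption, not the paper's learned model.
BEHAVIOR_MODEL = {
    ("moving",  "green"): 0.9, ("moving",  "red"): 0.1,
    ("stopped", "green"): 0.1, ("stopped", "red"): 0.9,
}

def posterior_light(prior, cross_traffic_obs):
    """Update the belief over our light given behaviors of cross-traffic
    vehicles, whose light is assumed to be the opposite of ours."""
    opposite = {"green": "red", "red": "green"}
    belief = dict(prior)
    for obs in cross_traffic_obs:
        # Independent evidence from each observed vehicle, then renormalize.
        for ours in belief:
            belief[ours] *= BEHAVIOR_MODEL[(obs, opposite[ours])]
        z = sum(belief.values())
        belief = {s: v / z for s, v in belief.items()}
    return belief
```

Two stopped cross-traffic vehicles push the belief strongly toward our light being green; because each agent contributes an independent likelihood factor, the update cost grows linearly with the number of observed agents, mirroring the scaling claim in the abstract.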
Basich, Connor; Wang, Daniel; Russino, Joseph; Chien, Steve; Zilberstein, Shlomo A Sampling-Based Optimization Approach to Handling Environmental Uncertainty for a Planetary Lander Conference ICAPS Workshop on Planning and Robotics (PlanRob), Guangzhou, China, 2021. @conference{SZ:BWRCZicaps21ws1, TBD. |
Miura, Shuwa; Cohen, Andrew L; Zilberstein, Shlomo Maximizing Legibility in Stochastic Environments Conference Proceedings of the 30th IEEE International Conference on Robot & Human Interactive Communication, (RO-MAN), Vancouver, BC, Canada, 2021. @conference{SZ:MCZroman21, Making an agent's intentions clear from its observed behavior is crucial for seamless human-agent interaction and for increased transparency and trust in AI systems. Existing methods that address this challenge and maximize legibility of behaviors are limited to deterministic domains. We develop a technique for maximizing legibility in stochastic environments and illustrate that using legibility as an objective improves interpretability of agent behavior in several scenarios. We provide initial empirical evidence that human subjects can better interpret legible behavior. |
Miura, Shuwa; Zilberstein, Shlomo A Unifying Framework for Observer-Aware Planning and its Complexity Conference Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI), Virtual Event, 2021. @conference{SZ:MZuai21, Being aware of observers and the inferences they make about an agent's behavior is crucial for successful multi-agent interaction. Existing works on observer-aware planning use different assumptions and techniques to produce observer-aware behaviors. We argue that observer-aware planning, in its most general form, can be modeled as an Interactive POMDP (I-POMDP), which requires complex modeling and is hard to solve. Hence, we introduce a less complex framework for producing observer-aware behaviors called Observer-Aware MDP (OAMDP) and analyze its relationship to I-POMDP. We establish the complexity of OAMDPs and show that they can improve interpretability of agent behaviors in several scenarios. |
Pineda, Luis; Zilberstein, Shlomo Soft Labeling in Stochastic Shortest Path Problems Conference Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), Montreal, Quebec, CA, 2019. @conference{SZ:PZaamas19, The Stochastic Shortest Path (SSP) is an established model for goal-directed probabilistic planning. Despite its broad applicability, wide adoption of the model has been impaired by its high computational complexity. Efforts to address this challenge have produced promising algorithms that leverage two popular mechanisms: labeling and short-sightedness. The resulting algorithms can generate near-optimal solutions much faster than optimal solvers, albeit at the cost of poor theoretical guarantees. In this work, we introduce a generalization of labeling, called soft labeling, which results in a framework that encompasses a wide spectrum of efficient labeling algorithms, and offers better theoretical guarantees than existing short-sighted labeling approaches. We also propose a novel instantiation of this framework, the SOFT-FLARES algorithm, which achieves state-of-the-art performance on a diverse set of benchmarks. |
Saisubramanian, Sandhya; Wray, Kyle Hollins; Pineda, Luis Enrique; Zilberstein, Shlomo Planning in Stochastic Environments with Goal Uncertainty Conference ICAPS Workshop on Planning and Robotics (PlanRob), Berkeley, CA, 2019. @conference{SZ:SWPZicaps19ws1, TBD. |
Saisubramanian, Sandhya; Basich, Connor; Zilberstein, Shlomo; Goldman, Claudia V The Value of Incorporating Social Preferences in Dynamic Ridesharing Conference ICAPS Workshop on Scheduling and Planning Applications (SPARK), Berkeley, CA, 2019. @conference{SZ:SBZGicaps19ws2, TBD. |
Pineda, Luis Enrique; Zilberstein, Shlomo Probabilistic Planning with Reduced Models Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 65, pp. 271–306, 2019. @article{SZ:PZjair19, Reduced models are simplified versions of a given domain, designed to accelerate the planning process. Interest in reduced models has grown since the surprising success of determinization in the first international probabilistic planning competition, leading to the development of several enhanced determinization techniques. To address the drawbacks of previous determinization methods, we introduce a family of reduced models in which probabilistic outcomes are classified as one of two types: primary and exceptional. In each model that belongs to this family of reductions, primary outcomes can occur an unbounded number of times per trajectory, while exceptions can occur at most a finite number of times, specified by a parameter. Distinct reduced models are characterized by two parameters: the maximum number of primary outcomes per action, and the maximum number of occurrences of exceptions per trajectory. This family of reductions generalizes the well-known most-likely-outcome determinization approach, which includes one primary outcome per action and zero exceptional outcomes per plan. We present a framework to determine the benefits of planning with reduced models, and develop a continual planning approach that handles situations where the number of exceptions exceeds the specified bound during plan execution. Using this framework, we compare the performance of various reduced models and consider the challenge of generating good ones automatically. We show that each one of the dimensions--allowing more than one primary outcome or planning for some limited number of exceptions--could improve performance relative to standard determinization. The results place previous work on determinization in a broader context and lay the foundation for a systematic exploration of the space of model reductions. |
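The primary/exceptional reduction described above can be illustrated with a minimal sketch (hypothetical helper names; the paper's actual construction also defines the bounded-exception dynamics used during planning):

```python
from typing import Dict, Hashable, Tuple

def reduce_outcomes(transition: Dict[Hashable, float], k: int) -> Dict[Hashable, float]:
    """Keep the k most likely ('primary') outcomes of an action and
    renormalize their probabilities; the remaining outcomes become
    'exceptions'. With k = 1 this is most-likely-outcome determinization."""
    primary = sorted(transition.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in primary)
    return {o: p / total for o, p in primary}

def augment_state(state: Hashable, exceptions_so_far: int) -> Tuple[Hashable, int]:
    """Pair a domain state with an exception counter so a planner can
    bound the number of exceptional outcomes handled per trajectory."""
    return (state, exceptions_so_far)
```

With k = 1 and an exception bound of zero, this degenerates to the standard determinization that the paper generalizes.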
Saisubramanian, Sandhya; Wray, Kyle Hollins; Pineda, Luis Enrique; Zilberstein, Shlomo Planning in Stochastic Environments with Goal Uncertainty Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019. @conference{SZ:SWPZiros19, We present the Goal Uncertain Stochastic Shortest Path (GUSSP) problem -- a general framework to model path planning and decision making in stochastic environments with goal uncertainty. The framework extends the stochastic shortest path (SSP) model to dynamic environments in which it is impossible to determine the exact goal states ahead of plan execution. GUSSPs introduce flexibility in goal specification by allowing a belief over possible goal configurations. The unique observations at potential goals help the agent identify the true goal during plan execution. The partial observability is restricted to goals, facilitating the reduction to an SSP with a modified state space. We formally define a GUSSP and discuss its theoretical properties. We then propose an admissible heuristic that reduces the planning time using FLARES -- a state-of-the-art probabilistic planner. We also propose a determinization approach for solving this class of problems. Finally, we present empirical results on a search and rescue mobile robot and three other problem domains in simulation. |
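The goal-belief mechanism in GUSSPs can be sketched as a simple Bayes update (a hypothetical `obs_model` giving P(observation | true goal); not the paper's exact formulation):

```python
def update_goal_belief(belief, obs, obs_model):
    """Bayes update of a belief over candidate goal states after receiving
    an observation at a potential goal. obs_model[g][obs] is the assumed
    probability of seeing 'obs' when the true goal is g."""
    posterior = {g: b * obs_model[g].get(obs, 0.0) for g, b in belief.items()}
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}
```

Because partial observability is restricted to the goal, the rest of the state remains fully observable and the problem reduces to an SSP over (state, belief) pairs.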
Svegliato, Justin; Wray, Kyle Hollins; Witwicki, Stefan J; Biswas, Joydeep; Zilberstein, Shlomo Belief Space Metareasoning for Exception Recovery Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019. @conference{SZ:SWWBZiros19, Due to the complexity of the real world, autonomous systems use decision-making models that rely on simplifying assumptions to make them computationally tractable and feasible to design. However, since these limited representations cannot fully capture the domain of operation, an autonomous system may encounter unanticipated scenarios that cannot be resolved effectively. We first formally introduce an introspective autonomous system that uses belief space metareasoning to recover from exceptions by interleaving a main decision process with a set of exception handlers. We then apply introspective autonomy to autonomous driving. Finally, we demonstrate that an introspective autonomous vehicle is effective in simulation and on a fully operational prototype. |
Saisubramanian, Sandhya; Zilberstein, Shlomo Adaptive Outcome Selection for Planning With Reduced Models Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019. @conference{SZ:SZiros19, Reduced models allow autonomous robots to cope with the complexity of planning in stochastic environments by simplifying the model and reducing its accuracy. The solution quality of a reduced model depends on its fidelity. We present a 0/1 reduced model that selectively improves model fidelity in certain states by switching between using a simplified deterministic model and the full model, without significantly compromising the run time gains. We measure the reduction impact for a reduced model based on the values of the ignored outcomes and use this as a heuristic for outcome selection. Finally, we present empirical results of our approach on three different domains, including an electric vehicle charging problem using real-world data from a university campus. |
Saisubramanian, Sandhya; Zilberstein, Shlomo; Shenoy, Prashant J Planning Using a Portfolio of Reduced Models Conference Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), Stockholm, Sweden, 2018. @conference{SZ:SZSaamas18, Existing reduced model techniques simplify a problem by applying a uniform principle to reduce the number of considered outcomes for all state-action pairs. It is non-trivial to identify which outcome selection principle will work well across all problem instances in a domain. We aim to create reduced models that yield near-optimal solutions, without compromising the run time gains of using a reduced model. First, we introduce planning using a portfolio of reduced models, a framework that provides flexibility in the reduced model formulation by using a portfolio of outcome selection principles. Second, we propose planning using cost adjustment, a technique that improves the solution quality by accounting for the outcomes ignored in the reduced model. Empirical evaluation of these techniques confirms their effectiveness in several domains. |
Srivastava, Siddharth; Desai, Nishant; Freedman, Richard G; Zilberstein, Shlomo An Anytime Algorithm for Task and Motion MDPs Conference ICAPS Workshop on Planning and Robotics (PlanRob), Delft, The Netherlands, 2018. @conference{SZ:SDFZicaps18ws1, Integrated task and motion planning has emerged as a challenging problem in sequential decision making, where a robot needs to compute high-level strategy and low-level motion plans for solving complex tasks. While high-level strategies require decision making over longer time-horizons and scales, their feasibility depends on low-level constraints based upon the geometries and continuous dynamics of the environment. The hybrid nature of this problem makes it difficult to scale; most existing approaches focus on deterministic, fully observable scenarios. We present a new approach where the high-level decision problem occurs in a stochastic setting and can be modeled as a Markov decision process. In contrast to prior efforts, we show that complete MDP policies, or contingent behaviors, can be computed effectively in an anytime fashion. Our algorithm continuously improves the quality of the solution and is guaranteed to be probabilistically complete. We evaluate the performance of our approach on a challenging, realistic test problem: autonomous aircraft inspection. Our results show that we can effectively compute consistent task and motion policies for the most likely execution-time outcomes using only a fraction of the computation required to develop the complete task and motion policy. |
Pineda, Luis Enrique; Wray, Kyle Hollins; Zilberstein, Shlomo Fast SSP Solvers Using Short-Sighted Labeling Conference Proceedings of the 31st Conference on Artificial Intelligence (AAAI), San Francisco, California, 2017. @conference{SZ:PWZaaai17, State-of-the-art methods for solving SSPs often work by limiting planning to restricted regions of the state space. The resulting problems can then be solved quickly, and the process is repeated during execution when states outside the restricted region are encountered. Typically, these approaches focus on states that are within some distance measure of the start state (e.g., number of actions or probability of being reached). However, these short-sighted approaches make it difficult to propagate information from states that are closer to a goal than to the start state, thus missing opportunities to improve planning. We present an alternative approach in which short-sightedness is used only to determine whether a state should be labeled as solved or not, but otherwise the set of states that can be accounted for during planning is unrestricted. Based on this idea, we propose the FLARES algorithm and show that it performs consistently well on a wide range of benchmark problems. |
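The labeling criterion above can be sketched as follows (a simplified illustration of the short-sighted check; the real FLARES algorithm restricts the check to greedy-policy reachability and caches solved labels):

```python
def residual(V, s, actions, transitions, cost):
    """Bellman residual |V(s) - min_a Q(s, a)|; zero at terminal states."""
    acts = actions(s)
    if not acts:
        return 0.0
    q = min(cost(s, a) + sum(p * V[s2] for s2, p in transitions(s, a))
            for a in acts)
    return abs(V[s] - q)

def short_sighted_solved(V, start, actions, transitions, cost, horizon, eps=1e-3):
    """Label 'start' solved if every state within 'horizon' expansion
    steps has residual below eps. Short-sightedness limits only this
    check; planning itself is not restricted to the horizon."""
    frontier, seen = {start}, {start}
    for _ in range(horizon + 1):
        if any(residual(V, s, actions, transitions, cost) > eps for s in frontier):
            return False
        frontier = {s2 for s in frontier for a in actions(s)
                    for s2, _ in transitions(s, a)} - seen
        if not frontier:
            return True
        seen |= frontier
    return True
```

The key design point is that only the labeling is depth-limited, so value information can still flow from states near the goal back toward the start.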
Wray, Kyle Hollins; Zilberstein, Shlomo; Mouaddib, Abdel-Illah Multi-Objective MDPs with Conditional Lexicographic Reward Preferences Conference Proceedings of the 29th Conference on Artificial Intelligence (AAAI), Austin, Texas, 2015. @conference{SZ:WZMaaai15, Sequential decision problems that involve multiple objectives are prevalent. Consider for example a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish its game theoretic properties. The performance of LVI in practice is tested within a realistic benchmark problem in the domain of semi-autonomous driving. Finally, we demonstrate how GPU-based optimization can improve the scalability of LVI and other value iteration algorithms for MDPs. |
Pineda, Luis; Wray, Kyle Hollins; Zilberstein, Shlomo Revisiting Multi-Objective MDPs with Relaxed Lexicographic Preferences Conference AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), Arlington, Virginia, 2015. @conference{SZ:PWZfall15, We consider stochastic planning problems that involve multiple objectives such as minimizing task completion time and energy consumption. These problems can be modeled as multi-objective Markov decision processes (MOMDPs), an extension of the widely-used MDP model to handle problems involving multiple value functions. We focus on a subclass of MOMDPs in which the objectives have a relaxed lexicographic structure, allowing an agent to seek improvement in a lower-priority objective when the impact on a higher-priority objective is within some small given tolerance. We examine the relationship between this class of problems and constrained MDPs, showing that the latter offer an alternative solution method with strong guarantees. We show empirically that a recently introduced algorithm for MOMDPs may not offer the same strong guarantees, but it does perform well in practice. |
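The relaxed (slack-based) lexicographic preference shared by these two papers can be sketched for a single decision step (hypothetical Q-value tables; the actual LVI algorithm applies the slack during value iteration):

```python
def lexicographic_choice(actions, q1, q2, slack):
    """Relaxed lexicographic action selection: keep actions whose
    primary cost q1 is within 'slack' of the best, then pick the one
    minimizing the secondary cost q2. Both objectives are minimized."""
    best1 = min(q1[a] for a in actions)
    admissible = [a for a in actions if q1[a] <= best1 + slack]
    return min(admissible, key=lambda a: q2[a])
```

With zero slack this is strict lexicographic preference; increasing the slack lets a lower-priority objective trade off against a bounded loss on the higher-priority one.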
Pineda, Luis; Zilberstein, Shlomo Planning Under Uncertainty Using Reduced Models: Revisiting Determinization Conference Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS), Portsmouth, New Hampshire, 2014. @conference{SZ:PZicaps14, We introduce a family of MDP reduced models characterized by two parameters: the maximum number of primary outcomes per action that are fully accounted for and the maximum number of occurrences of the remaining exceptional outcomes that are planned for in advance. Reduced models can be solved much faster using heuristic search algorithms such as LAO*, benefiting from the dramatic reduction in the number of reachable states. A commonly used determinization approach is a special case of this family of reductions, with one primary outcome per action and zero exceptional outcomes per plan. We present a framework to compute the benefits of planning with reduced models, relying on online planning when the number of exceptional outcomes exceeds the bound. Using this framework, we compare the performance of various reduced models and consider the challenge of generating good ones automatically. We show that each one of the dimensions--allowing more than one primary outcome or planning for some limited number of exceptions--could improve performance relative to standard determinization. The results place recent work on determinization in a broader context and lay the foundation for efficient and systematic exploration of the space of MDP model reductions. |
Pineda, Luis; Lu, Yi; Zilberstein, Shlomo; Goldman, Claudia V Fault-Tolerant Planning Under Uncertainty Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:PLZGijcai13, A fault represents some erroneous operation of a system that could result from an action selection error or some abnormal condition. We formally define error models that characterize the likelihood of various faults and consider the problem of fault-tolerant planning, which optimizes performance given an error model. We show that factoring the possibility of errors significantly degrades the performance of stochastic planning algorithms such as LAO*, because the number of reachable states grows dramatically. We introduce an approach to plan for a bounded number of faults and analyze its theoretical properties. When combined with a continual planning paradigm, the k-fault-tolerant planning method can produce near-optimal performance, even when the number of faults exceeds the bound. Empirical results in two challenging domains confirm the effectiveness of the approach in handling different types of runtime errors. |
Petrik, Marek; Zilberstein, Shlomo Robust Approximate Bilinear Programming for Value Function Approximation Journal Article In: Journal of Machine Learning Research (JMLR), vol. 12, pp. 3027–3063, 2011. @article{SZ:PZjmlr11, Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this worst-case complexity is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze the formulation as well as a simple approximate algorithm for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems. |
Petrik, Marek; Zilberstein, Shlomo Linear Dynamic Programs for Resource Management Conference Proceedings of the 25th Conference on Artificial Intelligence (AAAI), San Francisco, California, 2011. @conference{SZ:PZaaai11, Sustainable resource management in many domains presents large continuous stochastic optimization problems, which can often be modeled as Markov decision processes (MDPs). To solve such large MDPs, we identify and leverage linearity in state and action sets that is common in resource management. In particular, we introduce linear dynamic programs (LDPs) that generalize resource management problems and partially observable MDPs (POMDPs). We show that the LDP framework makes it possible to adapt point-based methods -- the state of the art in solving POMDPs -- to solving LDPs. The experimental results demonstrate the efficiency of this approach in managing the water level of a river reservoir. Finally, we discuss the relationship with dual dynamic programming, a method used to optimize hydroelectric systems. |
Wu, Xiaojian; Kumar, Akshat; Zilberstein, Shlomo Influence Diagrams with Memory States: Representation and Algorithms Conference Proceedings of the 2nd International Conference on Algorithmic Decision Theory (ADT), Piscataway, New Jersey, 2011. @conference{SZ:WKZadt11, Influence diagrams (IDs) offer a powerful framework for decision making under uncertainty, but their applicability has been hindered by the exponential growth of runtime and memory usage--largely due to the no-forgetting assumption. We present a novel way to maintain a limited amount of memory to inform each decision and still obtain near-optimal policies. The approach is based on augmenting the graphical model with memory states that represent key aspects of previous observations--a method that has proved useful in POMDP solvers. We also derive an efficient EM-based message-passing algorithm to compute the policy. Experimental results show that this approach produces high-quality approximate polices and offers better scalability than existing methods. |
Petrik, Marek; Taylor, Gavin; Parr, Ron; Zilberstein, Shlomo Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes Conference Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 2010. @conference{SZ:PTPZicml10, Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions reliably. Large and rich sets of features can cause existing algorithms to overfit because of a limited number of samples. We address this shortcoming using L1 regularization in approximate linear programming. Because the proposed method can automatically select the appropriate richness of features, its performance does not degrade with an increasing number of features. These results rely on new and stronger sampling bounds for regularized approximate linear programs. We also propose a computationally efficient homotopy method. The empirical evaluation of the approach shows that the proposed method performs well on simple MDPs and standard benchmark problems. |
Petrik, Marek; Zilberstein, Shlomo Constraint Relaxation in Approximate Linear Programs Conference Proceedings of the 26th International Conference on Machine Learning (ICML), Montreal, Canada, 2009. @conference{SZ:PZicml09, Approximate Linear Programming (ALP) is a reinforcement learning technique with nice theoretical properties, but it often performs poorly in practice. We identify some reasons for the poor quality of ALP solutions in problems where the approximation induces virtual loops. We then introduce two methods for improving solution quality. One method rolls out selected constraints of the ALP, guided by the dual information. The second method is a relaxation of the ALP, based on external penalty methods. The latter method is applicable in domains in which rolling out constraints is impractical. Both approaches show promising empirical results for simple benchmark problems as well as for a realistic blood inventory management problem. |
Petrik, Marek; Zilberstein, Shlomo Robust Value Function Approximation Using Bilinear Programming Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:PZnips09, Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We therefore employ and analyze a common approximate algorithm for bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on a simple benchmark problem. |
Petrik, Marek; Zilberstein, Shlomo Average-Reward Decentralized Markov Decision Processes Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:PZijcai07, Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multi-agent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, showing that it is NP-complete and that optimal policies are deterministic. Our analysis lays the foundation for designing two optimal algorithms. Experimental results with a standard problem from the literature illustrate the applicability of these solution techniques. |
Feng, Zhengzhu; Hansen, Eric A; Zilberstein, Shlomo Symbolic Generalization for On-line Planning Conference Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI), Acapulco, Mexico, 2003. @conference{SZ:FHZuai03, Symbolic representations have been used successfully in off-line planning algorithms for Markov decision processes. We show that they can also improve the performance of on-line planners. In addition to reducing computation time, symbolic generalization can reduce the amount of costly real-world interactions required for convergence. We introduce Symbolic Real-Time Dynamic Programming (or sRTDP), an extension of RTDP. After each step of on-line interaction with an environment, sRTDP uses symbolic model-checking techniques to generalize its experience by updating a group of states rather than a single state. We examine two heuristic approaches to dynamic grouping of states and show that they accelerate the planning process significantly in terms of both CPU time and the number of steps of interaction with the environment. |
Hansen, Eric A; Zilberstein, Shlomo LAO*: A Heuristic Search Algorithm that Finds Solutions with Loops Journal Article In: Artificial Intelligence (AIJ), vol. 129, no. 1-2, pp. 35–62, 2001. @article{SZ:HZaij01bb, Classic heuristic search algorithms can find solutions that take the form of a simple path (A*), a tree, or an acyclic graph (AO*). In this paper, we describe a novel generalization of heuristic search, called LAO*, that can find solutions with loops. We show that LAO* can be used to solve Markov decision problems and that it shares the advantage heuristic search has over dynamic programming for other classes of problems: given a start state, it can find an optimal solution without evaluating the entire state space. |
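The core idea of LAO* can be sketched as follows (a simplified LAO*-style solver, not the exact algorithm from the paper: it interleaves greedy-policy expansion with value-iteration sweeps over the expanded envelope rather than using the paper's best-first bookkeeping):

```python
def lao_star(start, goal, actions, transitions, cost, h, eps=1e-6, max_iters=10000):
    """Heuristic search for MDPs: only states reached by the current
    greedy partial policy are ever expanded, so an optimal value for the
    start state can be found without evaluating the entire state space.
    'h' is an admissible heuristic supplying initial value estimates."""
    V = {start: h(start)}
    expanded = set()

    def q(s, a):
        return cost(s, a) + sum(p * V.setdefault(s2, h(s2))
                                for s2, p in transitions(s, a))

    for _ in range(max_iters):
        # collect unexpanded states reachable under the greedy policy
        frontier, stack, seen = [], [start], set()
        while stack:
            s = stack.pop()
            if s in seen or s == goal:
                continue
            seen.add(s)
            if s in expanded:
                best = min(actions(s), key=lambda a: q(s, a))
                stack.extend(s2 for s2, _ in transitions(s, best))
            else:
                frontier.append(s)
        expanded.update(frontier)
        # one dynamic-programming sweep over the expanded envelope
        delta = 0.0
        for s in expanded:
            new = min(q(s, a) for a in actions(s))
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if not frontier and delta < eps:
            return V
    raise RuntimeError("did not converge")
```

Because the solution graph may contain loops, convergence requires repeated dynamic-programming sweeps rather than the single backward pass that suffices for AO* on acyclic graphs.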
Hansen, Eric A; Zilberstein, Shlomo Heuristic Search in Cyclic AND/OR Graphs Conference Proceedings of the 15th National Conference on Artificial Intelligence (AAAI), Madison, Wisconsin, 1998. @conference{SZ:HZaaai98, Heuristic search algorithms can find solutions that take the form of a simple path (A*), a tree, or an acyclic graph (AO*). We present a novel generalization of heuristic search (called LAO*) that can find solutions with loops, that is, solutions that take the form of a cyclic graph. We show that it can be used to solve Markov decision problems without evaluating the entire state space, giving it an advantage over dynamic-programming algorithms such as policy iteration and value iteration as an approach to stochastic planning. |
Hansen, Eric A; Barto, Andrew G; Zilberstein, Shlomo Reinforcement Learning for Mixed Open-loop and Closed-loop Control Conference Proceedings of the 9th Neural Information Processing Systems Conference (NIPS), Denver, Colorado, 1996. @conference{SZ:HBZnips96, Closed-loop control relies on sensory feedback that is usually assumed to be free. But if sensing incurs a cost, it may be cost effective to take sequences of actions in open-loop mode. We describe a reinforcement learning algorithm that learns to combine open-loop and closed-loop control when sensing incurs a cost. Although we assume reliable sensors, use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. This is a special case of the hidden-state problem in reinforcement learning, and to cope, our algorithm relies on short-term memory. The main result of the paper is a rule that significantly limits exploration of possible memory states by pruning memory states for which the estimated value of information is greater than its cost. We prove that this rule allows convergence to an optimal policy. |
Belief-Space Planning and POMDPs
Miura, Shuwa; Zilberstein, Shlomo Observer-Aware Planning with Implicit and Explicit Communication Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024. @conference{SZ:MZaamas24, This paper presents a computational model designed for planning both implicit and explicit communication of intentions, goals, and desires. Building upon previous research focused on implicit communication of intention via actions, our model seeks to strategically influence an observer’s belief using both the agent’s actions and explicit messages. We show that our proposed model can be considered to be a special case of general multi-agent problems with explicit communication under certain assumptions. Since the mental state of the observer depends on histories, computing a policy for the proposed model amounts to optimizing a non-Markovian objective, which we show to be intractable in the worst case. To mitigate this challenge, we propose a technique based on splitting domain and communication actions during planning. We conclude with experimental evaluations of the proposed approach that illustrate its effectiveness. |
Mahmud, Saaduddin; Vazquez-Chanlatte, Marcell; Witwicki, Stefan; Zilberstein, Shlomo Explaining the Behavior of POMDP-based Agents Through the Impact of Counterfactual Information Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024. @conference{SZ:MVWZaamas24, In this work, we consider AI agents operating in Partially Observable Markov Decision Processes (POMDPs)--a widely-used framework for sequential decision making with incomplete state information. Agents operating with partial information take actions not only to advance their underlying goals but also to seek information and reduce uncertainty. Despite rapid progress in explainable AI, research on separating information-driven vs. goal-driven behaviors remains sparse. To address this gap, we introduce a novel explanation generation framework called Sequential Information Probing (SIP), to investigate the direct impact of state information, or its absence, on agent behavior. To quantify the impact we also propose two metrics under this SIP framework called Value of Information (VoI) and Influence of Information (IoI). We then theoretically derive several properties of these metrics. Finally, we present several experiments, including a case study on an autonomous vehicle, that illustrate the efficacy of our method. |
Basich, Connor; Peterson, John; Zilberstein, Shlomo Planning with Intermittent State Observability: Knowing When to Act Blind Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022. @conference{SZ:BPZiros22, Contemporary planning models and methods often rely on constant availability of free state information at each step of execution. However, autonomous systems are increasingly deployed in the open world where state information may be costly or simply unavailable in certain situations. Failing to account for sensor limitations may lead to costly behavior or even catastrophic failure. While the partially observable Markov decision process (POMDP) can be used to model this problem, solving POMDPs is often intractable. We introduce a planning model called a semi-observable Markov decision process (SOMDP) specifically designed for MDPs where state observability may be intermittent. We propose an approach for solving SOMDPs that uses memory states to proactively plan for the potential loss of sensor information while exploiting the unique structure of SOMDPs. Our theoretical analysis and empirical evaluation demonstrate the advantages of SOMDPs relative to existing planning models. |
Basich, Connor; Svegliato, Justin; Beach, Allyson; Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Improving Competence via Iterative State Space Refinement Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:BSBWWZiros21, Despite considerable efforts by human designers, accounting for every unique situation that an autonomous robotic system deployed in the real world could face is often an infeasible task. As a result, many such deployed systems still rely on human assistance in various capacities to complete certain tasks while staying safe. Competence-aware systems (CAS) is a recently proposed model for reducing such reliance on human assistance while in turn optimizing the system’s global autonomous operation by learning its own competence. However, such systems are limited by a fixed model of their environment and may perform poorly if their a priori planning model does not include certain features that emerge as important over the course of the system’s deployment. In this paper, we propose a method for improving the competence of a CAS over time by identifying important state features missing from the system’s model and incorporating them into its state representation, thereby refining its state space. Our approach exploits information that exists in the standard CAS model and adds no extra work to the human. The result is an agent that better predicts human involvement, improving its competence, reliability, and overall performance. |
Parr, Shane; Khatri, Ishan; Svegliato, Justin; Zilberstein, Shlomo Agent-Aware State Estimation in Autonomous Vehicles Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:PKSZiros21, Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios. |
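The transition-independent inference step can be sketched as a Bayes update that treats each observed agent's behavior as conditionally independent evidence about the shared hidden state (hypothetical likelihood tables; the paper's full model also handles state transitions over time):

```python
def agent_aware_update(prior, behaviors, likelihood):
    """One filtering step: infer a shared hidden state (e.g. a traffic
    light) from the observed behavior of other agents. likelihood[s][b]
    is the assumed P(behavior b | hidden state s); behaviors are treated
    as conditionally independent given the state, which is what lets
    inference scale linearly with the number of agents."""
    post = {}
    for s, p in prior.items():
        for b in behaviors:
            p *= likelihood[s].get(b, 0.0)
        post[s] = p
    z = sum(post.values())
    return {s: p / z for s, p in post.items()}
```

For example, seeing several cross-traffic vehicles moving sharply raises the posterior that the occluded light is green for them, even with no direct view of the light.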
Miura, Shuwa; Cohen, Andrew L; Zilberstein, Shlomo Maximizing Legibility in Stochastic Environments Conference Proceedings of the 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), Vancouver, BC, Canada, 2021. @conference{SZ:MCZroman21, Making an agent's intentions clear from its observed behavior is crucial for seamless human-agent interaction and for increased transparency and trust in AI systems. Existing methods that address this challenge and maximize legibility of behaviors are limited to deterministic domains. We develop a technique for maximizing legibility in stochastic environments and illustrate that using legibility as an objective improves interpretability of agent behavior in several scenarios. We provide initial empirical evidence that human subjects can better interpret legible behavior. |
Miura, Shuwa; Zilberstein, Shlomo A Unifying Framework for Observer-Aware Planning and its Complexity Conference Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI), Virtual Event, 2021. @conference{SZ:MZuai21, Being aware of observers and the inferences they make about an agent's behavior is crucial for successful multi-agent interaction. Existing works on observer-aware planning use different assumptions and techniques to produce observer-aware behaviors. We argue that observer-aware planning, in its most general form, can be modeled as an Interactive POMDP (I-POMDP), which requires complex modeling and is hard to solve. Hence, we introduce a less complex framework for producing observer-aware behaviors called Observer-Aware MDP (OAMDP) and analyze its relationship to I-POMDP. We establish the complexity of OAMDPs and show that they can improve interpretability of agent behaviors in several scenarios. |
Wray, Kyle Hollins; Zilberstein, Shlomo Generalized Controllers in POMDP Decision-Making Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, Quebec, CA, 2019. @conference{SZ:WZicra19, We present a general policy formulation for partially observable Markov decision processes (POMDPs) called controller family policies that may be used as a framework to facilitate the design of new policy forms. We prove how modern approximate policy forms: point-based, finite state controller (FSC), and belief compression, are instances of this family of generalized controller policies. Our analysis provides a deeper understanding of the POMDP model and suggests novel ways to design POMDP solutions that can combine the benefits of different state-of-the-art methods. We illustrate this capability by creating a new customized POMDP policy form called the belief-integrated FSC (BI-FSC) tailored to overcome the shortcomings of a state-of-the-art algorithm that uses non-linear programming (NLP). Specifically, experiments show that for NLP the BI-FSC offers improved performance over a vanilla FSC-based policy form on benchmark domains. Furthermore, we demonstrate the BI-FSC's execution on a real robot navigating in a maze environment. Results confirm the value of using the controller family policy as a framework to design customized policies in POMDP robotic solutions. |
Saisubramanian, Sandhya; Wray, Kyle Hollins; Pineda, Luis Enrique; Zilberstein, Shlomo Planning in Stochastic Environments with Goal Uncertainty Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019. @conference{SZ:SWPZiros19, We present the Goal Uncertain Stochastic Shortest Path (GUSSP) problem -- a general framework to model path planning and decision making in stochastic environments with goal uncertainty. The framework extends the stochastic shortest path (SSP) model to dynamic environments in which it is impossible to determine the exact goal states ahead of plan execution. GUSSPs introduce flexibility in goal specification by allowing a belief over possible goal configurations. The unique observations at potential goals help the agent identify the true goal during plan execution. The partial observability is restricted to goals, facilitating the reduction to an SSP with a modified state space. We formally define a GUSSP and discuss its theoretical properties. We then propose an admissible heuristic that reduces the planning time using FLARES -- a state-of-the-art probabilistic planner. We also propose a determinization approach for solving this class of problems. Finally, we present empirical results on a search and rescue mobile robot and three other problem domains in simulation. |
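The core of the GUSSP idea above is that a belief over candidate goals is refined by Bayesian updating on the distinctive observations received at potential goal locations. The sketch below illustrates just that update step in Python; the function name and the flat likelihood model are illustrative assumptions, not the paper's implementation.

```python
def update_goal_belief(belief, likelihoods):
    """Bayes update of a belief over candidate goal states.

    belief      -- prior probability that each candidate is the true goal
    likelihoods -- P(observation | candidate is the true goal) for the
                   observation just received at the agent's location
    Illustrative sketch only; the paper's observation model is richer.
    """
    posterior = [p * l for p, l in zip(belief, likelihoods)]
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("observation impossible under every candidate goal")
    return [p / total for p in posterior]

# Uniform prior over three candidate goals; an observation at candidate 0
# suggests it is unlikely to be the true goal.
prior = [1 / 3, 1 / 3, 1 / 3]
post = update_goal_belief(prior, [0.1, 0.9, 0.9])
```

After the update, probability mass shifts away from candidate 0 toward the other two candidates, which remain equally likely.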
Wray, Kyle Hollins; Zilberstein, Shlomo ICAPS Workshop on Planning and Robotics (PlanRob), Pittsburgh, Pennsylvania, 2017. @conference{SZ:WZplanrob17, Stochastic network design is a general framework for optimizing network connectivity. It has several applications in computational sustainability including spatial conservation planning, pre-disaster network preparation, and river network optimization. A common assumption in previous work is that network parameters (e.g., probability of species colonization) are precisely known, which is unrealistic in real-world settings. We therefore address the robust river network design problem where the goal is to optimize river connectivity for fish movement by removing barriers. We assume that fish passability probabilities are known only imprecisely, but are within some interval bounds. We then develop a planning approach that computes the policies with either high robust ratio or low regret. Empirically, our approach scales well to large river networks. We also provide insights into the solutions generated by our robust approach, which has significantly higher robust ratio than the baseline solution with mean parameter estimates. |
Wray, Kyle Hollins; Zilberstein, Shlomo Approximating reachable belief points in POMDPs Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 2017. @conference{SZ:WZiros17, We propose an algorithm called σ-approximation that compresses the non-zero values of beliefs for partially observable Markov decision processes (POMDPs) in order to improve performance and reduce memory usage. Specifically, we approximate individual belief vectors with a fixed bound on the number of non-zero values they may contain. We prove the correctness and a strong error bound when the σ-approximation is used with the point-based value iteration (PBVI) family of algorithms. An analysis compares the algorithm on six large domains, varying the number of non-zero values for the σ-approximation. Results clearly demonstrate that when the algorithm is used with PBVI (σ-PBVI), we can achieve over an order of magnitude improvement. We ground our claims with a full robotic implementation for simultaneous navigation and localization using POMDPs with σ-PBVI. |
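The compression described above bounds how many non-zero entries a belief vector may contain. A minimal sketch of that idea, assuming the simplest policy of keeping the largest entries and renormalizing (the function name and selection rule are illustrative, not taken from the paper):

```python
def compress_belief(belief, k):
    """Keep only the k largest non-zero entries of a belief vector,
    zeroing the rest and renormalizing so it remains a distribution.
    Illustrative sketch of bounded-support belief compression.
    """
    # Indices of the k largest probabilities.
    top = set(sorted(range(len(belief)), key=lambda i: belief[i], reverse=True)[:k])
    compressed = [b if i in top else 0.0 for i, b in enumerate(belief)]
    total = sum(compressed)
    return [b / total for b in compressed]

b = [0.5, 0.3, 0.15, 0.05]
small = compress_belief(b, 2)  # support limited to the two largest entries
```

Bounding the support this way keeps the per-belief storage and dot-product cost fixed regardless of the true state-space size, which is the source of the speedups the paper reports.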
Wray, Kyle Hollins; Zilberstein, Shlomo A POMDP Formulation of Proactive Learning Conference Proceedings of the 30th Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, 2016. @conference{SZ:WZaaai16, We cast the Proactive Learning (PAL) problem--Active Learning (AL) with multiple reluctant, fallible, cost-varying oracles--as a Partially Observable Markov Decision Process (POMDP). The agent selects an oracle at each time step to label a data point while it maintains a belief over the true underlying correctness of its current dataset's labels. The goal is to minimize labeling costs while considering the value of obtaining correct labels, thus maximizing final resultant classifier accuracy. We prove three properties that show our particular formulation leads to a structured and bounded-size set of belief points, enabling strong performance of point-based methods to solve the POMDP. Our method is compared with the original three algorithms proposed by Donmez and Carbonell and a simple baseline. We demonstrate that our approach matches or improves upon the original approach within five different oracle scenarios, each on two datasets. Finally, our algorithm provides a general, well-defined mathematical foundation to build upon. |
Kumar, Akshat; Zilberstein, Shlomo History-Based Controller Design and Optimization for Partially Observable MDPs Conference Proceedings of the 25th International Conference on Automated Planning and Scheduling (ICAPS), Jerusalem, Israel, 2015. @conference{SZ:KZicaps15, Partially observable MDPs provide an elegant framework for sequential decision making. Finite-state controllers (FSCs) are often used to represent policies for infinite-horizon problems as they offer a compact representation, simple-to-execute plans, and adjustable tradeoff between computational complexity and policy size. We develop novel connections between optimizing FSCs for POMDPs and the dual linear program for MDPs. Building on that, we present a dual mixed integer linear program (MIP) for optimizing FSCs. To assign well-defined meaning to FSC nodes as well as aid in policy search, we show how to associate history-based features with each FSC node. Using this representation, we address another challenging problem, that of iteratively deciding which nodes to add to FSC to get a better policy. Using an efficient off-the-shelf MIP solver, we show that this new approach can find compact near-optimal FSCs for several large benchmark domains, and is competitive with previous best approaches. |
Wray, Kyle Hollins; Zilberstein, Shlomo Multi-Objective POMDPs with Lexicographic Reward Preferences Conference Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 2015. @conference{SZ:KZijcai15, We propose a model, Lexicographic Partially Observable Markov Decision Process (LPOMDP), which extends POMDPs with lexicographic preferences over multiple value functions. It allows for slack--slightly less-than-optimal values--for higher-priority preferences to facilitate improvement in lower-priority value functions. Many real life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which time spent on the road is minimized, while maximizing time spent driving autonomously. We propose two solutions to LPOMDPs--Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), establishing convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data provided by Open Street Map (OSM) within 10 major cities. Finally, we present GPU-based optimizations for point-based solvers, demonstrating that their application enables us to quickly solve vastly larger LPOMDPs and other variations of POMDPs. |
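The slack mechanism in the LPOMDP abstract above can be illustrated with a small filtering step: each objective, in priority order, eliminates actions whose value falls more than the allowed slack below the best surviving action. This sketch is an illustrative rendering of that idea, not the paper's LVI/LPBVI algorithms, and all names are assumptions.

```python
def lexicographic_filter(q_values, slack):
    """Filter actions by lexicographic preference with slack.

    q_values -- list of dicts, one per objective in priority order,
                mapping action -> value (higher is better)
    slack    -- per-objective tolerance: an action survives objective i
                if its value is within slack[i] of the best survivor
    Returns the set of actions still acceptable after all objectives.
    """
    actions = set(q_values[0])
    for q, delta in zip(q_values, slack):
        best = max(q[a] for a in actions)
        actions = {a for a in actions if q[a] >= best - delta}
    return actions

# Objective 1: travel time (negated, higher is better); objective 2: autonomy.
q_time = {"auto": -10.0, "manual": -9.5}
q_autonomy = {"auto": 1.0, "manual": 0.0}
chosen = lexicographic_filter([q_time, q_autonomy], slack=[1.0, 0.0])
```

With slack 1.0 on travel time, the slightly slower autonomous action survives the first objective and then wins on autonomy; with zero slack it would be eliminated immediately, which is exactly the trade-off slack is meant to enable.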
Wray, Kyle Hollins; Zilberstein, Shlomo A Parallel Point-Based POMDP Algorithm Leveraging GPUs Conference AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), Arlington, Virginia, 2015. @conference{SZ:WZfall15, We parallelize the Point-Based Value Iteration (PBVI) algorithm, which approximates the solution to Partially Observable Markov Decision Processes (POMDPs), using a Graphics Processing Unit (GPU). We detail additional optimizations, such as leveraging the bounded size of non-zero values over all belief point vectors, usable by serial and parallel algorithms. We compare serial (CPU) and parallel (GPU) implementations on 10 distinct problem domains, and demonstrate that our approach provides an order of magnitude improvement. |
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Solving POMDPs Using Quadratically Constrained Linear Programs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:ABZijcai07, Developing scalable algorithms for solving partially observable Markov decision processes (POMDPs) is an important challenge. One approach that effectively addresses the intractable memory requirements of POMDP algorithms is based on representing POMDP policies as finite-state controllers. In this paper, we illustrate some fundamental disadvantages of existing techniques that use controllers. We then propose a new approach that formulates the problem as a quadratically constrained linear program (QCLP), which defines an optimal controller of a desired size. This representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs. Although QCLP optimization techniques guarantee only local optimality, the results we obtain using an existing optimization method show significant solution improvement over the state-of-the-art techniques. The results open up promising research directions for solving large POMDPs using nonlinear programming methods. |
Feng, Zhengzhu; Zilberstein, Shlomo Efficient Maximization in Solving POMDPs Conference Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), Pittsburgh, Pennsylvania, 2005. @conference{SZ:FZaaai05, We present a simple, yet effective improvement to the dynamic programming algorithm for solving partially observable Markov decision processes. The technique targets the vector pruning operation during the maximization step, a key source of complexity in POMDP algorithms. We identify two types of structures in the belief space and exploit them to reduce significantly the number of constraints in the linear programs used for pruning. The benefits of the new technique are evaluated both analytically and experimentally, showing that it can lead to significant performance improvement. The results open up new research opportunities to enhance the performance and scalability of several POMDP algorithms. |
Feng, Zhengzhu; Zilberstein, Shlomo Region-Based Incremental Pruning for POMDPs Conference Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI), Banff, Canada, 2004. @conference{SZ:FZuai04, We present a major improvement to the incremental pruning algorithm for solving partially observable Markov decision processes. Our technique targets the cross-sum step of the dynamic programming (DP) update, a key source of complexity in POMDP algorithms. Instead of reasoning about the whole belief space when pruning the cross-sums, our algorithm divides the belief space into smaller regions and performs independent pruning in each region. We evaluate the benefits of the new technique both analytically and experimentally, and show that it produces very significant performance gains. The results contribute to the scalability of POMDP algorithms to domains that cannot be handled by the best existing techniques. |
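The two pruning papers above both attack the cost of eliminating dominated alpha-vectors. The full method uses linear programs over belief regions; the sketch below shows only the cheap pointwise-dominance pre-filter commonly applied before LP pruning, as an illustration of what "pruning" means here. It is not the region-based algorithm itself.

```python
def pointwise_prune(vectors):
    """Remove alpha-vectors that are pointwise dominated by another vector.

    An alpha-vector v is pointwise dominated by w if w is at least as good
    in every state and strictly better in at least one state. This cheap
    check is a pre-filter; full pruning additionally solves linear programs
    over the belief space (not shown here).
    """
    kept = []
    for i, v in enumerate(vectors):
        dominated = any(
            j != i
            and all(w[s] >= v[s] for s in range(len(v)))
            and any(w[s] > v[s] for s in range(len(v)))
            for j, w in enumerate(vectors)
        )
        if not dominated:
            kept.append(v)
    return kept

# Two-state example: the last vector is dominated by [0.5, 0.5].
vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.4, 0.4]]
pruned = pointwise_prune(vecs)
```

Note that [0.5, 0.5] survives even though it is never uniquely optimal at a corner belief; detecting that requires the LP test, which is exactly the expensive step the region-based technique accelerates.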
Multiagent Planning and DEC-POMDPs
Choudhury, Moumita; Saisubramanian, Sandhya; Zhang, Hao; Zilberstein, Shlomo Minimizing Negative Side Effects in Cooperative Multi-Agent Systems Using Distributed Coordination Conference Proceedings of the The 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024. @conference{SZ:CSZZaamas24, Autonomous agents in real-world environments may encounter undesirable outcomes or negative side effects (NSEs) when working collaboratively alongside other agents. We frame the challenge of minimizing NSEs in a multi-agent setting as a lexicographic decentralized Markov decision process in which we assume independence of rewards and transitions with respect to the primary assigned tasks, but allow negative side effects to create a form of dependence among the agents. We present a lexicographic Q-learning approach to mitigate the NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks--up to some given slack. Our empirical evaluation across two domains demonstrates that our collaborative approach effectively mitigates NSEs, outperforming non-collaborative methods. |
Choudhury, Moumita; Saisubramanian, Sandhya; Zhang, Hao; Zilberstein, Shlomo Minimizing Negative Side Effects in Cooperative Multi-Agent Systems Using Distributed Coordination Conference Proceedings of the The 37th International FLAIRS Conference, Miramar Beach, Florida, 2024. @conference{SZ:CSZZflairs24, Autonomous agents operating in real-world environments frequently encounter undesirable outcomes or negative side effects (NSEs) when working collaboratively alongside other agents. Even when agents can execute their primary task optimally when operating in isolation, their training may not account for potential negative interactions that arise in the presence of other agents. We frame the challenge of minimizing NSEs as a lexicographic decentralized Markov decision process in which we assume independence of rewards and transitions with respect to the primary assigned tasks, but recognize that addressing negative side effects creates a form of dependence among the agents. We present a lexicographic Q-learning approach to mitigate the NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks--up to some given slack. Our empirical evaluation across two domains demonstrates that our collaborative approach effectively mitigates NSEs, outperforming non-collaborative methods. |
Mahmud, Saaduddin; Nashed, Samer B.; Goldman, Claudia V.; Zilberstein, Shlomo Estimating Causal Responsibility for Explaining Autonomous Behavior Book Section In: Calvaresi, Davide (Ed.): International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS), pp. 78–94, Springer, 2023. @incollection{SZ:MNGZextraamas23, There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values. |
Parr, Shane; Khatri, Ishan; Svegliato, Justin; Zilberstein, Shlomo Agent-Aware State Estimation in Autonomous Vehicles Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:PKSZiros21, Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20, Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19, People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18, We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
Wray, Kyle Hollins; Kumar, Akshat; Zilberstein, Shlomo Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18, Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17, We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It guarantees to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches. |
Kumar, Akshat; Mostafa, Hala; Zilberstein, Shlomo Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16, Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15, Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques. |
Nguyen, Duc Thien; Yeoh, William; Lau, Hoong Chuin; Zilberstein, Shlomo; Zhang, Chongjie Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14, Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs. |
Brafman, Ronen I; Shani, Guy; Zilberstein, Shlomo Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13, Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains, and are the subject of a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms. |
Wu, Feng; Zilberstein, Shlomo; Jennings, Nicholas R Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13, We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents. |
Yeoh, William; Kumar, Akshat; Zilberstein, Shlomo Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13, The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application. |
Durfee, Edmund; Zilberstein, Shlomo Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11, Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the "right" time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems. |
Kumar, Akshat; Zilberstein, Shlomo; Toussaint, Marc Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11, Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11, We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains. |
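The UCT estimates used above rely on the UCB1 selection rule, which adds to each action's empirical mean an exploration bonus that shrinks as the action is tried more often. A minimal sketch of that selection step (illustrative only; it omits the tree and the biased adaptive play from the paper):

```python
import math

def ucb1(stats, total, c=1.414):
    """Pick the action maximizing mean value plus an exploration bonus.

    stats: {action: (visit_count, total_value)}; total: sum of all visits.
    Untried actions are selected first (treated as an infinite bonus).
    """
    def score(a):
        n, v = stats[a]
        if n == 0:
            return float("inf")
        return v / n + c * math.sqrt(math.log(total) / n)
    return max(stats, key=score)
```

The bonus term guarantees every action keeps being sampled, while the mean term increasingly favors the empirically best one.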
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10, |
Kumar, Akshat; Zilberstein, Shlomo Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10, Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. |
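The backup operation analyzed here generalizes the standard single-agent point-based backup, which is worth seeing concretely: for a belief b, each candidate action back-projects the previous horizon's alpha vectors through every observation and keeps the best combination. A single-agent sketch under that standard formulation (the paper's result is that the decentralized analogue of this step is NP-complete; names are illustrative):

```python
import numpy as np

def point_backup(b, alphas, R, T, O, gamma=0.95):
    """One point-based backup at belief b for a single-agent POMDP.

    R[a]: reward vector over states; T[a]: |S| x |S| transition matrix;
    O[a]: |S'| x |Z| observation probabilities; alphas: previous alpha vectors.
    Returns the best new alpha vector at b.
    """
    best_val, best_alpha = -np.inf, None
    n_obs = O[0].shape[1]
    for a in range(len(R)):
        new_alpha = np.asarray(R[a], dtype=float).copy()
        for z in range(n_obs):
            # back-project each old alpha vector through action a, observation z
            g = [T[a] @ (O[a][:, z] * alpha) for alpha in alphas]
            new_alpha += gamma * max(g, key=lambda v: float(b @ v))
        val = float(b @ new_alpha)
        if val > best_val:
            best_val, best_alpha = val, new_alpha
    return best_alpha
```

In the multi-agent setting the inner maximization must be coordinated across agents' policy trees, which is the source of the NP-completeness the paper establishes.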
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10, Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gain can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality. |
Kumar, Akshat; Zilberstein, Shlomo Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10, Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10, We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. |
Amato, Christopher; Bonet, Blai; Zilberstein, Shlomo Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10, Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones. |
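The difference between the two automata types is easy to see in code: a Moore controller's action depends only on its current node, while a Mealy controller's action can also depend on the incoming observation, often yielding a smaller machine for the same policy. A sketch of that distinction (illustrative, not the paper's implementation):

```python
class MooreController:
    """Action is a function of the current node alone."""
    def __init__(self, action, trans, start):
        self.action, self.trans, self.node = action, trans, start
    def step(self, obs):
        self.node = self.trans[(self.node, obs)]
        return self.action[self.node]

class MealyController:
    """Action is a function of the current node AND the observation."""
    def __init__(self, action, trans, start):
        self.action, self.trans, self.node = action, trans, start
    def step(self, obs):
        a = self.action[(self.node, obs)]
        self.node = self.trans[(self.node, obs)]
        return a

# A one-node Mealy controller that mirrors the last observation...
mealy = MealyController({(0, 'x'): 'X', (0, 'y'): 'Y'},
                        {(0, 'x'): 0, (0, 'y'): 0}, 0)
# ...needs two Moore nodes to express the same behavior.
moore = MooreController({0: 'X', 1: 'Y'},
                        {(0, 'x'): 0, (0, 'y'): 1,
                         (1, 'x'): 0, (1, 'y'): 1}, 0)
```

Both controllers emit the same action sequence on any observation stream, but the Mealy machine does it with half the nodes, which is the compactness advantage the paper exploits.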
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10, Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms. |
Bernstein, Daniel S; Amato, Christopher; Hansen, Eric A; Zilberstein, Shlomo Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09, Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems. |
Petrik, Marek; Zilberstein, Shlomo A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09, Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs. |
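The key structure these formulations exploit is that a bilinear objective x^T A y is linear in each block of variables once the other is fixed, so each best response lies at a simplex vertex. A coordinate-ascent sketch of that structure (this converges only to a local optimum and is not the paper's successive approximation algorithm):

```python
import numpy as np

def alternating_best_response(A, iters=25):
    """Coordinate ascent for max x^T A y over probability simplices.

    With y fixed the objective is linear in x, so the best x is a
    vertex of the simplex (a pure choice), and symmetrically for y.
    """
    m, n = A.shape
    y = np.full(n, 1.0 / n)  # start from the uniform mixture
    for _ in range(iters):
        x = np.eye(m)[int(np.argmax(A @ y))]  # best response to y
        y = np.eye(n)[int(np.argmax(x @ A))]  # best response to x
    return x, y, float(x @ A @ y)
```

Each alternation weakly improves the objective, but the method can stall at a local optimum; avoiding that while retaining a global bound is exactly what the paper's algorithm addresses.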
Becker, Raphen; Carlin, Alan; Lesser, Victor; Zilberstein, Shlomo Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09, Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
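The myopic estimate discussed here reduces to a simple comparison: assume communication is possible only now, compute the expected value with and without it under the current belief, and communicate when the gain exceeds the cost. A sketch of that decision rule (all names are illustrative):

```python
def myopic_voc(belief, value_with_comm, value_without_comm, cost):
    """Myopic value of communication: assume the only chance to
    communicate is right now, then compare expected values.

    belief: {world_state: probability}
    value_with_comm / value_without_comm: {world_state: float}
    """
    ev_comm = sum(p * value_with_comm[s] for s, p in belief.items())
    ev_silent = sum(p * value_without_comm[s] for s, p in belief.items())
    return ev_comm - ev_silent - cost

def should_communicate(belief, vw, vo, cost):
    return myopic_voc(belief, vw, vo, cost) > 0
```

The paper's point is that this assumption can undervalue silence (a future message may be cheaper or more informative), which is what its relaxed, non-myopic approach corrects.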
Amato, Christopher; Zilberstein, Shlomo Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:AZaamas09, Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions. |
Kumar, Akshat; Zilberstein, Shlomo Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09, Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude speedup in policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Kumar, Akshat; Zilberstein, Shlomo Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09, Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Kumar, Akshat; Zilberstein, Shlomo Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09, Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-Hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Amato, Christopher; Dibangoye, Jilles Steeve; Zilberstein, Shlomo Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09, |
Wu, Feng; Zilberstein, Shlomo; Chen, Xiaoping Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09, We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Allen, Martin; Zilberstein, Shlomo Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09, The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and the competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Goldman, Claudia V; Zilberstein, Shlomo Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08, Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Seuken, Sven; Zilberstein, Shlomo Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08, |
Petrik, Marek; Zilberstein, Shlomo A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Carlin, Alan; Zilberstein, Shlomo Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
Carlin, Alan; Zilberstein, Shlomo Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08, Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm. |
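The value-based merging step described in the two Carlin and Zilberstein entries above can be sketched roughly as follows; the greedy pairwise merge, the value table, and the stopping rule are assumptions for illustration, not the papers' exact procedure:

```python
# Hypothetical sketch: repeatedly merge the pair of observation groups
# whose worst-case subtree-value gap is smallest, stopping before the
# accumulated gap would exceed a user-chosen error bound epsilon.
def compress_observations(values, epsilon):
    """values: dict obs -> list of subtree values (one per evaluation point).
    Returns (groups, error_bound): merged observation groups and the
    accumulated worst-case value loss."""
    groups = [[o] for o in values]
    bound = 0.0

    def gap(g1, g2):
        vs = [values[o] for o in g1 + g2]
        return max(max(col) - min(col) for col in zip(*vs))

    while len(groups) > 1:
        (i, j), d = min(
            (((i, j), gap(groups[i], groups[j]))
             for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda x: x[1])
        if bound + d > epsilon:
            break
        bound += d
        groups[i] += groups.pop(j)
    return groups, bound
```

Observations with identical subtree values merge for free; dissimilar ones stay separate once the bound would be violated, mirroring the error guarantee the abstracts mention.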
Amato, Christopher; Zilberstein, Shlomo What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08, Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability. |
Goldman, Claudia V; Allen, Martin; Zilberstein, Shlomo Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07, Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity. |
Seuken, Sven; Zilberstein, Shlomo Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07, Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
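A minimal sketch of the memory-bounded loop described in this abstract, assuming user-supplied `backup`, `evaluate`, and `belief_points` functions (all hypothetical placeholders rather than the paper's heuristics):

```python
# Hypothetical sketch of memory-bounded dynamic programming: keep at most
# max_trees policy trees per agent at every horizon, so time and space
# grow linearly with the horizon rather than doubly exponentially.
from itertools import product

def mbdp(horizon, max_trees, base_policies, backup, evaluate, belief_points):
    """base_policies: list of per-agent policy sets for horizon 1.
    backup(agent_policies) -> candidate policies one step deeper;
    evaluate(joint, b) -> value of a joint policy at belief point b."""
    policies = base_policies
    for t in range(2, horizon + 1):
        candidates = [backup(agent_policies) for agent_policies in policies]
        kept = [[] for _ in candidates]
        for b in belief_points(t):
            # keep the joint policy that is best at this heuristic belief point
            best = max(product(*candidates),
                       key=lambda joint: evaluate(joint, b))
            for i, tree in enumerate(best):
                if tree not in kept[i] and len(kept[i]) < max_trees:
                    kept[i].append(tree)
        policies = kept
    return policies
```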
Amato, Christopher; Bernstein, Daniel S; Zilberstein, Shlomo Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07, We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
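The nonlinear program in this entry constrains the controller's value function; as a rough illustration, a fixed stochastic finite-state controller can be evaluated by iterating those Bellman-style equations. The dictionary-based model layout below is an assumption for readability, not the paper's formulation:

```python
# Hypothetical sketch: iterate the value equations of a stochastic
# finite-state controller on a POMDP until they converge.
def fsc_value(states, actions, obs, nodes, T, R, O, psi, eta,
              gamma=0.9, iters=500):
    """V[(q, s)]: value of controller node q in hidden state s.
    T[(s, a, s2)] transition, R[(s, a)] reward, O[(a, s2, o)] observation;
    psi[(q, a)] action distribution, eta[(q, a, o, q2)] node transition."""
    V = {(q, s): 0.0 for q in nodes for s in states}
    for _ in range(iters):
        V = {(q, s): sum(
                psi[q, a] * (R[s, a] + gamma * sum(
                    T[s, a, s2] * O[a, s2, o] * eta[q, a, o, q2] * V[q2, s2]
                    for s2 in states for o in obs for q2 in nodes))
                for a in actions)
             for q in nodes for s in states}
    return V
```

The NLP approach treats `psi` and `eta` as decision variables subject to these same equations, which is what allows off-the-shelf nonlinear solvers to optimize the controller.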
Seuken, Sven; Zilberstein, Shlomo Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07, Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Allen, Martin; Zilberstein, Shlomo Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07, We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Petrik, Marek; Zilberstein, Shlomo Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07, Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
Szer, Daniel; Charpillet, Francois; Zilberstein, Shlomo MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05, We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
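The search skeleton behind MAA* is best-first expansion of partial joint policies under an admissible (upper-bound) heuristic; a generic sketch with placeholder `expand`, `heuristic`, and `is_complete` functions (not the paper's implementation) might look like:

```python
# Hypothetical sketch of the MAA* skeleton: because the heuristic is an
# upper bound on remaining value, the first complete node popped from the
# max-priority frontier is an optimal solution.
import heapq

def maa_star(root, expand, heuristic, is_complete):
    """expand(node) -> iterable of (child, value_so_far) pairs."""
    frontier = [(-heuristic(root), 0, root)]  # negate f for a max-heap
    counter = 1  # tie-breaker so nodes themselves are never compared
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if is_complete(node):
            return node
        for child, g in expand(node):
            heapq.heappush(frontier, (-(g + heuristic(child)), counter, child))
            counter += 1
    return None
```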
Generalized Planning
Bhatia, Abhinav; Nashed, Samer B.; Zilberstein, Shlomo RL3: Boosting Meta Reinforcement Learning via RL inside RL2 Conference NeurIPS Workshop on Generalized Planning (GenPlan), New Orleans, Louisiana, 2023. @conference{SZ:BNZgenplan23, Meta reinforcement learning (meta-RL) methods such as RL2 have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, these RL algorithms struggle with long-horizon tasks and out-of-distribution tasks since they rely on recurrent neural networks to process the sequence of experiences instead of summarizing them into general RL components such as value functions. Moreover, even transformers have a practical limit to the length of histories they can efficiently reason about before training and inference costs become prohibitive. In contrast, traditional RL algorithms are data-inefficient since they do not leverage domain knowledge, but they do converge to an optimal policy as more data becomes available. In this paper, we propose RL3, a principled hybrid approach that combines traditional RL and meta-RL by incorporating task-specific action-values learned through traditional RL as an input to the meta-RL neural network. We show that RL3 earns greater cumulative reward on long-horizon and out-of-distribution tasks compared to RL2, while maintaining the efficiency of the latter in the short term. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies. |
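The hybrid input construction this abstract describes, where action-values learned by ordinary RL augment the meta-policy's input, can be illustrated with tabular Q-learning; the function names and the way values are appended to the observation are assumptions, not the RL3 architecture:

```python
# Hypothetical sketch: a task-specific Q-table is updated with standard
# Q-learning, and its current estimates are appended to the raw
# observation before it is fed to the meta-RL policy network.
def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step; Q maps (state, action) -> value."""
    best_next = max(Q.get((s2, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

def rl3_input(observation, Q, s, actions):
    """Meta-policy input: raw observation plus current Q(s, .) estimates."""
    return list(observation) + [Q.get((s, a), 0.0) for a in actions]
```

The Q-values act as a compact, convergent summary of experience, which is what lets the meta-learner cope with horizons longer than its recurrent memory.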
Srivastava, Siddharth; Zilberstein, Shlomo; Gupta, Abhishek; Abbeel, Pieter; Russell, Stuart J Tractability of Planning with Loops Conference Proceedings of the 29th Conference on Artificial Intelligence (AAAI), Austin, Texas, 2015. @conference{SZ:SZGARaaai15, We create a unified framework for analyzing and synthesizing plans with loops for solving problems with non-deterministic numeric effects and a limited form of partial observability. Three different action models -- with deterministic, qualitative non-deterministic and Boolean non-deterministic semantics -- are handled using a single abstract representation. We establish the conditions under which the correctness and termination of solutions, represented as abstract policies, can be verified. We also examine the feasibility of learning abstract policies from examples. We demonstrate our techniques on several planning problems and show that they apply to challenging real-world tasks such as doing the laundry with a PR2 robot. These results resolve a number of open questions about planning with loops and facilitate the development of new algorithms and applications. |
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo Applicability Conditions for Plans with Loops: Computability Results and Algorithms Journal Article In: Artificial Intelligence (AIJ), vol. 191, pp. 1–19, 2012. @article{SZ:SIZaij12, The utility of including loops in plans has been long recognized by the planning community. Loops in a plan help increase both its applicability and the compactness of its representation. However, progress in finding such plans has been limited largely due to lack of methods for reasoning about the correctness and safety properties of loops of actions. We present novel algorithms for determining the applicability and progress made by a general class of loops of actions. These methods can be used for directing the search for plans with loops towards greater applicability while guaranteeing termination, as well as in post-processing of computed plans to precisely characterize their applicability. Experimental results demonstrate the efficiency of these algorithms. We also discuss the factors which can make the problem of determining applicability conditions for plans with loops incomputable. |
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo A New Representation and Associated Algorithms for Generalized Planning Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 615–647, 2011. @article{SZ:SIZaij11, Constructing plans that can handle multiple problem instances is a longstanding open problem in AI. We present a framework for generalized planning that captures the notion of algorithm-like plans and unifies various approaches developed for addressing this problem. Using this framework, and building on the TVLA system for static analysis of programs, we develop a novel approach for computing generalizations of classical plans by identifying sequences of actions that will make measurable progress when placed in a loop. In a wide class of problems that we characterize formally in the paper, these methods allow us to find generalized plans with loops for solving problem instances of unbounded sizes and also to determine the correctness and applicability of the computed generalized plans. We demonstrate the scope and scalability of the proposed approach on a wide range of planning problems. |
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo; Zhang, Tianjiao Directed Search for Generalized Plans Using Classical Planners Conference Proceedings of the 21st International Conference on Automated Planning and Scheduling (ICAPS), Freiburg, Germany, 2011. @conference{SZ:SIZZicaps11, We consider the problem of finding generalized plans for situations where the number of objects may be unknown and unbounded during planning. The input is a domain specification, a goal condition, and a class of concrete problem instances or initial states to be solved, expressed in an abstract first-order representation. Starting with an empty generalized plan, our overall approach is to incrementally increase the applicability of the plan by identifying a problem instance that it cannot solve, invoking a classical planner to solve that problem, generalizing the obtained solution and merging it back into the generalized plan. The main contributions of this paper are methods for (a) generating and solving small problem instances not yet covered by an existing generalized plan, (b) translating between concrete classical plans and abstract plan representations, and (c) extending partial generalized plans and increasing their applicability. We analyze the theoretical properties of these methods, prove their correctness, and illustrate experimentally their scalability. The resulting hybrid approach shows that solving only a few, small, classical planning problems can be sufficient to produce a generalized plan that applies to infinitely many problems with unknown numbers of objects. |
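The incremental generalize-and-merge loop this abstract describes can be sketched at a high level as follows; every function here is a hypothetical placeholder for the paper's components (instance generation, classical planner, abstraction, merging):

```python
# Hypothetical sketch: grow a generalized plan by repeatedly finding an
# instance it cannot solve, solving that instance with a classical
# planner, generalizing the solution, and merging it back in.
def grow_generalized_plan(instances, solves, classical_plan, generalize,
                          merge, gplan=None):
    """instances: iterable of concrete problem instances to cover."""
    for inst in instances:
        if not solves(gplan, inst):
            gplan = merge(gplan, generalize(classical_plan(inst)))
    return gplan
```

The point the abstract makes is that a few small classical solutions, once generalized, can cover infinitely many instances; the loop above only shows the control flow of that process.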
Srivastava, Siddharth; Zilberstein, Shlomo; Immerman, Neil; Geffner, Hector Qualitative Numeric Planning Conference Proceedings of the 25th Conference on Artificial Intelligence (AAAI), San Francisco, California, 2011. @conference{SZ:SZIGaaai11, We consider a new class of planning problems involving a set of non-negative real variables, and a set of non-deterministic actions that increase or decrease the values of these variables by some arbitrary amount. The formulas specifying the initial state, goal state, or action preconditions can only assert whether certain variables are equal to zero or not. Assuming that the state of the variables is fully observable, we obtain two results. First, the solution to the problem can be expressed as a policy mapping qualitative states into actions, where a qualitative state includes a Boolean variable for each original variable, indicating whether its value is zero or not. Second, testing whether any such policy, that may express nested loops of actions, is a solution to the problem, can be determined in time that is polynomial in the qualitative state space, which is much smaller than the original infinite state space. We also report experimental results using a simple generate-and-test planner to illustrate these findings. |
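The qualitative abstraction at the heart of this entry, mapping each non-negative variable to a Boolean "is it zero?" flag and letting a policy act on those flags, can be sketched as follows; the function names and the execution loop are illustrative assumptions:

```python
# Hypothetical sketch: abstract a numeric state into zero/nonzero flags
# and execute a policy defined over that small qualitative state space.
def qualitative_state(values):
    """Abstract a tuple of non-negative numbers into zero/nonzero flags."""
    return tuple(v == 0 for v in values)

def run_policy(policy, values, apply_action, max_steps=1000):
    """Run a qualitative policy on a concrete state until the policy
    returns None (goal reached) or the step budget runs out."""
    for _ in range(max_steps):
        action = policy(qualitative_state(values))
        if action is None:
            return values
        values = apply_action(values, action)
    raise RuntimeError("policy did not terminate within budget")
```

Because the policy only ever sees the flags, checking it is a computation over the (finite) qualitative space even though the underlying numeric space is infinite.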
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo Termination and Correctness Analysis of Cyclic Control Conference Proceedings of the 25th Conference on Artificial Intelligence (AAAI Nectar Track), San Francisco, California, 2011. @conference{SZ:SIZaaai11, |
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo Computing Applicability Conditions for Plans with Loops Conference Proceedings of the 20th International Conference on Automated Planning and Scheduling (ICAPS), Toronto, Canada, 2010, (Best Paper Award). @conference{SZ:SIZicaps10, The utility of including loops in plans has been long recognized by the planning community. Loops in a plan help increase both its applicability and the compactness of representation. However, progress in finding such plans has been limited largely due to lack of methods for reasoning about the correctness and safety properties of loops of actions. We present novel algorithms for determining the applicability and progress made by a general class of loops of actions. These methods can be used for directing the search for plans with loops towards greater applicability while guaranteeing termination, as well as in post-processing of computed plans to precisely characterize their applicability. Experimental results demonstrate the efficiency of these algorithms. |
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo Merging Example Plans into Generalized Plans for Non-deterministic Environments Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SIZaamas10, We present a new approach for finding generalized contingent plans with loops and branches in situations where there is uncertainty in state properties and object quantities, but lack of probabilistic information about these uncertainties. We use a state abstraction technique from static analysis of programs, which uses 3-valued logic to compactly represent belief states with unbounded numbers of objects. Our approach for finding plans is to incrementally generalize and merge input example plans which can be generated by classical planners. The expressiveness and scope of this approach are demonstrated using experimental results on common benchmark domains. |
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo Abstract Planning with Unknown Object Quantities and Properties Conference Proceedings of the 8th Symposium on Abstraction, Reformulation, and Approximation (SARA), Lake Arrowhead, California, 2009. @conference{SZ:SIZsara09, State abstraction has been widely used for state aggregation in approaches to AI search and planning. In this paper we use a powerful abstraction technique from software model checking for representing collections of states with different object quantities and properties. We exploit this method to develop precise abstractions and action operators for use in AI. This enables us to find scalable, algorithm-like plans with branches and loops which can solve problems of unbounded sizes. We describe how this method of abstraction can be effectively used in AI, with compelling results from implementations of two planning algorithms. |
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo Using Abstraction for Generalized Planning Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:SIZisaim08, Given the complexity of planning, it is often beneficial to create plans that work for a wide class of problems. This facilitates reuse of existing plans for different instances of the same problem or even for other problems that are somehow similar. We present novel approaches for finding such plans through search and for learning them from examples. We use state representation and abstraction techniques originally developed for static analysis of programs. The generalized plans that we compute include loops and work for classes of problems having varying numbers of objects that must be manipulated to reach the goal. Our algorithm for learning generalized plans takes as its input an example plan for a certain problem instance and a finite 3-valued first-order structure representing a set of initial states from different problem instances. It learns a generalized plan along with a classification of the problem instances where it works. The algorithm for finding plans takes as input a similar 3-valued structure and a goal test. Its output is a set of generalized plans and conditions describing the problem instances for which they work. |
Srivastava, Siddharth; Immerman, Neil; Zilberstein, Shlomo Learning Generalized Plans Using Abstract Counting Conference Proceedings of the 23rd Conference on Artificial Intelligence (AAAI), Chicago, Illinois, 2008. @conference{SZ:SIZaaai08, Given the complexity of planning, it is often beneficial to create plans that work for a wide class of problems. This facilitates reuse of existing plans for different instances drawn from the same problem or from an infinite family of similar problems. We define a class of such planning problems called generalized planning problems and present a novel approach for transforming classical plans into generalized plans. These algorithm-like plans include loops and work for problem instances having varying numbers of objects that must be manipulated to reach the goal. Our approach takes as input a classical plan for a certain problem instance. It outputs a generalized plan along with a classification of the problem instances where it is guaranteed to work. We illustrate the utility of our approach through results of a working implementation on various practical examples. |
Introspective Autonomy
Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo Belief State Determination for Real-Time Decision-Making Miscellaneous 2024, (US Patent 11,921,506). @misc{SZ:WWZpatent24c, Real-time decision-making for a vehicle using belief state determination is described. Operational environment data is received while the vehicle is traversing a vehicle transportation network, where the data includes data associated with an external object. An operational environment monitor establishes an observation that relates the object to a distinct vehicle operation scenario. A belief state model of the monitor computes a belief state for the observation directly from the operational environment data. The monitor provides the computed belief state to a decision component implementing a policy that maps a respective belief state for the object within the distinct vehicle operation scenario to a respective candidate vehicle control action. A candidate vehicle control action is received from the policy of the decision component, and a vehicle control action is selected for traversing the vehicle transportation network from any available candidate vehicle control actions. |
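The belief-state computation this patent abstract describes can be illustrated with a generic discrete Bayes filter; the dictionary-based model layout and the state names in the example are assumptions, not the patented monitor:

```python
# Hypothetical sketch: predict the next-state distribution through a
# transition model, weight it by the observation likelihood, and
# normalize to obtain the updated belief state.
def update_belief(belief, transition, likelihood, observation):
    """belief: dict state -> prob; transition: dict state -> dict state -> prob;
    likelihood(observation, state) -> prob. Returns the posterior belief."""
    predicted = {}
    for s, p in belief.items():
        for s2, t in transition[s].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * t
    posterior = {s: likelihood(observation, s) * p
                 for s, p in predicted.items()}
    z = sum(posterior.values())
    if z == 0:
        raise ValueError("observation has zero likelihood under all states")
    return {s: p / z for s, p in posterior.items()}
```

A scenario monitor maintaining such a belief can hand it directly to a policy that maps beliefs to candidate control actions, matching the flow the abstract describes.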
Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo Objective-Based Reasoning in Autonomous Vehicle Decision-Making Miscellaneous 2024, (US Patent 11,899,454). @misc{SZ:WWZpatent24b, An autonomous vehicle traverses a vehicle transportation network using a multi-objective policy based on a model for specific scenarios. The multi-objective policy includes a topographical map that shows a relationship between at least two objectives. The autonomous vehicle receives a candidate vehicle control action associated with each of the at least two objectives. The autonomous vehicle selects a vehicle control action based on a buffer value that is associated with the at least two objectives. The autonomous vehicle traverses a portion of the vehicle transportation network in accordance with the selected vehicle control action. |
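One common reading of the buffer-based selection this abstract mentions is lexicographic choice with slack: keep every action whose primary-objective value is within a buffer of the best, then maximize the secondary objective. That semantics, and the example values below, are assumptions based on the abstract, not the patent's exact mechanism:

```python
# Hypothetical sketch: buffer-based selection over two objectives
# (e.g. safety first, then efficiency), higher values being better.
def select_action(candidates, buffer):
    """candidates: dict action -> (primary, secondary) values."""
    best_primary = max(v[0] for v in candidates.values())
    admissible = {a: v for a, v in candidates.items()
                  if v[0] >= best_primary - buffer}
    return max(admissible, key=lambda a: admissible[a][1])
```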
Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo Shared Autonomous Vehicle Operational Management Miscellaneous 2024, (US Patent 11,874,120). @misc{SZ:WWZpatent24a, Traversing, by an autonomous vehicle, a vehicle transportation network, may include identifying a distinct vehicle operational scenario, wherein traversing the vehicle transportation network includes traversing a portion of the vehicle transportation network that includes the distinct vehicle operational scenario, communicating shared scenario-specific operational control management data associated with the distinct vehicle operational scenario with an external shared scenario-specific operational control management system, operating a scenario-specific operational control evaluation module instance including an instance of a scenario-specific operational control evaluation model of the distinct vehicle operational scenario, and wherein operating the scenario-specific operational control evaluation module instance includes identifying a policy for the scenario-specific operational control evaluation model, receiving a candidate vehicle control action from the policy for the scenario-specific operational control evaluation model, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action. |
Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo; Bentahar, Omar; Jamgochian, Arec Explainability of Autonomous Vehicle Decision Making Miscellaneous 2023, (US Patent 11,714,971). @misc{SZ:WWZBJpatent23e, A processor is configured to execute instructions stored in a memory to identify distinct vehicle operational scenarios; instantiate decision components, where each of the decision components is an instance of a respective decision problem, and where the each of the decision components maintains a respective state describing the respective vehicle operational scenario; receive respective candidate vehicle control actions from the decision components; select an action from the respective candidate vehicle control actions, where the action is from a selected decision component of the decision components, and where the action is used to control the AV to traverse a portion of the vehicle transportation network; and generate an explanation as to why the action was selected, where the explanation includes respective descriptors of the action, the selected decision component, and a state factor of the respective state of the selected decision component. |
Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo Autonomous Vehicle Operation with Explicit Occlusion Reasoning Miscellaneous 2023, (US Patent 11,702,070). @misc{SZ:WWZpatent23d, Autonomous vehicle operation with explicit occlusion reasoning may include traversing, by a vehicle, a vehicle transportation network. Traversing the vehicle transportation network can include receiving, from a sensor of the vehicle, sensor data for a portion of a vehicle operational environment, determining, using the sensor data, a visibility grid comprising coordinates forming an unobserved region within a defined distance from the vehicle, computing a probability of a presence of an external object within the unobserved region by comparing the visibility grid to a map (e.g., a high-definition map), and traversing a portion of the vehicle transportation network using the probability. An apparatus and a vehicle are also described. |
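The grid-versus-map comparison this abstract describes can be illustrated with a simple overlap computation; the independence assumption and the per-cell occupancy rate are illustrative choices, not the patented method:

```python
# Hypothetical sketch: intersect the unobserved cells of the visibility
# grid with map cells where objects can appear, then convert the overlap
# into a probability of at least one hidden object, assuming independent
# per-cell occupancy.
def occlusion_probability(unobserved_cells, drivable_cells, per_cell_rate=0.01):
    """Cells are (row, col) coordinates; per_cell_rate is the assumed
    probability that any single occluded, drivable cell hides an object."""
    hidden = set(unobserved_cells) & set(drivable_cells)
    return 1.0 - (1.0 - per_cell_rate) ** len(hidden)
```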
Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo Risk Aware Executor with Action Set Recommendations Miscellaneous 2023, (US Patent 11,635,758). @misc{SZ:WWZpatent23c, A method for use in traversing a vehicle transportation network by an autonomous vehicle (AV) includes traversing, by the AV, the vehicle transportation network. Traversing the vehicle transportation network includes identifying a distinct vehicle operational scenario; instantiating a first decision component instance; receiving a first set of candidate vehicle control actions from the first decision component instance; selecting an action; and controlling the AV to traverse a portion of the vehicle transportation network based on the action. The first decision component instance is an instance of a first decision component modeling the distinct vehicle operational scenario. |
Basich, Connor; Svegliato, Justin; Wray, Kyle Hollins; Witwicki, Stefan; Biswas, Joydeep; Zilberstein, Shlomo Competence-Aware Systems Journal Article In: Artificial Intelligence (AIJ), vol. 316, pp. 103844, 2023. @article{SZ:BSWWBZaij23, Building autonomous systems for deployment in the open world has been a longstanding objective in both artificial intelligence and robotics. The open world, however, presents challenges that question some of the assumptions often made in contemporary AI models. Autonomous systems that operate in the open world face complex, non-stationary environments wherein enumerating all situations the system may face over the course of its deployment is intractable. Nevertheless, these systems are expected to operate safely and reliably for extended durations. Consequently, AI systems often rely on some degree of human assistance to mitigate risks while completing their tasks, and are hence better treated as semi-autonomous systems. In order to reduce unnecessary reliance on humans and optimize autonomy, we propose a novel introspective planning model—competence-aware systems (CAS)—that enables a semi-autonomous system to reason about its own competence and allowed level of autonomy by leveraging human feedback or assistance. A CAS learns to adjust its level of autonomy based on experience and interactions with a human authority so as to reduce improper reliance on the human and optimize the degree of autonomy it employs in any given circumstance. To handle situations in which the initial CAS model has insufficient state information to properly discriminate feedback received from humans, we introduce a methodology called iterative state space refinement that gradually increases the granularity of the state space online. The approach exploits information that exists in the standard CAS model and requires no additional input from the human. The result is an agent that can more confidently predict the correct feedback from the human authority in each level of autonomy, enabling it to learn its competence in a larger portion of the state space. |
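The feedback-driven autonomy adjustment that a competence-aware system performs can be illustrated with a small sketch. This is not the CAS model from the article: the Laplace-smoothed approval estimate, the threshold rule, and all class and variable names are illustrative assumptions.

```python
# Hedged sketch of the competence-aware idea: track human feedback per
# (situation, autonomy level) and prefer the highest level whose estimated
# approval rate clears a confidence threshold. All names are illustrative.
from collections import defaultdict

class CompetenceModel:
    def __init__(self, levels, threshold=0.9):
        self.levels = sorted(levels)       # e.g., [0, 1, 2] = supervised .. autonomous
        self.threshold = threshold
        self.approvals = defaultdict(int)  # (situation, level) -> approvals
        self.trials = defaultdict(int)     # (situation, level) -> total feedback

    def record(self, situation, level, approved):
        """Log one piece of human feedback for a (situation, level) pair."""
        self.trials[(situation, level)] += 1
        if approved:
            self.approvals[(situation, level)] += 1

    def approval_rate(self, situation, level):
        # Laplace smoothing: unseen pairs start at 0.5 rather than 0 or 1.
        return (self.approvals[(situation, level)] + 1) / (self.trials[(situation, level)] + 2)

    def choose_level(self, situation):
        # Highest autonomy level whose estimated approval clears the threshold;
        # fall back to the most supervised level otherwise.
        for level in reversed(self.levels):
            if self.approval_rate(situation, level) >= self.threshold:
                return level
        return self.levels[0]

model = CompetenceModel(levels=[0, 1, 2])
for _ in range(20):
    model.record("clear_intersection", 2, approved=True)
print(model.choose_level("clear_intersection"))  # prints 2
```

With no feedback yet, every level's smoothed estimate is 0.5, so the agent stays at the most supervised level; consistent approvals at a higher level gradually unlock it for that situation.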
Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo Learning Safety and Human-Centered Constraints in Autonomous Vehicles Miscellaneous 2023, (US Patent 11,613,269). @misc{SZ:WWZpatent23b, Traversing a vehicle transportation network includes operating a scenario-specific operational control evaluation module instance. The scenario-specific operational control evaluation module instance includes an instance of a scenario-specific operational control evaluation model of a distinct vehicle operational scenario. Operating the scenario-specific operational control evaluation module instance includes identifying a multi-objective policy for the scenario-specific operational control evaluation model. The multi-objective policy may include a relationship between at least two objectives. Traversing the vehicle transportation network includes receiving a candidate vehicle control action associated with each of the at least two objectives. Traversing the vehicle transportation network includes selecting a vehicle control action based on a buffer value. Traversing the vehicle transportation network includes performing the selected vehicle control action, determining a preference indicator for each objective, and updating the multi-objective policy. |
Wray, Kyle Hollins; Bentahar, Omar; Vagadia, Astha; Cesafsky, Laura; Jamgochian, Arec; Witwicki, Stefan; Baig, Najamuddin Mirza; Gyorfi, Julius S; Zilberstein, Shlomo; Sharma, Sparsh Explainability of Autonomous Vehicle Decision Making Miscellaneous 2023, (US Patent 11,577,746). @misc{SZ:WBVCJWpatent23a, A processor is configured to execute instructions stored in a memory to determine, in response to identifying vehicle operational scenarios of a scene, an action for controlling the AV, where the action is from a selected decision component that determined the action based on level of certainty associated with a state factor; generate an explanation as to why the action was selected, such that the explanation includes respective descriptors of the action, the selected decision component, and the state factor; and display the explanation in a graphical view that includes a first graphical indicator of a world object of the selected decision component, a second graphical indicator describing the state factor, and a third graphical indicator describing the action. |
Wray, Kyle; Witwicki, Stefan; Zilberstein, Shlomo; Pedersen, Liam Autonomous Vehicle Operational Management Including Operating a Partially Observable Markov Decision Process Model Instance Miscellaneous 2022, (US Patent 11,500,380). @misc{SZ:WWZPpatent22d, Autonomous vehicle operational management may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include operating a scenario-specific operational control evaluation module instance, wherein the scenario-specific operational control evaluation module instance is an instance of a scenario-specific operational control evaluation module, wherein the scenario-specific operational control evaluation module implements a partially observable Markov decision process. Traversing the vehicle transportation network may include receiving a candidate vehicle control action from the scenario-specific operational control evaluation module instance, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action. |
Basich, Connor; Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo Introspective Competence Modeling for AV Decision Making Miscellaneous 2022, (US Patent 11,307,585). @misc{SZ:BWWZpatent22c, A first method includes detecting, based on sensor data, an environment state; selecting an action based on the environment state; determining an autonomy level associated with the environment state and the action; and performing the action according to the autonomy level. The autonomy level can be selected based at least on an autonomy model and a feedback model. A second method includes calculating, by solving an extended Stochastic Shortest Path (SSP) problem, a policy for solving a task. The policy can map environment states and autonomy levels to actions and autonomy levels. Calculating the policy can include generating plans that operate across multiple levels of autonomy. |
Wray, Kyle Hollins; Witwicki, Stefan; Zilberstein, Shlomo Multiple Objective Explanation and Control Interface Design Miscellaneous 2022, (US Patent 11,300,957). @misc{SZ:WWZpatent22b, A vehicle traversing a vehicle transportation network may use a scenario-specific operational control evaluation model instance. A multi-objective policy for the model is received, wherein the policy includes at least a first objective, a second objective, and a priority of the first objective relative to the second objective. A representation of the policy (e.g., the first objective, the second objective, and the priority) is generated using a user interface. Based on feedback to the user interface, a change to the multi-objective policy for the scenario-specific operational control evaluation model is received. The change is to the first objective, the second objective, the priority, or some combination thereof. Then, for determining a vehicle control action for traversing the vehicle transportation network, an updated multi-objective policy for the scenario-specific operational control evaluation model is generated to include the change to the policy. |
Rabiee, Sadegh; Basich, Connor; Wray, Kyle Hollins; Zilberstein, Shlomo; Biswas, Joydeep Competence-Aware Path Planning Via Introspective Perception Journal Article In: IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3218–3225, 2022. @article{SZ:RBWZBlra22, Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and to proactively take actions to avoid future failures. Existing approaches for competence-aware planning are either model-based, requiring explicit enumeration of known failure sources, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception, without requiring a priori enumeration of failure sources or requiring location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Experiments in a simulation show that the proposed CPIP approach outperforms the frequentist baseline in multiple mobile robot tasks, and is further validated via real robot experiments in environments with perceptually challenging obstacles and terrain. |
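CPIP's two-stage factorization (a location-agnostic perception-error rate learned before deployment, then context-aware failure prediction learned during deployment) can be sketched as a per-context Bayesian update. The Beta-style prior, its strength, and all names here are illustrative assumptions rather than the paper's actual formulation.

```python
# Hedged sketch of a two-stage competence estimate: a perception-error rate
# learned pre-deployment seeds a per-context Beta-style posterior over task
# failure, which deployment outcomes then refine. Names are illustrative.

class FailureBelief:
    def __init__(self, perception_error_rate, prior_strength=10.0):
        # Stage 1: introspective perception yields a location-agnostic error
        # rate, encoded as pseudo-counts of failures and successes.
        self.alpha0 = perception_error_rate * prior_strength          # pseudo-failures
        self.beta0 = (1.0 - perception_error_rate) * prior_strength  # pseudo-successes
        self.outcomes = {}  # context -> [failures, successes]

    def update(self, context, failed):
        # Stage 2: context-aware updates from actual deployment outcomes.
        failures, successes = self.outcomes.get(context, [0, 0])
        self.outcomes[context] = [failures + failed, successes + (not failed)]

    def failure_probability(self, context):
        """Posterior mean failure probability for this context."""
        failures, successes = self.outcomes.get(context, [0, 0])
        return (self.alpha0 + failures) / (self.alpha0 + failures + self.beta0 + successes)

belief = FailureBelief(perception_error_rate=0.2)
print(round(belief.failure_probability("tall_grass"), 3))  # prior only: 0.2
belief.update("tall_grass", failed=True)
belief.update("tall_grass", failed=True)
print(round(belief.failure_probability("tall_grass"), 3))  # posterior shifts upward
```

A competence-aware planner could then penalize path segments whose context carries a high posterior failure probability, routing around perceptually challenging terrain.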
Ostafew, Christopher; Vagadia, Astha; Baig, Najamuddin; James, Viju; Witwicki, Stefan; Zilberstein, Shlomo Exception Situation Playback for Tele-Operators Miscellaneous 2022, (US Patent 11,215,987). @misc{SZ:OVBJWZpatent22a, Resolving an exception situation in autonomous driving includes receiving an assistance request to resolve the exception situation from an autonomous vehicle (AV); identifying a solution to the exception situation; forwarding the solution to a tele-operator; receiving a request for playback data from the tele-operator; receiving, from the AV, the playback data; and obtaining, from the tele-operator, a validated solution based on the tele-operator using the playback data. The playback data includes snapshots n_i of data related to autonomous driving stored at the AV at respective consecutive times t_i, for i = 1, ..., N. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo; Bentahar, Omar; Jamgochian, Arec Explainability of Autonomous Vehicle Decision Making Miscellaneous 2021, (US Patent App. 16/778,890). @misc{SZ:BWWZpatent21j, A processor is configured to execute instructions stored in a memory to identify distinct vehicle operational scenarios; instantiate decision components, where each of the decision components is an instance of a respective decision problem, and where each of the decision components maintains a respective state describing the respective vehicle operational scenario; receive respective candidate vehicle control actions from the decision components; select an action from the respective candidate vehicle control actions, where the action is from a selected decision component of the decision components, and where the action is used to control the AV to traverse a portion of the vehicle transportation network; and generate an explanation as to why the action was selected, where the explanation includes respective descriptors of the action, the selected decision component, and a state factor of the respective state of the selected decision component. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Reinforcement and Model Learning for Vehicle Operation Miscellaneous 2021, (US Patent 11,027,751). @misc{SZ:BWWZpatent21f, Methods and vehicles may be configured to gain experience in the form of state-action and/or action-observation histories for an operational scenario as the vehicle traverses a vehicle transportation network. The histories may be incorporated into a model in the form of learning to improve the model over time. The learning may be used to improve integration with human behavior. Driver feedback may be used in the learning examples to improve future performance and to integrate with human behavior. The learning may be used to create customized scenario solutions. The learning may be used to transfer a learned solution and apply the learned solution to a similar scenario. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Risk Aware Executor with Action Set Recommendations Miscellaneous 2021, (US Patent App. 16/696,235). @misc{SZ:BWWZpatent21d, A method for use in traversing a vehicle transportation network by an autonomous vehicle (AV) includes traversing, by the AV, the vehicle transportation network. Traversing the vehicle transportation network includes identifying a distinct vehicle operational scenario; instantiating a first decision component instance; receiving a first set of candidate vehicle control actions from the first decision component instance; selecting an action; and controlling the AV to traverse a portion of the vehicle transportation network based on the action. The first decision component instance is an instance of a first decision component modeling the distinct vehicle operational scenario. |
Basich, Connor; Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Introspective Competence Modeling for AV Decision Making Miscellaneous 2021, (US Patent App. 16/668,584). @misc{SZ:BWWZpatent21c, A first method includes detecting, based on sensor data, an environment state; selecting an action based on the environment state; determining an autonomy level associated with the environment state and the action; and performing the action according to the autonomy level. The autonomy level can be selected based at least on an autonomy model and a feedback model. A second method includes calculating, by solving an extended Stochastic Shortest Path (SSP) problem, a policy for solving a task. The policy can map environment states and autonomy levels to actions and autonomy levels. Calculating the policy can include generating plans that operate across multiple levels of autonomy. |
Rabiee, Sadegh; Basich, Connor; Wray, Kyle Hollins; Zilberstein, Shlomo; Biswas, Joydeep Competence-Aware Path Planning via Introspective Perception Journal Article In: CoRR, vol. abs/2109.13974, 2021. @article{SZ:SZarXiv21c, Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and to proactively take actions to avoid future failures. Existing approaches for competence-aware planning are either model-based, requiring explicit enumeration of known failure modes, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception, without requiring a priori enumeration of failure modes or requiring location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Experiments in a simulation show that the proposed CPIP approach outperforms the frequentist baseline in multiple mobile robot tasks, and is further validated via real robot experiments in an environment with perceptually challenging obstacles and terrain. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Multiple Objective Explanation and Control Interface Design Miscellaneous 2021, (US Patent App. 16/727,038). @misc{SZ:BWWZpatent21h, A vehicle traversing a vehicle transportation network may use a scenario-specific operational control evaluation model instance. A multi-objective policy for the model is received, wherein the policy includes at least a first objective, a second objective, and a priority of the first objective relative to the second objective. A representation of the policy (e.g., the first objective, the second objective, and the priority) is generated using a user interface. Based on feedback to the user interface, a change to the multi-objective policy for the scenario-specific operational control evaluation model is received. The change is to the first objective, the second objective, the priority, or some combination thereof. Then, for determining a vehicle control action for traversing the vehicle transportation network, an updated multi-objective policy for the scenario-specific operational control evaluation model is generated to include the change to the policy. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Shared Autonomous Vehicle Operational Management Miscellaneous 2021, (US Patent App. 16/955,531). @misc{SZ:WWZpatent21b, Traversing, by an autonomous vehicle, a vehicle transportation network, may include identifying a distinct vehicle operational scenario, wherein traversing the vehicle transportation network includes traversing a portion of the vehicle transportation network that includes the distinct vehicle operational scenario, communicating shared scenario-specific operational control management data associated with the distinct vehicle operational scenario with an external shared scenario-specific operational control management system, operating a scenario-specific operational control evaluation module instance including an instance of a scenario-specific operational control evaluation model of the distinct vehicle operational scenario, and wherein operating the scenario-specific operational control evaluation module instance includes identifying a policy for the scenario-specific operational control evaluation model, receiving a candidate vehicle control action from the policy for the scenario-specific operational control evaluation model, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Centralized Shared Autonomous Vehicle Operational Management Miscellaneous 2021, (US Patent App. 16/955,531). @misc{SZ:WWZpatent21a, Centralized shared scenario-specific operational control management includes receiving, at a centralized shared scenario-specific operational control management device, shared scenario-specific operational control management input data, from an autonomous vehicle, validating the shared scenario-specific operational control management input data, identifying a current distinct vehicle operational scenario based on the shared scenario-specific operational control management input data, generating shared scenario-specific operational control management output data based on the current distinct vehicle operational scenario, and transmitting the shared scenario-specific operational control management output data. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Autonomous Vehicle Operation with Explicit Occlusion Reasoning Miscellaneous 2021, (US Patent App. 16/753,601). @misc{SZ:BWWZpatent21k, Autonomous vehicle operation with explicit occlusion reasoning may include traversing, by a vehicle, a vehicle transportation network. Traversing the vehicle transportation network can include receiving, from a sensor of the vehicle, sensor data for a portion of a vehicle operational environment, determining, using the sensor data, a visibility grid comprising coordinates forming an unobserved region within a defined distance from the vehicle, computing a probability of a presence of an external object within the unobserved region by comparing the visibility grid to a map (e.g., a high-definition map), and traversing a portion of the vehicle transportation network using the probability. An apparatus and a vehicle are also described. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Learning Safety and Human-Centered Constraints in Autonomous Vehicles Miscellaneous 2021, (US Patent App. 16/724,635). @misc{SZ:BWWZpatent21g, Traversing a vehicle transportation network includes operating a scenario-specific operational control evaluation module instance. The scenario-specific operational control evaluation module instance includes an instance of a scenario-specific operational control evaluation model of a distinct vehicle operational scenario. Operating the scenario-specific operational control evaluation module instance includes identifying a multi-objective policy for the scenario-specific operational control evaluation model. The multi-objective policy may include a relationship between at least two objectives. Traversing the vehicle transportation network includes receiving a candidate vehicle control action associated with each of the at least two objectives. Traversing the vehicle transportation network includes selecting a vehicle control action based on a buffer value. Traversing the vehicle transportation network includes performing the selected vehicle control action, determining a preference indicator for each objective, and updating the multi-objective policy. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo Objective-Based Reasoning in Autonomous Vehicle Decision-Making Miscellaneous 2021, (US Patent App. 16/695,613). @misc{SZ:BWWZpatent21e, Traversing a vehicle transportation network includes operating a scenario-specific operational control evaluation module instance. The scenario-specific operational control evaluation module instance includes an instance of a scenario-specific operational control evaluation model of a distinct vehicle operational scenario. Operating the scenario-specific operational control evaluation module instance includes identifying a multi-objective policy for the scenario-specific operational control evaluation model. The multi-objective policy may include a relationship between at least two objectives. Traversing the vehicle transportation network includes receiving a candidate vehicle control action associated with each of the at least two objectives. Traversing the vehicle transportation network includes selecting a vehicle control action based on a buffer value. Traversing the vehicle transportation network includes traversing a portion of the vehicle transportation network in accordance with the selected vehicle control action. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo; Pedersen, Liam Autonomous Vehicle Operational Management Control Miscellaneous 2020, (US Patent 10,654,476). @misc{SZ:WWZCpatent20e, Autonomous vehicle operational management may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include receiving, from a sensor of the autonomous vehicle, sensor information corresponding to an external object within a defined distance of the autonomous vehicle, identifying a distinct vehicle operational scenario in response to receiving the sensor information, instantiating a scenario-specific operational control evaluation module instance, wherein the scenario-specific operational control evaluation module instance is an instance of a scenario-specific operational control evaluation module modeling the distinct vehicle operational scenario, receiving a candidate vehicle control action from the scenario-specific operational control evaluation module instance, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo; Pedersen, Liam Autonomous Vehicle Operational Management Including Operating A Partially Observable Markov Decision Process Model Instance Miscellaneous 2020, (US Patent App. 16/473,148). @misc{SZ:WWZPpatent20b, Autonomous vehicle operational management may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include operating a scenario-specific operational control evaluation module instance, wherein the scenario-specific operational control evaluation module instance is an instance of a scenario-specific operational control evaluation module, wherein the scenario-specific operational control evaluation module implements a partially observable Markov decision process. Traversing the vehicle transportation network may include receiving a candidate vehicle control action from the scenario-specific operational control evaluation module instance, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action. |
Wray, Kyle Hollins; Witwicki, Stefan J; Zilberstein, Shlomo; Pedersen, Liam Autonomous Vehicle Operational Management Blocking Monitoring Miscellaneous 2020, (US Patent App. 16/473,037). @misc{SZ:WWZPpatent20c, Autonomous vehicle operational management including blocking monitoring may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include operating a blocking monitor instance, which may include identifying operational environment information including information corresponding to a first external object within a defined distance of the autonomous vehicle, determining a first area of the vehicle transportation network based on a current geospatial location of the autonomous vehicle in the vehicle transportation network and an identified route for the autonomous vehicle, and determining a probability of availability for the first area based on the operational environment information. Traversing the vehicle transportation network may include traversing a portion of the vehicle transportation network based on the probability of availability. |
Wray, Kyle Hollins; Witwicki, |