2024
Kyle Hollins Wray; Stefan Witwicki; Shlomo Zilberstein Belief State Determination for Real-Time Decision-Making Miscellaneous 2024, (US Patent 11,921,506).
@misc{SZ:WWZpatent24c,
title = {Belief State Determination for Real-Time Decision-Making},
author = {Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11921506B2/en},
year = {2024},
date = {2024-03-01},
publisher = {Google Patents},
abstract = {Real-time decision-making for a vehicle using belief state determination is described. Operational environment data is received while the vehicle is traversing a vehicle transportation network, where the data includes data associated with an external object. An operational environment monitor establishes an observation that relates the object to a distinct vehicle operation scenario. A belief state model of the monitor computes a belief state for the observation directly from the operational environment data. The monitor provides the computed belief state to a decision component implementing a policy that maps a respective belief state for the object within the distinct vehicle operation scenario to a respective candidate vehicle control action. A candidate vehicle control action is received from the policy of the decision component, and a vehicle control action is selected for traversing the vehicle transportation network from any available candidate vehicle control actions.},
note = {US Patent 11,921,506},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
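As background for the belief state model described in this patent abstract, the standard POMDP Bayesian belief update can be sketched as follows. This is a minimal illustration of the textbook filter, not the patented method; the array layout and function name are assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Standard POMDP Bayesian belief update.

    b : prior belief over states, shape (S,)
    a : action index taken
    o : observation index received
    T : transition model, T[a, s, s_next] = P(s_next | s, a)
    O : observation model, O[a, s_next, o] = P(o | s_next, a)

    Returns the posterior belief:
        b'(s') proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)
    """
    # Predicted state distribution after taking action a
    predicted = T[a].T @ b
    # Weight by the likelihood of the observation, then normalize
    unnormalized = O[a, :, o] * predicted
    norm = unnormalized.sum()
    if norm == 0.0:
        raise ValueError("Observation has zero probability under this belief")
    return unnormalized / norm
```

A "belief state model" that computes beliefs "directly from the operational environment data", as the abstract puts it, can be read as a learned shortcut around iterating this recursive update.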
Kyle Hollins Wray; Stefan Witwicki; Shlomo Zilberstein Objective-Based Reasoning in Autonomous Vehicle Decision-Making Miscellaneous 2024, (US Patent 11,899,454).
@misc{SZ:WWZpatent24b,
title = {Objective-Based Reasoning in Autonomous Vehicle Decision-Making},
author = {Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11899454B2/en},
year = {2024},
date = {2024-02-01},
publisher = {Google Patents},
abstract = {An autonomous vehicle traverses a vehicle transportation network using a multi-objective policy based on a model for specific scenarios. The multi-objective policy includes a topographical map that shows a relationship between at least two objectives. The autonomous vehicle receives a candidate vehicle control action associated with each of the at least two objectives. The autonomous vehicle selects a vehicle control action based on a buffer value that is associated with the at least two objectives. The autonomous vehicle traverses a portion of the vehicle transportation network in accordance with the selected vehicle control action.},
note = {US Patent 11,899,454},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan Witwicki; Shlomo Zilberstein Shared Autonomous Vehicle Operational Management Miscellaneous 2024, (US Patent 11,874,120).
@misc{SZ:WWZpatent24a,
title = {Shared Autonomous Vehicle Operational Management},
author = {Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11874120B2/en},
year = {2024},
date = {2024-01-01},
publisher = {Google Patents},
abstract = {Traversing, by an autonomous vehicle, a vehicle transportation network, may include identifying a distinct vehicle operational scenario, wherein traversing the vehicle transportation network includes traversing a portion of the vehicle transportation network that includes the distinct vehicle operational scenario, communicating shared scenario-specific operational control management data associated with the distinct vehicle operational scenario with an external shared scenario-specific operational control management system, operating a scenario-specific operational control evaluation module instance including an instance of a scenario-specific operational control evaluation model of the distinct vehicle operational scenario, and wherein operating the scenario-specific operational control evaluation module instance includes identifying a policy for the scenario-specific operational control evaluation model, receiving a candidate vehicle control action from the policy for the scenario-specific operational control evaluation model, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action.},
note = {US Patent 11,874,120},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Shuwa Miura; Shlomo Zilberstein Observer-Aware Planning with Implicit and Explicit Communication Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024.
@conference{SZ:MZaamas24,
title = {Observer-Aware Planning with Implicit and Explicit Communication},
author = {Shuwa Miura and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MZaamas24.pdf},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
address = {Auckland, New Zealand},
abstract = {This paper presents a computational model designed for planning both implicit and explicit communication of intentions, goals, and desires. Building upon previous research focused on implicit communication of intention via actions, our model seeks to strategically influence an observer’s belief using both the agent’s actions and explicit messages. We show that our proposed model can be considered to be a special case of general multi-agent problems with explicit communication under certain assumptions. Since the mental state of the observer depends on histories, computing a policy for the proposed model amounts to optimizing a non-Markovian objective, which we show to be intractable in the worst case. To mitigate this challenge, we propose a technique based on splitting domain and communication actions during planning. We conclude with experimental evaluations of the proposed approach that illustrate its effectiveness.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Saaduddin Mahmud; Marcell Vazquez-Chanlatte; Stefan Witwicki; Shlomo Zilberstein Explaining the Behavior of POMDP-based Agents Through the Impact of Counterfactual Information Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024.
@conference{SZ:MVWZaamas24,
title = {Explaining the Behavior of POMDP-based Agents Through the Impact of Counterfactual Information},
author = {Saaduddin Mahmud and Marcell Vazquez-Chanlatte and Stefan Witwicki and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MVWZaamas24.pdf},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
address = {Auckland, New Zealand},
abstract = {In this work, we consider AI agents operating in Partially Observable Markov Decision Processes (POMDPs), a widely used framework for sequential decision making with incomplete state information. Agents operating with partial information take actions not only to advance their underlying goals but also to seek information and reduce uncertainty. Despite rapid progress in explainable AI, research on separating information-driven vs. goal-driven behaviors remains sparse. To address this gap, we introduce a novel explanation generation framework called Sequential Information Probing (SIP) to investigate the direct impact of state information, or its absence, on agent behavior. To quantify the impact, we also propose two metrics under this SIP framework: Value of Information (VoI) and Influence of Information (IoI). We then theoretically derive several properties of these metrics. Finally, we present several experiments, including a case study on an autonomous vehicle, that illustrate the efficacy of our method.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
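The paper defines its own VoI and IoI metrics; as classical background for quantifying the impact of information on a decision, the expected value of perfect information (EVPI) for a one-shot decision can be sketched as below. The names and array shapes are illustrative assumptions, and this is the textbook quantity rather than the paper's metrics.

```python
import numpy as np

def evpi(belief, utility):
    """Expected value of perfect information for a one-shot decision.

    belief  : P(s), shape (S,)
    utility : U[a, s], shape (A, S)

    EVPI = E_s[max_a U(a, s)] - max_a E_s[U(a, s)]
    i.e., the value of choosing after observing the state, minus the
    value of the best action under the prior alone.
    """
    informed = float((belief * utility.max(axis=0)).sum())
    uninformed = float((utility @ belief).max())
    return informed - uninformed
```

EVPI is always non-negative: extra state information can never make the optimal one-shot decision worse in expectation.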
Moumita Choudhury; Sandhya Saisubramanian; Hao Zhang; Shlomo Zilberstein Minimizing Negative Side Effects in Cooperative Multi-Agent Systems Using Distributed Coordination Conference Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024.
@conference{SZ:CSZZaamas24,
title = {Minimizing Negative Side Effects in Cooperative Multi-Agent Systems Using Distributed Coordination},
author = {Moumita Choudhury and Sandhya Saisubramanian and Hao Zhang and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/CSZZaamas24.pdf},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
address = {Auckland, New Zealand},
abstract = {Autonomous agents in real-world environments may encounter undesirable outcomes or negative side effects (NSEs) when working collaboratively alongside other agents. We frame the challenge of minimizing NSEs in a multi-agent setting as a lexicographic decentralized Markov decision process in which we assume independence of rewards and transitions with respect to the primary assigned tasks, but allow negative side effects to create a form of dependence among the agents. We present a lexicographic Q-learning approach to mitigate the NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks, up to some given slack. Our empirical evaluation across two domains demonstrates that our collaborative approach effectively mitigates NSEs, outperforming non-collaborative methods.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Moumita Choudhury; Sandhya Saisubramanian; Hao Zhang; Shlomo Zilberstein Minimizing Negative Side Effects in Cooperative Multi-Agent Systems Using Distributed Coordination Conference Proceedings of the 37th International FLAIRS Conference, Miramar Beach, Florida, 2024.
@conference{SZ:CSZZflairs24,
title = {Minimizing Negative Side Effects in Cooperative Multi-Agent Systems Using Distributed Coordination},
author = {Moumita Choudhury and Sandhya Saisubramanian and Hao Zhang and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/CSZZflairs24.pdf},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the 37th International FLAIRS Conference},
address = {Miramar Beach, Florida},
abstract = {Autonomous agents operating in real-world environments frequently encounter undesirable outcomes or negative side effects (NSEs) when working collaboratively alongside other agents. Even when agents can execute their primary task optimally when operating in isolation, their training may not account for potential negative interactions that arise in the presence of other agents. We frame the challenge of minimizing NSEs as a lexicographic decentralized Markov decision process in which we assume independence of rewards and transitions with respect to the primary assigned tasks, but recognize that addressing negative side effects creates a form of dependence among the agents. We present a lexicographic Q-learning approach to mitigate the NSEs using human feedback models while maintaining near-optimality with respect to the assigned tasks, up to some given slack. Our empirical evaluation across two domains demonstrates that our collaborative approach effectively mitigates NSEs, outperforming non-collaborative methods.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
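The lexicographic action selection with slack that these two abstracts describe can be sketched in a few lines: among actions whose primary-task value is within a slack of the best, pick the one with the least expected side-effect penalty. This is a generic illustration of slack-based lexicographic selection under assumed Q-value arrays, not the paper's full algorithm.

```python
import numpy as np

def lexicographic_action(q_primary, q_nse, slack):
    """Slack-based lexicographic action selection.

    q_primary : Q-values for the primary assigned task, shape (A,)
    q_nse     : expected negative-side-effect penalty per action, shape (A,)
    slack     : how much primary-task value we are willing to give up

    First restrict to actions near-optimal for the primary task
    (within `slack` of the best), then minimize the NSE penalty
    among that admissible set.
    """
    best = q_primary.max()
    admissible = np.flatnonzero(q_primary >= best - slack)
    return int(admissible[np.argmin(q_nse[admissible])])
```

With slack 0 this reduces to plain greedy selection on the primary task; increasing the slack trades bounded primary-task loss for lower side-effect penalties.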
Qingyuan Lu; Justin Svegliato; Samer B. Nashed; Shlomo Zilberstein; Stuart Russell Ethically Compliant Autonomous Systems under Partial Observability Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 2024.
@conference{SZ:LSNZRicra24,
title = {Ethically Compliant Autonomous Systems under Partial Observability},
author = {Qingyuan Lu and Justin Svegliato and Samer B. Nashed and Shlomo Zilberstein and Stuart Russell},
url = {http://rbr.cs.umass.edu/shlomo/papers/LSNZRicra24.pdf},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
address = {Yokohama, Japan},
abstract = {Ethically compliant autonomous systems (ECAS) are the prevailing approach to building robotic systems that perform sequential decision making subject to ethical theories in fully observable environments. However, in real-world robotics settings, these systems often operate under partial observability because of sensor limitations, environmental conditions, or limited inference due to bounded computational resources. Therefore, this paper proposes a partially observable ECAS (PO-ECAS), bringing this work one step closer to being a practical and useful tool for roboticists. First, we formally introduce the PO-ECAS framework and a MILP-based solution method for approximating an optimal ethically compliant policy. Next, we extend an existing ethical framework for prima facie duties to belief space and offer an ethical framework for virtue ethics inspired by Aristotle's Doctrine of the Mean. Finally, we demonstrate that our approach is effective in a simulated campus patrol robot domain.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Samer B. Nashed; Roderic A. Grupen; Shlomo Zilberstein Choosing the Right Tool for the Job: Online Decision Making over SLAM Algorithms Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 2024.
@conference{SZ:NGZicra24,
title = {Choosing the Right Tool for the Job: Online Decision Making over SLAM Algorithms},
author = {Samer B. Nashed and Roderic A. Grupen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/NGZicra24.pdf},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
address = {Yokohama, Japan},
abstract = {Nearly all state-of-the-art SLAM algorithms are designed to exploit patterns in data from specific sensing modalities, such as time-of-flight and structured light depth sensors, or RGB cameras. This specialization increases localization accuracy in domains where the given modality detects many high-quality features, but comes at the cost of decreasing performance in other, less favorable environments. For robotic systems that may experience a wide variety of sensing conditions, this difficulty in generalization presents a significant challenge. In this paper, we propose running several computationally cheap SLAM front ends in parallel and choosing the most promising feature set online. This problem is similar to the Algorithm Selection Problem (ASP), but has several complicating factors that preclude application of existing methods. We first provide an extension of the ASP formalism that captures the unique challenges in the SLAM setting, and then, based on this formalism, we propose modeling the SLAM ASP as a partially observable Markov decision process (POMDP). Our experiments show that dynamically selecting SLAM front ends, even myopically, improves localization robustness compared to selecting a static front end, and that using a POMDP policy provides even greater improvement.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2023
Kyle Hollins Wray; Stefan Witwicki; Shlomo Zilberstein; Omar Bentahar; Arec Jamgochian Explainability of Autonomous Vehicle Decision Making Miscellaneous 2023, (US Patent 11,714,971).
@misc{SZ:WWZBJpatent23e,
title = {Explainability of Autonomous Vehicle Decision Making},
author = {Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein and Omar Bentahar and Arec Jamgochian},
url = {https://patents.google.com/patent/US11714971B2/en},
year = {2023},
date = {2023-08-01},
publisher = {Google Patents},
abstract = {A processor is configured to execute instructions stored in a memory to identify distinct vehicle operational scenarios; instantiate decision components, where each of the decision components is an instance of a respective decision problem, and where each of the decision components maintains a respective state describing the respective vehicle operational scenario; receive respective candidate vehicle control actions from the decision components; select an action from the respective candidate vehicle control actions, where the action is from a selected decision component of the decision components, and where the action is used to control the AV to traverse a portion of the vehicle transportation network; and generate an explanation as to why the action was selected, where the explanation includes respective descriptors of the action, the selected decision component, and a state factor of the respective state of the selected decision component.},
note = {US Patent 11,714,971},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan Witwicki; Shlomo Zilberstein Autonomous Vehicle Operation with Explicit Occlusion Reasoning Miscellaneous 2023, (US Patent 11,702,070).
@misc{SZ:WWZpatent23d,
title = {Autonomous Vehicle Operation with Explicit Occlusion Reasoning},
author = {Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11702070B2/en},
year = {2023},
date = {2023-07-01},
publisher = {Google Patents},
abstract = {Autonomous vehicle operation with explicit occlusion reasoning may include traversing, by a vehicle, a vehicle transportation network. Traversing the vehicle transportation network can include receiving, from a sensor of the vehicle, sensor data for a portion of a vehicle operational environment, determining, using the sensor data, a visibility grid comprising coordinates forming an unobserved region within a defined distance from the vehicle, computing a probability of a presence of an external object within the unobserved region by comparing the visibility grid to a map (e.g., a high-definition map), and traversing a portion of the vehicle transportation network using the probability. An apparatus and a vehicle are also described.},
note = {US Patent 11,702,070},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan Witwicki; Shlomo Zilberstein Risk Aware Executor with Action Set Recommendations Miscellaneous 2023, (US Patent 11,635,758).
@misc{SZ:WWZpatent23c,
title = {Risk Aware Executor with Action Set Recommendations},
author = {Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11635758B2/en},
year = {2023},
date = {2023-04-01},
publisher = {Google Patents},
abstract = {A method for use in traversing a vehicle transportation network by an autonomous vehicle (AV) includes traversing, by the AV, the vehicle transportation network. Traversing the vehicle transportation network includes identifying a distinct vehicle operational scenario; instantiating a first decision component instance; receiving a first set of candidate vehicle control actions from the first decision component instance; selecting an action; and controlling the AV to traverse a portion of the vehicle transportation network based on the action. The first decision component instance is an instance of a first decision component modeling the distinct vehicle operational scenario.},
note = {US Patent 11,635,758},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Connor Basich; Justin Svegliato; Kyle Hollins Wray; Stefan Witwicki; Joydeep Biswas; Shlomo Zilberstein Competence-Aware Systems Journal Article In: Artificial Intelligence (AIJ), vol. 316, pp. 103844, 2023.
@article{SZ:BSWWBZaij23,
title = {Competence-Aware Systems},
author = {Connor Basich and Justin Svegliato and Kyle Hollins Wray and Stefan Witwicki and Joydeep Biswas and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BSWWBZaij23.pdf},
doi = {10.1016/j.artint.2022.103844},
year = {2023},
date = {2023-03-16},
urldate = {2023-03-16},
journal = {Artificial Intelligence (AIJ)},
volume = {316},
pages = {103844},
abstract = {Building autonomous systems for deployment in the open world has been a longstanding objective in both artificial intelligence and robotics. The open world, however, presents challenges that question some of the assumptions often made in contemporary AI models. Autonomous systems that operate in the open world face complex, non-stationary environments wherein enumerating all situations the system may face over the course of its deployment is intractable. Nevertheless, these systems are expected to operate safely and reliably for extended durations. Consequently, AI systems often rely on some degree of human assistance to mitigate risks while completing their tasks, and are hence better treated as semi-autonomous systems. In order to reduce unnecessary reliance on humans and optimize autonomy, we propose a novel introspective planning model—competence-aware systems (CAS)—that enables a semi-autonomous system to reason about its own competence and allowed level of autonomy by leveraging human feedback or assistance. A CAS learns to adjust its level of autonomy based on experience and interactions with a human authority so as to reduce improper reliance on the human and optimize the degree of autonomy it employs in any given circumstance. To handle situations in which the initial CAS model has insufficient state information to properly discriminate feedback received from humans, we introduce a methodology called iterative state space refinement that gradually increases the granularity of the state space online. The approach exploits information that exists in the standard CAS model and requires no additional input from the human. The result is an agent that can more confidently predict the correct feedback from the human authority in each level of autonomy, enabling it to learn its competence in a larger portion of the state space.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@misc{SZ:WWZpatent23b,
title = {Learning Safety and Human-Centered Constraints in Autonomous Vehicles},
author = {Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11613269B2/en},
year = {2023},
date = {2023-03-01},
publisher = {Google Patents},
abstract = {Traversing a vehicle transportation network includes operating a scenario-specific operational control evaluation module instance. The scenario-specific operational control evaluation module instance includes an instance of a scenario-specific operational control evaluation model of a distinct vehicle operational scenario. Operating the scenario-specific operational control evaluation module instance includes identifying a multi-objective policy for the scenario-specific operational control evaluation model. The multi-objective policy may include a relationship between at least two objectives. Traversing the vehicle transportation network includes receiving a candidate vehicle control action associated with each of the at least two objectives. Traversing the vehicle transportation network includes selecting a vehicle control action based on a buffer value. Traversing the vehicle transportation network includes performing the selected vehicle control action, determining a preference indicator for each objective, and updating the multi-objective policy.},
note = {US Patent 11,613,269},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
@misc{SZ:WBVCJWpatent23a,
title = {Explainability of Autonomous Vehicle Decision Making},
author = {Kyle Hollins Wray and Omar Bentahar and Astha Vagadia and Laura Cesafsky and Arec Jamgochian and Stefan Witwicki and Najamuddin Mirza Baig and Julius S Gyorfi and Shlomo Zilberstein and Sparsh Sharma},
url = {https://patents.google.com/patent/US11577746B2/en},
year = {2023},
date = {2023-02-01},
publisher = {Google Patents},
abstract = {A processor is configured to execute instructions stored in a memory to determine, in response to identifying vehicle operational scenarios of a scene, an action for controlling the AV, where the action is from a selected decision component that determined the action based on level of certainty associated with a state factor; generate an explanation as to why the action was selected, such that the explanation includes respective descriptors of the action, the selected decision component, and the state factor; and display the explanation in a graphical view that includes a first graphical indicator of a world object of the selected decision component, a second graphical indicator describing the state factor, and a third graphical indicator describing the action.},
note = {US Patent 11,577,746},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
@conference{SZ:SSPKZaaai23,
title = {Planning and Learning for Non-Markovian Negative Side Effects Using Finite State Controllers},
author = {Aishwarya Kamath and Sandhya Saisubramanian and Praveen Paruchuri and Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SSPKZaaai23.pdf},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
booktitle = {Proceedings of the 37th Conference on Artificial Intelligence (AAAI)},
abstract = {Autonomous systems are often deployed in the open world where it is hard to obtain complete specifications of objectives and constraints. Operating based on an incomplete model can produce negative side effects (NSEs), which affect the safety and reliability of the system. We focus on mitigating NSEs in environments modeled as Markov decision processes (MDPs). First, we learn a model of NSEs using observed data that contains state-action trajectories and the severity of the associated NSEs. Unlike previous works that associate NSEs with state-action pairs, our framework associates NSEs with entire trajectories, which is more general and captures non-Markovian dependence on states and actions. Second, we learn finite state controllers (FSCs) that predict the NSE severity for a given trajectory and generalize well to unseen data. Finally, we develop a constrained MDP model that uses information from both the underlying MDP and the learned FSC for planning while avoiding NSEs. Our empirical evaluation demonstrates the effectiveness of our approach in learning and mitigating Markovian and non-Markovian NSEs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:BBZaaai23bridge,
title = {Competence-Aware Autonomy: An Essential Skill for Robots in the Real World},
author = {Connor Basich and Shlomo Zilberstein and Joydeep Biswas},
url = {http://rbr.cs.umass.edu/shlomo/papers/BBZaaai23bridge.pdf},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 37th Conference on Artificial Intelligence (AAAI) Bridge Program},
abstract = {Recent efforts in AI and robotics towards deploying intelligent robotic systems in the real world offer the possibility of transformational impacts on society. For such systems to be successful while reliably maintaining safe operation, they must be cognizant of their limitations, and when uncertain about their autonomous capabilities, solicit human assistance. However, system designers cannot fully enumerate the space of all situations that a robot deployed in the real world might face, prompting the challenge of endowing robots with actionable awareness of their capabilities and limitations in unseen settings. We propose competence-aware autonomy as a means of addressing this challenge in a well-defined manner motivated by real world examples. We discuss recent prior work in this area and suggest several research challenges and opportunities for future work.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:MBZaamas23,
title = {Semi-Autonomous Systems with Contextual Competence Awareness},
author = {Saaduddin Mahmud and Connor Basich and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MBZaamas23.pdf},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 22nd International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
pages = {689–697},
abstract = {Competence modeling is critical for the efficient and safe operation of semi-autonomous systems (SAS) with varying levels of autonomy. In this paper, we extend the notion of competence modeling by introducing a contextual competence model. While previous work on competence-aware systems (CAS) defined the competence of a SAS relative to a single static operator, we present an augmented operator model that is contextualized by Markovian state information capable of capturing multiple operators. Access to such information allows the SAS to account for the stochastic shifts that may occur in the behavior of the operator(s) during deployment and optimize its autonomy accordingly. We show that the extended model called Contextual Competence Aware System (CoCAS) has the same convergence guarantees as CAS, and empirically illustrate the benefit of our approach over both the original CAS model as well as other relevant work in shared autonomy.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:NMGZaamas23,
title = {Causal Explanations for Sequential Decision Making Under Uncertainty (Extended Abstract)},
author = {Samer B. Nashed and Saaduddin Mahmud and Claudia V. Goldman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/NMGZaamas23.pdf},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 22nd International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
pages = {2307–2309},
abstract = {Competence modeling is critical for the efficient and safe operation of semi-autonomous systems (SAS) with varying levels of autonomy. In this paper, we extend the notion of competence modeling by introducing a contextual competence model. While previous work on competence-aware systems (CAS) defined the competence of a SAS relative to a single static operator, we present an augmented operator model that is contextualized by Markovian state information capable of capturing multiple operators. Access to such information allows the SAS to account for the stochastic shifts that may occur in the behavior of the operator(s) during deployment and optimize its autonomy accordingly. We show that the extended model called Contextual Competence Aware System (CoCAS) has the same convergence guarantees as CAS, and empirically illustrate the benefit of our approach over both the original CAS model as well as other relevant work in shared autonomy.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:MSZijcai23,
title = {Explanation-Guided Reward Alignment},
author = {Saaduddin Mahmud and Sandhya Saisubramanian and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MSZijcai23.pdf},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
booktitle = {Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI)},
abstract = {Agents often need to infer a reward function from observations in order to learn desired behaviors. However, agents may infer a reward function that does not align with the original intent, as there can be multiple reward functions consistent with their observations. Operating based on such misaligned rewards can be risky. Furthermore, black-box representations make it difficult to verify the learned reward functions and prevent harmful behavior. We present a framework for verifying and improving reward alignment using explanations, and we show how explanations can help detect misalignment and reveal failure cases in novel scenarios. The problem is formulated as inverse reinforcement learning from ranked trajectories. Verification tests created from the trajectory dataset are used to iteratively verify and improve reward alignment. The agent explains its learned reward, and a tester signals whether the explanation passes the test. In cases where the explanation fails, the agent offers alternative explanations to gather feedback, which is then used to improve the learned reward. We analyze the efficiency of our approach in improving reward alignment using different types of explanations and demonstrate its effectiveness in five domains.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:BNZgenplan23,
title = {RL3: Boosting Meta Reinforcement Learning via RL inside RL2},
author = {Abhinav Bhatia and Samer B. Nashed and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BNZgenplan23.pdf},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
booktitle = {NeurIPS Workshop on Generalized Planning (GenPlan)},
address = {New Orleans, Louisiana},
abstract = {Meta reinforcement learning (meta-RL) methods such as RL2 have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, these RL algorithms struggle with long-horizon tasks and out-of-distribution tasks since they rely on recurrent neural networks to process the sequence of experiences instead of summarizing them into general RL components such as value functions. Moreover, even transformers have a practical limit to the length of histories they can efficiently reason about before training and inference costs become prohibitive. In contrast, traditional RL algorithms are data-inefficient since they do not leverage domain knowledge, but they do converge to an optimal policy as more data becomes available. In this paper, we propose RL3, a principled hybrid approach that combines traditional RL and meta-RL by incorporating task-specific action-values learned through traditional RL as an input to the meta-RL neural network. We show that RL3 earns greater cumulative reward on long-horizon and out-of-distribution tasks compared to RL2, while maintaining the efficiency of the latter in the short term. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@incollection{SZ:MNGZextraamas23,
title = {Estimating Causal Responsibility for Explaining Autonomous Behavior},
author = {Saaduddin Mahmud and Samer B. Nashed and Claudia V. Goldman and Shlomo Zilberstein},
editor = {Davide Calvaresi},
url = {http://rbr.cs.umass.edu/shlomo/papers/MNGZextraamas23.pdf},
doi = {10.1007/978-3-031-40878-6},
year = {2023},
date = {2023-01-01},
booktitle = {International Workshop on Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS)},
pages = {78–94},
publisher = {Springer},
abstract = {There has been growing interest in causal explanations of stochastic, sequential decision-making systems. Structural causal models and causal reasoning offer several theoretical benefits when exact inference can be applied. Furthermore, users overwhelmingly prefer the resulting causal explanations over other state-of-the-art systems. In this work, we focus on one such method, MeanRESP, and its approximate versions that drastically reduce compute load and assign a responsibility score to each variable, which helps identify smaller sets of causes to be used as explanations. However, this method, and its approximate versions in particular, lack deeper theoretical analysis and broader empirical tests. To address these shortcomings, we provide three primary contributions. First, we offer several theoretical insights on the sample complexity and error rate of approximate MeanRESP. Second, we discuss several automated metrics for comparing explanations generated from approximate methods to those generated via exact methods. While we recognize the significance of user studies as the gold standard for evaluating explanations, our aim is to leverage the proposed metrics to systematically compare explanation-generation methods along important quantitative dimensions. Finally, we provide a more detailed discussion of MeanRESP and how its output under different definitions of responsibility compares to existing widely adopted methods that use Shapley values.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
@conference{SZ:BMZiros23,
title = {Learning Constraints on Autonomous Behavior from Proactive Feedback},
author = {Connor Basich and Saaduddin Mahmud and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BMZiros23.pdf},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {3680–3687},
address = {Detroit, Michigan},
abstract = {Learning from feedback is a common paradigm to acquire information that is hard to specify a priori. In this work, we consider a planning agent with a known nominal reward model that captures their high-level task objective, but is subject to constraints that are unknown a priori and must be inferred from human interventions. Unlike existing methods, our approach does not rely on full or partial demonstration trajectories or assume a fully reactive human. Instead, we assume access only to sparse interventions, which may in fact be generated proactively by the human, and make only minimal assumptions about the human. We provide both theoretical bounds on performance, and empirical validations of our method. We show that our method enables an agent to learn a constraint set with high accuracy that generalizes well to new environments within a domain, whereas methods that only consider reactive feedback learn an incorrect constraint set that does not generalize well, making constraint violations more likely in new environments.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:NSNZRiros23,
title = {Formal Composition of Robotic Systems as Contract Programs},
author = {Mason Nakamura and Justin Svegliato and Samer B. Nashed and Shlomo Zilberstein and Stuart Russell},
url = {http://rbr.cs.umass.edu/shlomo/papers/NSNZRiros23.pdf},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {6727–6732},
address = {Detroit, Michigan},
abstract = {Robotic systems are often composed of modular algorithms that each perform a specific function within a larger architecture, ranging from state estimation and task planning to trajectory optimization and object recognition. Existing work for specifying these systems as a formal composition of contract algorithms has limited expressiveness compared to the variety of sophisticated architectures that are commonly used in practice. Therefore, in this paper, we (1) propose a novel metareasoning framework for formally composing robotic systems as a contract program with programming constructs for functional, conditional, and looping semantics and (2) introduce a recursive hill climbing algorithm that finds a locally optimal time allocation of a contract program. In our experiments, we demonstrate that our approach outperforms baseline techniques in a simulated pick-and-place robot domain.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
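The time-allocation search described in this entry can be illustrated with a simplified, non-recursive hill-climbing sketch: shift units of time between contract algorithms while the move improves overall utility. The utility function, step size, and two-algorithm setup below are hypothetical stand-ins for illustration, not the paper's contract-program semantics.

```python
import itertools
import math

def hill_climb_allocation(utility, n, budget, step=1.0):
    """Hill climbing over time allocations to n contract algorithms:
    repeatedly shift `step` units of time from one algorithm to another
    whenever the move improves overall utility; stop at a local optimum."""
    alloc = [budget / n] * n
    improved = True
    while improved:
        improved = False
        for i, j in itertools.permutations(range(n), 2):
            if alloc[j] >= step:
                cand = list(alloc)
                cand[i] += step  # give time to algorithm i
                cand[j] -= step  # take it from algorithm j
                if utility(cand) > utility(alloc):
                    alloc, improved = cand, True
    return alloc

# Hypothetical diminishing-returns utilities for two contract algorithms.
u = lambda a: math.sqrt(a[0]) + 2 * math.sqrt(a[1])
best = hill_climb_allocation(u, n=2, budget=10.0)  # converges to [2.0, 8.0]
```

The local optimum matches the analytic solution for this utility (the second algorithm, with twice the marginal value, receives four times the time).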
2022
|
Kyle Wray; Stefan Witwicki; Shlomo Zilberstein; Liam Pedersen Autonomous Vehicle Operational Management Including Operating a Partially Observable Markov Decision Process Model Instance Miscellaneous 2022, (US Patent 11,500,380). @misc{SZ:WWZPpatent22d,
title = {Autonomous Vehicle Operational Management Including Operating a Partially Observable Markov Decision Process Model Instance},
author = {Kyle Wray and Stefan Witwicki and Shlomo Zilberstein and Liam Pedersen},
url = {https://patents.google.com/patent/US11500380B2/en},
year = {2022},
date = {2022-11-01},
publisher = {Google Patents},
abstract = {Autonomous vehicle operational management may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include operating a scenario-specific operational control evaluation module instance, wherein the scenario-specific operational control evaluation module instance is an instance of a scenario-specific operational control evaluation module, wherein the scenario-specific operational control evaluation module implements a partially observable Markov decision process. Traversing the vehicle transportation network may include receiving a candidate vehicle control action from the scenario-specific operational control evaluation module instance, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action.},
note = {US Patent 11,500,380},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
|
Connor Basich; Kyle Hollins Wray; Stefan Witwicki; Shlomo Zilberstein Introspective Competence Modeling for AV Decision Making Miscellaneous 2022, (US Patent 11,307,585). @misc{SZ:BWWZpatent22c,
title = {Introspective Competence Modeling for AV Decision Making},
author = {Connor Basich and Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11307585B2/en},
year = {2022},
date = {2022-04-01},
publisher = {Google Patents},
abstract = {A first method includes detecting, based on sensor data, an environment state; selecting an action based on the environment state; determining an autonomy level associated with the environment state and the action; and performing the action according to the autonomy level. The autonomy level can be selected based at least on an autonomy model and a feedback model. A second method includes calculating, by solving an extended Stochastic Shortest Path (SSP) problem, a policy for solving a task. The policy can map environment states and autonomy levels to actions and autonomy levels. Calculating the policy can include generating plans that operate across multiple levels of autonomy.},
note = {US Patent 11,307,585},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
|
Kyle Hollins Wray; Stefan Witwicki; Shlomo Zilberstein Multiple Objective Explanation and Control Interface Design Miscellaneous 2022, (US Patent 11,300,957). @misc{SZ:WWZpatent22b,
title = {Multiple Objective Explanation and Control Interface Design},
author = {Kyle Hollins Wray and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11300957B2/en},
year = {2022},
date = {2022-04-01},
publisher = {Google Patents},
abstract = {A vehicle traversing a vehicle transportation network may use a scenario-specific operational control evaluation model instance. A multi-objective policy for the model is received, wherein the policy includes at least a first objective, a second objective, and a priority of the first objective relative to the second objective. A representation of the policy (e.g., the first objective, the second objective, and the priority) is generated using a user interface. Based on feedback to the user interface, a change to the multi-objective policy for the scenario-specific operational control evaluation model is received. The change is to the first objective, the second objective, the priority, or some combination thereof. Then, for determining a vehicle control action for traversing the vehicle transportation network, an updated multi-objective policy for the scenario-specific operational control evaluation model is generated to include the change to the policy.},
note = {US Patent 11,300,957},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
|
Samer Nashed; Shlomo Zilberstein A Survey of Opponent Modeling in Adversarial Domains Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 73, pp. 277–327, 2022. @article{SZ:NZjair22,
title = {A Survey of Opponent Modeling in Adversarial Domains},
author = {Samer Nashed and Shlomo Zilberstein},
url = {https://jair.org/index.php/jair/article/view/12889/26762},
doi = {10.1613/jair.1.12889},
year = {2022},
date = {2022-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {73},
pages = {277--327},
abstract = {Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Sandhya Saisubramanian; Shlomo Zilberstein; Ece Kamar Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems Journal Article In: AI Magazine, vol. 42, no. 4, pp. 62–71, 2022. @article{SZ:SZKaimag22,
title = {Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems},
author = {Sandhya Saisubramanian and Shlomo Zilberstein and Ece Kamar},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZKaimag22.pdf},
doi = {10.1609/aaai.12028},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
journal = {AI Magazine},
volume = {42},
number = {4},
pages = {62--71},
abstract = {Autonomous agents acting in the real-world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model – handcrafted or machine acquired – is inevitable due to practical limitations of any modeling technique for complex real-world settings. Due to the limited fidelity of its model, an agent’s actions may have unexpected, undesirable consequences during execution. Learning to recognize and avoid such negative side effects (NSEs) of an agent’s actions is critical to improve the safety and reliability of autonomous systems. Mitigating NSEs is an emerging research topic that is attracting increased attention due to the rapid growth in the deployment of AI systems and their broad societal impacts. This article provides a comprehensive overview of different forms of NSEs and the recent research efforts to address them. We identify key characteristics of NSEs, highlight the challenges in avoiding NSEs, and discuss recently developed approaches, contrasting their benefits and limitations. The article concludes with a discussion of open questions and suggestions for future research directions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Sandhya Saisubramanian; Shlomo Zilberstein; Ece Kamar Avoiding Negative Side Effects of Autonomous Systems in the Open World Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 74, pp. 143–177, 2022. @article{SZ:SZKjair22,
title = {Avoiding Negative Side Effects of Autonomous Systems in the Open World},
author = {Sandhya Saisubramanian and Shlomo Zilberstein and Ece Kamar},
url = {https://www.jair.org/index.php/jair/article/view/13581/26799},
doi = {10.1613/jair.1.13581},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {74},
pages = {143--177},
abstract = {Autonomous systems that operate in the open world often use incomplete models of their environment. Model incompleteness is inevitable due to the practical limitations in precise model specification and data collection about open-world environments. Due to the limited fidelity of the model, agent actions may produce negative side effects (NSEs) when deployed. Negative side effects are undesirable, unmodeled effects of agent actions on the environment. NSEs are inherently challenging to identify at design time and may affect the reliability, usability and safety of the system. We present two complementary approaches to mitigate the NSE via: (1) learning from feedback, and (2) environment shaping. The solution approaches target settings with different assumptions and agent responsibilities. In learning from feedback, the agent learns a penalty function associated with an NSE. We investigate the efficiency of different feedback mechanisms, including human feedback and autonomous exploration. The problem is formulated as a multi-objective Markov decision process such that optimizing the agent’s assigned task is prioritized over mitigating NSE. A slack parameter denotes the maximum allowed deviation from the optimal expected reward for the agent’s task in order to mitigate NSE. In environment shaping, we examine how a human can assist an agent, beyond providing feedback, and utilize their broader scope of knowledge to mitigate the impacts of NSE. We formulate the problem as a human-agent collaboration with decoupled objectives. The agent optimizes its assigned task and may produce NSE during its operation. The human assists the agent by performing modest reconfigurations of the environment so as to mitigate the impacts of NSE, without affecting the agent’s ability to complete its assigned task. We present an algorithm for shaping and analyze its properties. Empirical evaluations demonstrate the trade-offs in the performance of different approaches in mitigating NSE in different settings.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Sadegh Rabiee; Connor Basich; Kyle Hollins Wray; Shlomo Zilberstein; Joydeep Biswas Competence-Aware Path Planning Via Introspective Perception Journal Article In: IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3218–3225, 2022. @article{SZ:RBWZBlra22,
title = {Competence-Aware Path Planning Via Introspective Perception},
author = {Sadegh Rabiee and Connor Basich and Kyle Hollins Wray and Shlomo Zilberstein and Joydeep Biswas},
url = {http://rbr.cs.umass.edu/shlomo/papers/RBWZBlra22.pdf},
doi = {10.1109/LRA.2022.3145517},
year = {2022},
date = {2022-01-01},
journal = {IEEE Robotics and Automation Letters},
volume = {7},
number = {2},
pages = {3218--3225},
abstract = {Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and to proactively take actions to avoid future failures. Existing approaches for competence-aware planning are either model-based, requiring explicit enumeration of known failure sources, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception, without requiring a priori enumeration of failure sources or requiring location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Experiments in a simulation show that the proposed CPIP approach outperforms the frequentist baseline in multiple mobile robot tasks, and is further validated via real robot experiments in environments with perceptually challenging obstacles and terrain.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Justin Svegliato; Connor Basich; Sandhya Saisubramanian; Shlomo Zilberstein Metareasoning for Safe Decision Making in Autonomous Systems Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, Pennsylvania, 2022. @conference{SZ:SBSZicra22,
title = {Metareasoning for Safe Decision Making in Autonomous Systems},
author = {Justin Svegliato and Connor Basich and Sandhya Saisubramanian and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SBSZicra22.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
address = {Philadelphia, Pennsylvania},
abstract = {Although experts carefully specify the high-level decision-making models in autonomous systems, it is infeasible to guarantee safety across every scenario during operation. We therefore propose a safety metareasoning system that optimizes the severity of the system's safety concerns and the interference to the system's task: the system executes in parallel a task process that completes a specified task and safety processes that each address a specified safety concern with a conflict resolver for arbitration. This paper offers a formal definition of a safety metareasoning system, a recommendation algorithm for a safety process, an arbitration algorithm for a conflict resolver, an application of our approach to planetary rover exploration, and a demonstration that our approach is effective in simulation.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
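The arbitration architecture this entry describes (parallel safety processes plus a conflict resolver) can be sketched minimally: each safety process emits an optional recommendation, and the resolver overrides the task action with the most severe one. The function names, severity scale, and action strings below are hypothetical, not the paper's formal definitions.

```python
def resolve(task_action, safety_recommendations):
    """Arbitration sketch: each safety process returns (severity, action)
    or None when it does not fire. Execute the highest-severity safety
    action if any process fires; otherwise execute the task action."""
    firing = [r for r in safety_recommendations if r is not None]
    if not firing:
        return task_action
    return max(firing, key=lambda r: r[0])[1]

# No safety process fires: the task process proceeds unimpeded.
assert resolve("drive", [None, None]) == "drive"
# Two concerns fire: the more severe recommendation wins arbitration.
assert resolve("drive", [(2, "slow"), (5, "stop")]) == "stop"
```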
Abhinav Bhatia; Justin Svegliato; Samer B. Nashed; Shlomo Zilberstein Tuning the Hyperparameters of Anytime Planning: A Metareasoning Approach with Deep Reinforcement Learning Conference Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS), Virtual Conference, 2022. @conference{SZ:BSNZicaps22,
title = {Tuning the Hyperparameters of Anytime Planning: A Metareasoning Approach with Deep Reinforcement Learning},
author = {Abhinav Bhatia and Justin Svegliato and Samer B. Nashed and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BSNZicaps22.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS)},
address = {Virtual Conference},
abstract = {Anytime planning algorithms often have hyperparameters that can be tuned at runtime to optimize their performance. While work on metareasoning has focused on when to interrupt an anytime planner and act on the current plan, the scope of metareasoning can be expanded to tuning the hyperparameters of the anytime planner at runtime. This paper introduces a general, decision-theoretic metareasoning approach that optimizes both the stopping point and hyperparameters of anytime planning. We begin by proposing a generalization of the standard meta-level control problem for anytime algorithms. We then offer a meta-level control technique that monitors and controls an anytime algorithm using deep reinforcement learning. Finally, we show that our approach boosts performance on a common benchmark domain that uses anytime weighted A* to solve a range of heuristic search problems and a mobile robot application that uses RRT* to solve motion planning problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
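The stopping half of the meta-level control problem in this entry admits a minimal, myopic illustration: continue the anytime algorithm while the expected value of one more step of computation exceeds its time cost. The performance profile and constants below are invented for illustration; the paper learns this control policy (and hyperparameter settings) with deep reinforcement learning rather than using a fixed rule.

```python
def dq_dt(quality):
    """Hypothetical performance profile: quality gain per step, with
    diminishing returns as solution quality approaches 1.0."""
    return max(0.0, 0.5 * (1.0 - quality))

def should_continue(quality, utility_slope=1.0, time_cost=0.1):
    """Myopic stopping rule: continue while the marginal value of
    computation (expected quality gain times its utility) exceeds
    the cost of one more time step."""
    return dq_dt(quality) * utility_slope > time_cost

# Monitor a simulated anytime algorithm until stopping is optimal.
quality, t = 0.0, 0
while should_continue(quality):
    quality += dq_dt(quality)  # one more step of anytime computation
    t += 1
```

Under these constants the monitor runs three steps (quality 0.5, 0.75, 0.875) and then stops, since the next expected gain (0.0625) falls below the time cost.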
Shuwa Miura; Kyle Hollins Wray; Shlomo Zilberstein Heuristic Search for SSPs with Lexicographic Preferences over Multiple Costs Conference Proceedings of the 15th Annual Symposium on Combinatorial Search (SOCS), Vienna, Austria, 2022. @conference{SZ:MWZsocs22,
title = {Heuristic Search for SSPs with Lexicographic Preferences over Multiple Costs},
author = {Shuwa Miura and Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MWZsocs22.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the 15th Annual Symposium on Combinatorial Search (SOCS)},
address = {Vienna, Austria},
abstract = {Real-world decision problems often involve multiple competing objectives. The Stochastic Shortest Path (SSP) with lexicographic preferences over multiple costs offers an expressive formulation for many practical problems. However, the existing solution methods either lack optimality guarantees or require costly computations over the entire state space. We propose the first heuristic search algorithm for this problem, based on the heuristic algorithm for Constrained SSPs. Our experiments show that our heuristic search algorithm can compute optimal policies while avoiding a large portion of the state space. We also analyze the theoretical properties of the problem, establishing the conditions under which SSPs with lexicographic preferences have a proper optimal policy.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
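The core ordering used by this entry's formulation can be sketched as a pairwise comparator over cost vectors: a higher-priority objective decides the comparison unless the costs are within its slack, in which case the next objective is consulted. This is a simplified stand-in; the paper defines slack relative to the optimal value of each objective, not pairwise as below.

```python
def lex_prefer(c1, c2, slack):
    """Return True if cost vector c1 is lexicographically preferred to c2.
    Objective i decides the comparison only when the two costs differ by
    more than slack[i]; otherwise the next objective is consulted."""
    for a, b, eta in zip(c1, c2, slack):
        if abs(a - b) > eta:
            return a < b
    return False  # equivalent within slack on every objective

# Lower primary cost wins outright when the gap exceeds the slack.
assert lex_prefer((3.0, 9.0), (5.0, 1.0), slack=(0.5, 0.0))
# Within slack on the primary cost, the secondary objective decides.
assert lex_prefer((5.2, 1.0), (5.0, 9.0), slack=(0.5, 0.0))
```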
John R. Peterson; Anagha Kulkarni; Emil Keyder; Joseph Kim; Shlomo Zilberstein Trajectory Constraint Heuristics for Optimal Probabilistic Planning Conference Proceedings of the 15th Annual Symposium on Combinatorial Search (SOCS), Vienna, Austria, 2022. @conference{SZ:PKKKZsocs22,
title = {Trajectory Constraint Heuristics for Optimal Probabilistic Planning},
author = {John R. Peterson and Anagha Kulkarni and Emil Keyder and Joseph Kim and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PKKKZsocs22.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the 15th Annual Symposium on Combinatorial Search (SOCS)},
address = {Vienna, Austria},
abstract = {Search algorithms such as LAO* and LRTDP coupled with admissible heuristics are widely used methods for optimal probabilistic planning. Their effectiveness depends on the degree to which heuristics are able to approximate the optimal cost of a state. Many common domain-independent heuristics, however, rely on determinization, and ignore the probabilities associated with different effects of actions. Here, we present a method for decomposing a probabilistic planning problem into subproblems by constraining possible action outcomes. Admissible heuristics evaluated for each subproblem can then be combined via a weighted sum to obtain an admissible heuristic for the original problem that takes into account a limited amount of probabilistic information. We use this approach to derive new admissible heuristics for probabilistic planning, and show that for some problems they are significantly more informative than existing heuristics, giving up to an order of magnitude speedup in the time to converge to an optimal policy.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
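The heuristic combination this entry describes reduces to a probability-weighted sum of admissible subproblem estimates. The sketch below is a toy instance under assumed numbers: the two subproblem heuristics and the 0.8/0.2 outcome probabilities are hypothetical, not values from the paper.

```python
def combined_heuristic(state, subproblem_heuristics, weights):
    """Combine admissible heuristics of outcome-constrained subproblems
    into a single estimate via a probability-weighted sum; the weights
    must form a distribution over the constrained outcomes."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * h(state) for h, w in zip(subproblem_heuristics, weights))

# Hypothetical subproblems: each fixes one outcome of a stochastic action.
h_success = lambda s: 2.0   # cost-to-go if the action always succeeds
h_failure = lambda s: 6.0   # cost-to-go if the action always fails
h = combined_heuristic(None, [h_success, h_failure], [0.8, 0.2])  # 2.8
```

Unlike determinization, the combined estimate retains the probabilistic information: it sits between the optimistic all-success and pessimistic all-failure values.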
Connor Basich; Joseph A. Russino; Steve Chien; Shlomo Zilberstein A Sampling Based Approach to Robust Planning for a Planetary Lander Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022. @conference{SZ:BRCZiros22,
title = {A Sampling Based Approach to Robust Planning for a Planetary Lander},
author = {Connor Basich and Joseph A. Russino and Steve Chien and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BRCZiros22.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {4106--4111},
address = {Kyoto, Japan},
abstract = {Planning for autonomous operation in unknown environments poses a number of technical challenges. The agent must ensure robustness to unknown phenomena, unpredictable variation in execution, and uncertain resources, all while maximizing its objective. These challenges are exacerbated in the context of space missions where uncertainty is often higher, long communication delays necessitate robust autonomous execution, and severely constrained computational resources limit the scope of planning techniques that can be used. We examine this problem in the context of a Europa Lander concept mission where an autonomous lander must collect valuable data and communicate that data back to Earth. We model the problem as a hierarchical task network, framing it as a utility maximization problem constrained by a strictly monotonically decreasing energy resource. We propose a novel deterministic planning framework that uses periodic replanning and sampling-based optimization to better handle model uncertainty and execution variation, while remaining computationally tractable. We demonstrate the efficacy of our framework through simulations of a Europa Lander concept mission in which our approach outperforms several baselines in utility maximization and robustness.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Connor Basich; John Peterson; Shlomo Zilberstein Planning with Intermittent State Observability: Knowing When to Act Blind Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022. @conference{SZ:BPZiros22,
title = {Planning with Intermittent State Observability: Knowing When to Act Blind},
author = {Connor Basich and John Peterson and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BPZiros22.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {11657--11664},
address = {Kyoto, Japan},
abstract = {Contemporary planning models and methods often rely on constant availability of free state information at each step of execution. However, autonomous systems are increasingly deployed in the open world where state information may be costly or simply unavailable in certain situations. Failing to account for sensor limitations may lead to costly behavior or even catastrophic failure. While the partially observable Markov decision process (POMDP) can be used to model this problem, solving POMDPs is often intractable. We introduce a planning model called a semi-observable Markov decision process (SOMDP) specifically designed for MDPs where state observability may be intermittent. We propose an approach for solving SOMDPs that uses memory states to proactively plan for the potential loss of sensor information while exploiting the unique structure of SOMDPs. Our theoretical analysis and empirical evaluation demonstrate the advantages of SOMDPs relative to existing planning models.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Samer B. Nashed; Justin Svegliato; Abhinav Bhatia; Stuart Russell; Shlomo Zilberstein Selecting the Partial State Abstractions of MDPs: A Metareasoning Approach with Deep Reinforcement Learning Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022. @conference{SZ:NSBRZiros22,
title = {Selecting the Partial State Abstractions of MDPs: A Metareasoning Approach with Deep Reinforcement Learning},
author = {Samer B. Nashed and Justin Svegliato and Abhinav Bhatia and Stuart Russell and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/NSBRZiros22.pdf},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {11665--11670},
address = {Kyoto, Japan},
abstract = {Markov decision processes (MDPs) are a common general-purpose model used in robotics for representing sequential decision-making problems. Given the complexity of robotics applications, a popular approach for approximately solving MDPs relies on state aggregation to reduce the size of the state space but at the expense of policy fidelity--offering a trade-off between policy quality and computation time. Naturally, this poses a challenging metareasoning problem: how can an autonomous system dynamically select different state abstractions that optimize this trade-off as it operates online? In this paper, we formalize this metareasoning problem with a notion of time-dependent utility and solve it using deep reinforcement learning. To do this, we develop several general, cheap heuristics that summarize the reward structure and transition topology of the MDP at hand to serve as effective features. Empirically, we demonstrate that our metareasoning approach outperforms several baseline approaches and a strong heuristic approach on a standard benchmark domain.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Christopher Ostafew; Astha Vagadia; Najamuddin Baig; Viju James; Stefan Witwicki; Shlomo Zilberstein Exception Situation Playback for Tele-Operators Miscellaneous 2022, (US Patent 11,215,987). @misc{SZ:OVBJWZpatent22a,
title = {Exception Situation Playback for Tele-Operators},
author = {Christopher Ostafew and Astha Vagadia and Najamuddin Baig and Viju James and Stefan Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11215987B2/en},
year = {2022},
date = {2022-01-01},
publisher = {Google Patents},
abstract = {Resolving an exception situation in autonomous driving includes receiving an assistance request to resolve the exception situation from an autonomous vehicle (AV); identifying a solution to the exception situation; forwarding the solution to a tele-operator; receiving a request for playback data from the tele-operator; receiving, from the AV, the playback data; and obtaining, from the tele-operator, a validated solution based on the tele-operator using the playback data. The playback data includes snapshots n_i of data related to autonomous driving stored at the AV at respective consecutive times t_i, for i = 1, ..., N.},
note = {US Patent 11,215,987},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2021
|
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein; Omar Bentahar; Arec Jamgochian Explainability of Autonomous Vehicle Decision Making Miscellaneous 2021, (US Patent App. 16/778,890). @misc{SZ:BWWZpatent21j,
title = {Explainability of Autonomous Vehicle Decision Making},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein and Omar Bentahar and Arec Jamgochian},
url = {https://patents.google.com/patent/US20210240190A1/en},
year = {2021},
date = {2021-08-01},
publisher = {Google Patents},
abstract = {A processor is configured to execute instructions stored in a memory to identify distinct vehicle operational scenarios; instantiate decision components, where each of the decision components is an instance of a respective decision problem, and where each of the decision components maintains a respective state describing the respective vehicle operational scenario; receive respective candidate vehicle control actions from the decision components; select an action from the respective candidate vehicle control actions, where the action is from a selected decision component of the decision components, and where the action is used to control the AV to traverse a portion of the vehicle transportation network; and generate an explanation as to why the action was selected, where the explanation includes respective descriptors of the action, the selected decision component, and a state factor of the respective state of the selected decision component.},
note = {US Patent App. 16/778,890},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Reinforcement and Model Learning for Vehicle Operation Miscellaneous 2021, (US Patent 11,027,751). @misc{SZ:BWWZpatent21f,
title = {Reinforcement and Model Learning for Vehicle Operation},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US11027751B2/en},
year = {2021},
date = {2021-06-01},
publisher = {Google Patents},
abstract = {Methods and vehicles may be configured to gain experience in the form of state-action and/or action-observation histories for an operational scenario as the vehicle traverses a vehicle transportation network. The histories may be incorporated into a model in the form of learning to improve the model over time. The learning may be used to improve integration with human behavior. Driver feedback may be used in the learning examples to improve future performance and to integrate with human behavior. The learning may be used to create customized scenario solutions. The learning may be used to transfer a learned solution and apply the learned solution to a similar scenario.},
note = {US Patent 11,027,751},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Risk Aware Executor with Action Set Recommendations Miscellaneous 2021, (US Patent App. 16/696,235). @misc{SZ:BWWZpatent21d,
title = {Risk Aware Executor with Action Set Recommendations},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US20210157315A1/en},
year = {2021},
date = {2021-05-01},
publisher = {Google Patents},
abstract = {A method for use in traversing a vehicle transportation network by an autonomous vehicle (AV) includes traversing, by the AV, the vehicle transportation network. Traversing the vehicle transportation network includes identifying a distinct vehicle operational scenario; instantiating a first decision component instance; receiving a first set of candidate vehicle control actions from the first decision component instance; selecting an action; and controlling the AV to traverse a portion of the vehicle transportation network based on the action. The first decision component instance is an instance of a first decision component modeling the distinct vehicle operational scenario.},
note = {US Patent App. 16/696,235},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Justin Svegliato; Samer B Nashed; Shlomo Zilberstein Ethically Compliant Sequential Decision Making Conference Proceedings of the 35th Conference on Artificial Intelligence (AAAI), 2021, (Distinguished Paper Award). @conference{SZ:SNZaaai21,
title = {Ethically Compliant Sequential Decision Making},
author = {Justin Svegliato and Samer B Nashed and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SNZaaai21.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the 35th Conference on Artificial Intelligence (AAAI)},
pages = {11657--11665},
abstract = {Enabling autonomous systems to comply with an ethical theory is critical given their accelerating deployment in domains that impact society. While many ethical theories have been studied extensively in moral philosophy, they are still challenging to implement by developers who build autonomous systems. This paper proposes a novel approach for building ethically compliant autonomous systems that optimize completing a task while following an ethical framework. First, we introduce a definition of an ethically compliant autonomous system and its properties. Next, we offer a range of ethical frameworks for divine command theory, prima facie duties, and virtue ethics. Finally, we demonstrate the accuracy and usability of our approach in a set of autonomous driving simulations and a user study of planning and robotics experts.},
note = {Distinguished Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Abhinav Bhatia; Justin Svegliato; Shlomo Zilberstein On the Benefits of Randomly Adjusting Anytime Weighted A* Conference Proceedings of the 14th International Symposium on Combinatorial Search (SOCS), 2021. @conference{SZ:BSZsocs21,
title = {On the Benefits of Randomly Adjusting Anytime Weighted A*},
author = {Abhinav Bhatia and Justin Svegliato and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BSZsocs21.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the 14th International Symposium on Combinatorial Search (SOCS)},
abstract = {Anytime Weighted A*--an anytime heuristic search algorithm that uses a weight to scale the heuristic value of each node in the open list--has proven to be an effective way to manage the trade-off between solution quality and computation time in heuristic search. Finding the best weight, however, is challenging because it depends on not only the characteristics of the domain and the details of the instance at hand, but also the available computation time. We propose a randomized version of this algorithm, called Randomized Weighted A*, that randomly adjusts its weight at runtime and show a counterintuitive phenomenon: RWA* generally performs as well or better than AWA* with the best static weight on a range of benchmark problems. The result is a simple algorithm that is easy to implement and performs consistently well without any offline experimentation or parameter tuning.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Samer B Nashed; Justin Svegliato; Matteo Brucato; Connor Basich; Roderic A Grupen; Shlomo Zilberstein Solving Markov Decision Processes with Partial State Abstractions Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2021. @conference{SZ:NSBBGZicra21,
title = {Solving Markov Decision Processes with Partial State Abstractions},
author = {Samer B Nashed and Justin Svegliato and Matteo Brucato and Connor Basich and Roderic A Grupen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/NSBBGZicra21.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
abstract = {Autonomous systems often use approximate planners that exploit state abstractions to solve large MDPs in real-time decision-making problems. However, these planners can eliminate details needed to produce effective behavior in autonomous systems. We therefore propose a novel model, a partially abstract MDP, with a set of abstract states that each compress a set of ground states to condense irrelevant details and a set of ground states that expand from a set of grounded abstract states to retain relevant details. This paper offers (1) a definition of a partially abstract MDP that (2) generalizes its ground MDP and its abstract MDP and exhibits bounded optimality depending on its abstract MDP along with (3) a lazy algorithm for planning and execution in autonomous systems. The result is a scalable approach that computes near-optimal solutions to large problems in minutes rather than hours.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Sainyam Galhotra; Sandhya Saisubramanian; Shlomo Zilberstein Learning to Generate Fair Clusters from Demonstrations Conference Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021. @conference{SZ:GSZaies21,
title = {Learning to Generate Fair Clusters from Demonstrations},
author = {Sainyam Galhotra and Sandhya Saisubramanian and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GSZaies21.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES)},
abstract = {Fair clustering is the process of grouping similar entities together, while satisfying a mathematically well-defined fairness metric as a constraint. Due to the practical challenges in precise model specification, the prescribed fairness constraints are often incomplete and act as proxies to the intended fairness requirement. Clustering with proxies may lead to biased outcomes when the system is deployed. We examine how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert. Each demonstration is a clustering over a subset of the data. We present an algorithm to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques, and analyze its theoretical properties. To extend our approach to novel fairness metrics for which clustering algorithms do not currently exist, we present a greedy method for clustering. Additionally, we investigate how to generate interpretable solutions using our approach. Empirical evaluation on three real-world datasets demonstrates the effectiveness of our approach in quickly identifying the underlying fairness and interpretability constraints, which are then used to generate fair and interpretable clusters.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Samer B Nashed; Justin Svegliato; Shlomo Zilberstein Ethically Compliant Planning within Moral Communities Conference Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021. @conference{SZ:NSZaies21,
title = {Ethically Compliant Planning within Moral Communities},
author = {Samer B Nashed and Justin Svegliato and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/NSZaies21.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES)},
abstract = {Ethically compliant autonomous systems (ECAS) are the state-of-the-art for solving sequential decision-making problems under uncertainty while respecting constraints that encode ethical considerations. This paper defines a novel concept in the context of ECAS that is from moral philosophy, the moral community, which leads to a nuanced taxonomy of explicit ethical agents. We then propose new ethical frameworks that extend the applicability of ECAS to domains where a moral community is required. Next, we provide a formal analysis of the proposed ethical frameworks and conduct experiments that illustrate their differences. Finally, we discuss the implications of explicit moral communities that could shape research on standards and guidelines for ethical agents in order to better understand and predict common errors in their design and communicate their capabilities.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Sandhya Saisubramanian; Shannon C Roberts; Shlomo Zilberstein Understanding User Attitudes Towards Negative Side Effects of AI Systems Conference CHI Conference on Human Factors in Computing Systems, Late-Breaking Work, 2021. @conference{SZ:SRZchi21,
title = {Understanding User Attitudes Towards Negative Side Effects of AI Systems},
author = {Sandhya Saisubramanian and Shannon C Roberts and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SRZchi21.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {CHI Conference on Human Factors in Computing Systems, Late-Breaking Work},
pages = {368:1--368:6},
abstract = {Artificial Intelligence (AI) systems deployed in the open world may produce negative side effects—which are unanticipated, undesirable outcomes that occur in addition to the intended outcomes of the system’s actions. These negative side effects affect users directly or indirectly, by violating their preferences or altering their environment in an undesirable, potentially harmful, manner. While the existing literature has started to explore techniques to overcome the impacts of negative side effects in deployed systems, there have been no prior efforts to determine how users perceive and respond to negative side effects. We surveyed 183 participants to develop an understanding of user attitudes towards side effects and how side effects impact user trust in the system. The surveys targeted two domains: an autonomous vacuum cleaner and an autonomous vehicle, each with 183 respondents. The results indicate that users are willing to tolerate side effects that are not safety-critical but prefer to minimize them as much as possible. Furthermore, users are willing to assist the system in mitigating negative side effects by providing feedback and reconfiguring the environment. Trust in the system diminishes if it fails to minimize the impacts of negative side effects over time. These results support key fundamental assumptions in existing techniques and facilitate the development of new methods to overcome negative side effects of AI systems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Artificial Intelligence (AI) systems deployed in the open world may produce negative side effects—which are unanticipated, undesirable outcomes that occur in addition to the intended outcomes of the system’s actions. These negative side effects affect users directly or indirectly, by violating their preferences or altering their environment in an undesirable, potentially harmful, manner. While the existing literature has started to explore techniques to overcome the impacts of negative side effects in deployed systems, there have been no prior efforts to determine how users perceive and respond to negative side effects. We surveyed 183 participants to develop an understanding of user attitudes towards side effects and how side effects impact user trust in the system. The surveys targeted two domains: an autonomous vacuum cleaner and an autonomous vehicle, each with 183 respondents. The results indicate that users are willing to tolerate side effects that are not safety-critical but prefer to minimize them as much as possible. Furthermore, users are willing to assist the system in mitigating negative side effects by providing feedback and reconfiguring the environment. Trust in the system diminishes if it fails to minimize the impacts of negative side effects over time. These results support key fundamental assumptions in existing techniques and facilitate the development of new methods to overcome negative side effects of AI systems. |
Sandhya Saisubramanian; Shlomo Zilberstein Mitigating Negative Side Effects via Environment Shaping (Extended Abstract) Conference Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2021. @conference{SZ:SZaamas21,
title = {Mitigating Negative Side Effects via Environment Shaping (Extended Abstract)},
author = {Sandhya Saisubramanian and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZaamas21.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
abstract = {Agents operating in the open world often produce negative side effects (NSE), which are difficult to identify at design time. We examine how a human can assist an agent, beyond providing feedback, and exploit their broader scope of knowledge to mitigate the impacts of NSE. We formulate this problem as a human-agent team with decoupled objectives. The agent optimizes its assigned task, during which its actions may produce NSE. The human shapes the environment through minor reconfiguration actions so as to mitigate the impacts of the agent's side effects, without significantly degrading agent performance. We present an algorithm to solve this problem. Empirical evaluation shows that the proposed framework can successfully mitigate NSE, without affecting the agent’s ability to complete its assigned task.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Agents operating in the open world often produce negative side effects (NSE), which are difficult to identify at design time. We examine how a human can assist an agent, beyond providing feedback, and exploit their broader scope of knowledge to mitigate the impacts of NSE. We formulate this problem as a human-agent team with decoupled objectives. The agent optimizes its assigned task, during which its actions may produce NSE. The human shapes the environment through minor reconfiguration actions so as to mitigate the impacts of the agent's side effects, without significantly degrading agent performance. We present an algorithm to solve this problem. Empirical evaluation shows that the proposed framework can successfully mitigate NSE, without affecting the agent’s ability to complete its assigned task. |
Sandhya Saisubramanian; Shlomo Zilberstein Mitigating Negative Side Effects via Environment Shaping Journal Article In: CoRR, vol. abs/2102.07017, 2021. @article{SZ:SZarXiv21b,
title = {Mitigating Negative Side Effects via Environment Shaping},
author = {Sandhya Saisubramanian and Shlomo Zilberstein},
url = {https://arxiv.org/abs/2102.07017},
year = {2021},
date = {2021-01-01},
journal = {CoRR},
volume = {abs/2102.07017},
abstract = {Agents operating in unstructured environments often produce negative side effects (NSE), which are difficult to identify at design time. While the agent can learn to mitigate the side effects from human feedback, such feedback is often expensive and the rate of learning is sensitive to the agent's state representation. We examine how humans can assist an agent, beyond providing feedback, and exploit their broader scope of knowledge to mitigate the impacts of NSE. We formulate this problem as a human-agent team with decoupled objectives. The agent optimizes its assigned task, during which its actions may produce NSE. The human shapes the environment through minor reconfiguration actions so as to mitigate the impacts of the agent's side effects, without affecting the agent's ability to complete its assigned task. We present an algorithm to solve this problem and analyze its theoretical properties. Through experiments with human subjects, we assess the willingness of users to perform minor environment modifications to mitigate the impacts of NSE. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE, without affecting the agent's ability to complete its assigned task.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Agents operating in unstructured environments often produce negative side effects (NSE), which are difficult to identify at design time. While the agent can learn to mitigate the side effects from human feedback, such feedback is often expensive and the rate of learning is sensitive to the agent's state representation. We examine how humans can assist an agent, beyond providing feedback, and exploit their broader scope of knowledge to mitigate the impacts of NSE. We formulate this problem as a human-agent team with decoupled objectives. The agent optimizes its assigned task, during which its actions may produce NSE. The human shapes the environment through minor reconfiguration actions so as to mitigate the impacts of the agent's side effects, without affecting the agent's ability to complete its assigned task. We present an algorithm to solve this problem and analyze its theoretical properties. Through experiments with human subjects, we assess the willingness of users to perform minor environment modifications to mitigate the impacts of NSE. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE, without affecting the agent's ability to complete its assigned task. |
Sainyam Galhotra; Sandhya Saisubramanian; Shlomo Zilberstein Learning to Generate Fair Clusters from Demonstrations Journal Article In: CoRR, vol. abs/2102.03977, 2021. @article{SZ:GSZarXiv21a,
title = {Learning to Generate Fair Clusters from Demonstrations},
author = {Sainyam Galhotra and Sandhya Saisubramanian and Shlomo Zilberstein},
url = {https://arxiv.org/abs/2102.03977},
year = {2021},
date = {2021-01-01},
journal = {CoRR},
volume = {abs/2102.03977},
abstract = {Fair clustering is the process of grouping similar entities together, while satisfying a mathematically well-defined fairness metric as a constraint. Due to the practical challenges in precise model specification, the prescribed fairness constraints are often incomplete and act as proxies to the intended fairness requirement, leading to biased outcomes when the system is deployed. We examine how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert. Each demonstration is a clustering over a subset of the data. We present an algorithm to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques, and analyze its theoretical properties. To extend our approach to novel fairness metrics for which clustering algorithms do not currently exist, we present a greedy method for clustering. Additionally, we investigate how to generate interpretable solutions using our approach. Empirical evaluation on three real-world datasets demonstrates the effectiveness of our approach in quickly identifying the underlying fairness and interpretability constraints, which are then used to generate fair and interpretable clusters.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Fair clustering is the process of grouping similar entities together, while satisfying a mathematically well-defined fairness metric as a constraint. Due to the practical challenges in precise model specification, the prescribed fairness constraints are often incomplete and act as proxies to the intended fairness requirement, leading to biased outcomes when the system is deployed. We examine how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert. Each demonstration is a clustering over a subset of the data. We present an algorithm to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques, and analyze its theoretical properties. To extend our approach to novel fairness metrics for which clustering algorithms do not currently exist, we present a greedy method for clustering. Additionally, we investigate how to generate interpretable solutions using our approach. Empirical evaluation on three real-world datasets demonstrates the effectiveness of our approach in quickly identifying the underlying fairness and interpretability constraints, which are then used to generate fair and interpretable clusters. |
Connor Basich; Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Introspective Competence Modeling for AV Decision Making Miscellaneous 2021, (US Patent App. 16/668,584). @misc{SZ:BWWZpatent21c,
title = {Introspective Competence Modeling for AV Decision Making},
author = {Connor Basich and Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US20210132606A1/en},
year = {2021},
date = {2021-01-01},
publisher = {Google Patents},
abstract = {A first method includes detecting, based on sensor data, an environment state; selecting an action based on the environment state; determining an autonomy level associated with the environment state and the action; and performing the action according to the autonomy level. The autonomy level can be selected based at least on an autonomy model and a feedback model. A second method includes calculating, by solving an extended Stochastic Shortest Path (SSP) problem, a policy for solving a task. The policy can map environment states and autonomy levels to actions and autonomy levels. Calculating the policy can include generating plans that operate across multiple levels of autonomy.},
note = {US Patent App. 16/668,584},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
A first method includes detecting, based on sensor data, an environment state; selecting an action based on the environment state; determining an autonomy level associated with the environment state and the action; and performing the action according to the autonomy level. The autonomy level can be selected based at least on an autonomy model and a feedback model. A second method includes calculating, by solving an extended Stochastic Shortest Path (SSP) problem, a policy for solving a task. The policy can map environment states and autonomy levels to actions and autonomy levels. Calculating the policy can include generating plans that operate across multiple levels of autonomy. |
Connor Basich; Justin Svegliato; Allyson Beach; Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Improving Competence via Iterative State Space Refinement Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:BSBWWZiros21,
title = {Improving Competence via Iterative State Space Refinement},
author = {Connor Basich and Justin Svegliato and Allyson Beach and Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BSBWWZiros21.pdf},
doi = {10.1109/IROS51168.2021.9636239},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {1865--1871},
address = {Prague, Czech Republic},
abstract = {Despite considerable efforts by human designers, accounting for every unique situation that an autonomous robotic system deployed in the real world could face is often an infeasible task. As a result, many such deployed systems still rely on human assistance in various capacities to complete certain tasks while staying safe. Competence-aware systems (CAS) is a recently proposed model for reducing such reliance on human assistance while in turn optimizing the system’s global autonomous operation by learning its own competence. However, such systems are limited by a fixed model of their environment and may perform poorly if their a priori planning model does not include certain features that emerge as important over the course of the system’s deployment. In this paper, we propose a method for improving the competence of a CAS over time by identifying important state features missing from the system’s model and incorporating them into its state representation, thereby refining its state space. Our approach exploits information that exists in the standard CAS model and adds no extra work to the human. The result is an agent that better predicts human involvement, improving its competence, reliability, and overall performance.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Despite considerable efforts by human designers, accounting for every unique situation that an autonomous robotic system deployed in the real world could face is often an infeasible task. As a result, many such deployed systems still rely on human assistance in various capacities to complete certain tasks while staying safe. Competence-aware systems (CAS) is a recently proposed model for reducing such reliance on human assistance while in turn optimizing the system’s global autonomous operation by learning its own competence. However, such systems are limited by a fixed model of their environment and may perform poorly if their a priori planning model does not include certain features that emerge as important over the course of the system’s deployment. In this paper, we propose a method for improving the competence of a CAS over time by identifying important state features missing from the system’s model and incorporating them into its state representation, thereby refining its state space. Our approach exploits information that exists in the standard CAS model and adds no extra work to the human. The result is an agent that better predicts human involvement, improving its competence, reliability, and overall performance. |
Shane Parr; Ishan Khatri; Justin Svegliato; Shlomo Zilberstein Agent-Aware State Estimation in Autonomous Vehicles Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021. @conference{SZ:PKSZiros21,
title = {Agent-Aware State Estimation in Autonomous Vehicles},
author = {Shane Parr and Ishan Khatri and Justin Svegliato and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PKSZiros21.pdf},
doi = {10.1109/IROS51168.2021.9636210},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {6694--6699},
address = {Prague, Czech Republic},
abstract = {Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Autonomous systems often operate in environments where the behavior of multiple agents is coordinated by a shared global state. Reliable estimation of the global state is thus critical for successfully operating in a multi-agent setting. We introduce agent-aware state estimation--a framework for calculating indirect estimations of state given observations of the behavior of other agents in the environment. We also introduce transition-independent agent-aware state estimation--a tractable class of agent-aware state estimation--and show that it allows the speed of inference to scale linearly with the number of agents in the environment. As an example, we model traffic light classification in instances of complete loss of direct observation. By taking into account observations of vehicular behavior from multiple directions of traffic, our approach exhibits accuracy higher than that of existing traffic light-only HMM methods on a real-world autonomous vehicle data set under a variety of simulated occlusion scenarios. |
Connor Basich; Daniel Wang; Joseph Russino; Steve Chien; Shlomo Zilberstein A Sampling-Based Optimization Approach to Handling Environmental Uncertainty for a Planetary Lander Conference ICAPS Workshop on Planning and Robotics (PlanRob), Guangzhou, China, 2021. @conference{SZ:BWRCZicaps21ws1,
title = {A Sampling-Based Optimization Approach to Handling Environmental Uncertainty for a Planetary Lander},
author = {Connor Basich and Daniel Wang and Joseph Russino and Steve Chien and Shlomo Zilberstein},
year = {2021},
date = {2021-01-01},
booktitle = {ICAPS Workshop on Planning and Robotics (PlanRob)},
address = {Guangzhou, China},
abstract = {TBD.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Shuwa Miura; Andrew L Cohen; Shlomo Zilberstein Maximizing Legibility in Stochastic Environments Conference Proceedings of the 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), Vancouver, BC, Canada, 2021. @conference{SZ:MCZroman21,
title = {Maximizing Legibility in Stochastic Environments},
author = {Shuwa Miura and Andrew L Cohen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MCZroman21.pdf},
doi = {10.1109/RO-MAN50785.2021.9515318},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN)},
pages = {1053--1059},
address = {Vancouver, BC, Canada},
abstract = {Making an agent's intentions clear from its observed behavior is crucial for seamless human-agent interaction and for increased transparency and trust in AI systems. Existing methods that address this challenge and maximize legibility of behaviors are limited to deterministic domains. We develop a technique for maximizing legibility in stochastic environments and illustrate that using legibility as an objective improves interpretability of agent behavior in several scenarios. We provide initial empirical evidence that human subjects can better interpret legible behavior.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Making an agent's intentions clear from its observed behavior is crucial for seamless human-agent interaction and for increased transparency and trust in AI systems. Existing methods that address this challenge and maximize legibility of behaviors are limited to deterministic domains. We develop a technique for maximizing legibility in stochastic environments and illustrate that using legibility as an objective improves interpretability of agent behavior in several scenarios. We provide initial empirical evidence that human subjects can better interpret legible behavior. |
Shuwa Miura; Shlomo Zilberstein A Unifying Framework for Observer-Aware Planning and its Complexity Conference Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI), Virtual Event, 2021. @conference{SZ:MZuai21,
title = {A Unifying Framework for Observer-Aware Planning and its Complexity},
author = {Shuwa Miura and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MZuai21.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {610--620},
address = {Virtual Event},
abstract = {Being aware of observers and the inferences they make about an agent's behavior is crucial for successful multi-agent interaction. Existing works on observer-aware planning use different assumptions and techniques to produce observer-aware behaviors. We argue that observer-aware planning, in its most general form, can be modeled as an Interactive POMDP (I-POMDP), which requires complex modeling and is hard to solve. Hence, we introduce a less complex framework for producing observer-aware behaviors called Observer-Aware MDP (OAMDP) and analyze its relationship to I-POMDP. We establish the complexity of OAMDPs and show that they can improve interpretability of agent behaviors in several scenarios.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Being aware of observers and the inferences they make about an agent's behavior is crucial for successful multi-agent interaction. Existing works on observer-aware planning use different assumptions and techniques to produce observer-aware behaviors. We argue that observer-aware planning, in its most general form, can be modeled as an Interactive POMDP (I-POMDP), which requires complex modeling and is hard to solve. Hence, we introduce a less complex framework for producing observer-aware behaviors called Observer-Aware MDP (OAMDP) and analyze its relationship to I-POMDP. We establish the complexity of OAMDPs and show that they can improve interpretability of agent behaviors in several scenarios. |
Sadegh Rabiee; Connor Basich; Kyle Hollins Wray; Shlomo Zilberstein; Joydeep Biswas Competence-Aware Path Planning via Introspective Perception Journal Article In: CoRR, vol. abs/2109.13974, 2021. @article{SZ:SZarXiv21c,
title = {Competence-Aware Path Planning via Introspective Perception},
author = {Sadegh Rabiee and Connor Basich and Kyle Hollins Wray and Shlomo Zilberstein and Joydeep Biswas},
url = {https://arxiv.org/abs/2109.13974},
year = {2021},
date = {2021-01-01},
journal = {CoRR},
volume = {abs/2109.13974},
abstract = {Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and to proactively take actions to avoid future failures. Existing approaches for competence-aware planning are either model-based, requiring explicit enumeration of known failure modes, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception, without requiring a-priori enumeration of failure modes or requiring location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Experiments in a simulation show that the proposed CPIP approach outperforms the frequentist baseline in multiple mobile robot tasks, and is further validated via real robot experiments in an environment with perceptually challenging obstacles and terrain.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and to proactively take actions to avoid future failures. Existing approaches for competence-aware planning are either model-based, requiring explicit enumeration of known failure modes, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception, without requiring a-priori enumeration of failure modes or requiring location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Experiments in a simulation show that the proposed CPIP approach outperforms the frequentist baseline in multiple mobile robot tasks, and is further validated via real robot experiments in an environment with perceptually challenging obstacles and terrain. |
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Multiple Objective Explanation and Control Interface Design Miscellaneous 2021, (US Patent App. 16/727,038). @misc{SZ:BWWZpatent21h,
title = {Multiple Objective Explanation and Control Interface Design},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US20210200208A1/en},
year = {2021},
date = {2021-01-01},
publisher = {Google Patents},
abstract = {A vehicle traversing a vehicle transportation network may use a scenario-specific operational control evaluation model instance. A multi-objective policy for the model is received, wherein the policy includes at least a first objective, a second objective, and a priority of the first objective relative to the second objective. A representation of the policy (e.g., the first objective, the second objective, and the priority) is generated using a user interface. Based on feedback to the user interface, a change to the multi-objective policy for the scenario-specific operational control evaluation model is received. The change is to the first objective, the second objective, the priority, or some combination thereof. Then, for determining a vehicle control action for traversing the vehicle transportation network, an updated multi-objective policy for the scenario-specific operational control evaluation model is generated to include the change to the policy.},
note = {US Patent App. 16/727,038},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
A vehicle traversing a vehicle transportation network may use a scenario-specific operational control evaluation model instance. A multi-objective policy for the model is received, wherein the policy includes at least a first objective, a second objective, and a priority of the first objective relative to the second objective. A representation of the policy (e.g., the first objective, the second objective, and the priority) is generated using a user interface. Based on feedback to the user interface, a change to the multi-objective policy for the scenario-specific operational control evaluation model is received. The change is to the first objective, the second objective, the priority, or some combination thereof. Then, for determining a vehicle control action for traversing the vehicle transportation network, an updated multi-objective policy for the scenario-specific operational control evaluation model is generated to include the change to the policy. |
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Shared Autonomous Vehicle Operational Management Miscellaneous 2021, (US Patent App. 16/955,531). @misc{SZ:WWZpatent21b,
title = {Shared Autonomous Vehicle Operational Management},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US20210078602A1/en},
year = {2021},
date = {2021-01-01},
publisher = {Google Patents},
abstract = {Traversing, by an autonomous vehicle, a vehicle transportation network, may include identifying a distinct vehicle operational scenario, wherein traversing the vehicle transportation network includes traversing a portion of the vehicle transportation network that includes the distinct vehicle operational scenario, communicating shared scenario-specific operational control management data associated with the distinct vehicle operational scenario with an external shared scenario-specific operational control management system, operating a scenario-specific operational control evaluation module instance including an instance of a scenario-specific operational control evaluation model of the distinct vehicle operational scenario, and wherein operating the scenario-specific operational control evaluation module instance includes identifying a policy for the scenario-specific operational control evaluation model, receiving a candidate vehicle control action from the policy for the scenario-specific operational control evaluation model, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action.},
note = {US Patent App. 16/955,531},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Traversing, by an autonomous vehicle, a vehicle transportation network, may include identifying a distinct vehicle operational scenario, wherein traversing the vehicle transportation network includes traversing a portion of the vehicle transportation network that includes the distinct vehicle operational scenario, communicating shared scenario-specific operational control management data associated with the distinct vehicle operational scenario with an external shared scenario-specific operational control management system, operating a scenario-specific operational control evaluation module instance including an instance of a scenario-specific operational control evaluation model of the distinct vehicle operational scenario, and wherein operating the scenario-specific operational control evaluation module instance includes identifying a policy for the scenario-specific operational control evaluation model, receiving a candidate vehicle control action from the policy for the scenario-specific operational control evaluation model, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action. |
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Centralized Shared Autonomous Vehicle Operational Management Miscellaneous 2021, (US Patent App. 16/955,531). @misc{SZ:WWZpatent21a,
title = {Centralized Shared Autonomous Vehicle Operational Management},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US20210009154A1/en},
year = {2021},
date = {2021-00-01},
publisher = {Google Patents},
abstract = {Centralized shared scenario-specific operational control management includes receiving, at a centralized shared scenario-specific operational control management device, shared scenario-specific operational control management input data, from an autonomous vehicle, validating the shared scenario-specific operational control management input data, identifying a current distinct vehicle operational scenario based on the shared scenario-specific operational control management input data, generating shared scenario-specific operational control management output data based on the current distinct vehicle operational scenario, and transmitting the shared scenario-specific operational control management output data.},
note = {US Patent App. 16/955,531},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Autonomous Vehicle Operation with Explicit Occlusion Reasoning Miscellaneous 2021, (US Patent App. 16/753,601). @misc{SZ:BWWZpatent21k,
title = {Autonomous Vehicle Operation with Explicit Occlusion Reasoning},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US20210261123A1/en},
year = {2021},
date = {2021-00-01},
publisher = {Google Patents},
abstract = {Autonomous vehicle operation with explicit occlusion reasoning may include traversing, by a vehicle, a vehicle transportation network. Traversing the vehicle transportation network can include receiving, from a sensor of the vehicle, sensor data for a portion of a vehicle operational environment, determining, using the sensor data, a visibility grid comprising coordinates forming an unobserved region within a defined distance from the vehicle, computing a probability of a presence of an external object within the unobserved region by comparing the visibility grid to a map (e.g., a high-definition map), and traversing a portion of the vehicle transportation network using the probability. An apparatus and a vehicle are also described.},
note = {US Patent App. 16/753,601},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Learning Safety and Human-Centered Constraints in Autonomous Vehicles Miscellaneous 2021, (US Patent App. 16/724,635). @misc{SZ:BWWZpatent21g,
title = {Learning Safety and Human-Centered Constraints in Autonomous Vehicles},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US20210132606A1/en},
year = {2021},
date = {2021-00-01},
publisher = {Google Patents},
abstract = {Traversing a vehicle transportation network includes operating a scenario-specific operational control evaluation module instance. The scenario-specific operational control evaluation module instance includes an instance of a scenario-specific operational control evaluation model of a distinct vehicle operational scenario. Operating the scenario-specific operational control evaluation module instance includes identifying a multi-objective policy for the scenario-specific operational control evaluation model. The multi-objective policy may include a relationship between at least two objectives. Traversing the vehicle transportation network includes receiving a candidate vehicle control action associated with each of the at least two objectives. Traversing the vehicle transportation network includes selecting a vehicle control action based on a buffer value. Traversing the vehicle transportation network includes performing the selected vehicle control action, determining a preference indicator for each objective, and updating the multi-objective policy.},
note = {US Patent App. 16/724,635},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Objective-Based Reasoning in Autonomous Vehicle Decision-Making Miscellaneous 2021, (US Patent App. 16/695,613). @misc{SZ:BWWZpatent21e,
title = {Objective-Based Reasoning in Autonomous Vehicle Decision-Making},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US20210157314A1/en},
year = {2021},
date = {2021-00-01},
publisher = {Google Patents},
abstract = {Traversing a vehicle transportation network includes operating a scenario-specific operational control evaluation module instance. The scenario-specific operational control evaluation module instance includes an instance of a scenario-specific operational control evaluation model of a distinct vehicle operational scenario. Operating the scenario-specific operational control evaluation module instance includes identifying a multi-objective policy for the scenario-specific operational control evaluation model. The multi-objective policy may include a relationship between at least two objectives. Traversing the vehicle transportation network includes receiving a candidate vehicle control action associated with each of the at least two objectives. Traversing the vehicle transportation network includes selecting a vehicle control action based on a buffer value. Traversing the vehicle transportation network includes traversing a portion of the vehicle transportation network in accordance with the selected vehicle control action.},
note = {US Patent App. 16/695,613},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2020
|
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein; Liam Pedersen Autonomous Vehicle Operational Management Control Miscellaneous 2020, (US Patent 10,654,476). @misc{SZ:WWZCpatent20e,
title = {Autonomous Vehicle Operational Management Control},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein and Liam Pedersen},
url = {https://patents.google.com/patent/US10654476B2/en},
year = {2020},
date = {2020-05-01},
publisher = {Google Patents},
abstract = {Autonomous vehicle operational management may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include receiving, from a sensor of the autonomous vehicle, sensor information corresponding to an external object within a defined distance of the autonomous vehicle, identifying a distinct vehicle operational scenario in response to receiving the sensor information, instantiating a scenario-specific operational control evaluation module instance, wherein the scenario-specific operational control evaluation module instance is an instance of a scenario-specific operational control evaluation module modeling the distinct vehicle operational scenario, receiving a candidate vehicle control action from the scenario-specific operational control evaluation module instance, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action.},
note = {US Patent 10,654,476},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein; Liam Pedersen Autonomous Vehicle Operational Management Including Operating A Partially Observable Markov Decision Process Model Instance Miscellaneous 2020, (US Patent App. 16/473,148). @misc{SZ:WWZPpatent20b,
title = {Autonomous Vehicle Operational Management Including Operating A Partially Observable Markov Decision Process Model Instance},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein and Liam Pedersen},
url = {https://patents.google.com/patent/US20200097003A1/en},
year = {2020},
date = {2020-03-26},
publisher = {Google Patents},
abstract = {Autonomous vehicle operational management may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include operating a scenario-specific operational control evaluation module instance, wherein the scenario-specific operational control evaluation module instance is an instance of a scenario-specific operational control evaluation module, wherein the scenario-specific operational control evaluation module implements a partially observable Markov decision process. Traversing the vehicle transportation network may include receiving a candidate vehicle control action from the scenario-specific operational control evaluation module instance, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action.},
note = {US Patent App. 16/473,148},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
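The POMDP-based entries above rely on maintaining a belief over scenario states. As an illustration only (the function and variable names here are assumptions, not the patent's implementation), the standard discrete belief update is b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s):

```python
# Discrete POMDP belief update (illustrative sketch, not the patented method):
# b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).

def belief_update(b, a, o, T, O, states):
    """b: dict state -> prob; T[(s, a)]: dict next_state -> prob;
    O[(s2, a)]: dict observation -> prob."""
    new_b = {}
    for s2 in states:
        # Predict step: probability mass flowing into s2 under action a.
        pred = sum(T[(s, a)].get(s2, 0.0) * b[s] for s in states)
        # Correct step: weight by likelihood of the observation.
        new_b[s2] = O[(s2, a)].get(o, 0.0) * pred
    norm = sum(new_b.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / norm for s, p in new_b.items()}

# Toy two-state scenario monitor: is the lane ahead clear or blocked?
states = ["clear", "blocked"]
T = {(s, "observe"): {s: 1.0} for s in states}          # state is static
O = {("clear", "observe"): {"see_clear": 0.8, "see_blocked": 0.2},
     ("blocked", "observe"): {"see_clear": 0.3, "see_blocked": 0.7}}
b = {"clear": 0.5, "blocked": 0.5}
b = belief_update(b, "observe", "see_clear", T, O, states)
# → b["clear"] ≈ 0.727 (i.e., 0.4 / 0.55)
```

The normalization by `norm` is what makes the result a proper probability distribution after conditioning on the observation.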
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein; Liam Pedersen Autonomous Vehicle Operational Management Blocking Monitoring Miscellaneous 2020, (US Patent App. 16/473,037). @misc{SZ:WWZPpatent20c,
title = {Autonomous Vehicle Operational Management Blocking Monitoring},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein and Liam Pedersen},
url = {https://patents.google.com/patent/US20200098269A1/en},
year = {2020},
date = {2020-03-26},
publisher = {Google Patents},
abstract = {Autonomous vehicle operational management including blocking monitoring may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include operating a blocking monitor instance, which may include identifying operational environment information including information corresponding to a first external object within a defined distance of the autonomous vehicle, determining a first area of the vehicle transportation network based on a current geospatial location of the autonomous vehicle in the vehicle transportation network and an identified route for the autonomous vehicle, and determining a probability of availability for the first area based on the operational environment information. Traversing the vehicle transportation network may include traversing a portion of the vehicle transportation network based on the probability of availability.},
note = {US Patent App. 16/473,037},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein; Melissa Cefkin Orientation-Adjust Actions for Autonomous Vehicle Operational Management Miscellaneous 2020, (US Patent App. 16/023,710). @misc{SZ:WWZCpatent20a,
title = {Orientation-Adjust Actions for Autonomous Vehicle Operational Management},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein and Melissa Cefkin},
url = {http://www.freepatentsonline.com/y2020/0005645.html},
year = {2020},
date = {2020-01-02},
publisher = {Google Patents},
abstract = {Traversing, by an autonomous vehicle, a vehicle transportation network, may include identifying a policy for a scenario-specific operational control evaluation model of a distinct vehicle operational scenario, receiving a candidate vehicle control action from the policy, wherein, in response to a determination that an uncertainty value for the distinct vehicle operational scenario exceeds a defined uncertainty threshold, the candidate vehicle control action is an orientation-adjust vehicle control action, and traversing a portion of the vehicle transportation network in accordance with the candidate vehicle control action, wherein the portion of the vehicle transportation network includes the distinct vehicle operational scenario.},
note = {US Patent App. 16/023,710},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Sandhya Saisubramanian; Ece Kamar; Shlomo Zilberstein A Multi-Objective Approach to Mitigate Negative Side Effects Conference Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020, (Distinguished Paper Award). @conference{SZ:SKZijcai20,
title = {A Multi-Objective Approach to Mitigate Negative Side Effects},
author = {Sandhya Saisubramanian and Ece Kamar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SKZijcai20.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI)},
abstract = {Agents operating in unstructured environments often create negative side effects (NSE) that may not be easy to identify at design time. We examine how various forms of human feedback or autonomous exploration can be used to learn a penalty function associated with NSE during system deployment. We formulate the problem of mitigating the impact of NSE as a multi-objective Markov decision process with lexicographic reward preferences and slack. The slack denotes the maximum deviation from an optimal policy with respect to the agent’s primary objective allowed in order to mitigate NSE as a secondary objective. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE and that different feedback mechanisms introduce different biases, which influence the identification of NSE.},
note = {Distinguished Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
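The lexicographic-with-slack formulation in the abstract above can be sketched in a few lines. This is a hypothetical illustration of the selection rule only (the names `q_primary`, `nse_penalty`, and `slack` are mine, not the paper's): among actions whose primary-objective value is within the slack of the best, pick the one with the lowest NSE penalty.

```python
# Lexicographic action selection with slack (illustrative sketch):
# the slack bounds how far the chosen action may fall below the optimal
# primary value in exchange for mitigating NSE as a secondary objective.

def select_action(q_primary, nse_penalty, slack):
    """q_primary: dict action -> value (higher is better);
    nse_penalty: dict action -> NSE penalty (lower is better)."""
    best = max(q_primary.values())
    # Actions admissible under the slack on the primary objective.
    admissible = [a for a, q in q_primary.items() if q >= best - slack]
    # Break ties on the secondary objective: minimize NSE penalty.
    return min(admissible, key=lambda a: nse_penalty[a])

q = {"fast": 10.0, "careful": 9.5, "stop": 2.0}
pen = {"fast": 5.0, "careful": 0.5, "stop": 0.0}
# With slack 1.0, "careful" becomes admissible and beats "fast" on NSE.
```

With zero slack this degenerates to pure primary-objective optimization; larger slack trades primary value for NSE mitigation, which is the tension the paper's empirical evaluation explores.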
Sandhya Saisubramanian; Sainyam Galhotra; Shlomo Zilberstein Balancing the Tradeoff Between Clustering Value and Interpretability Conference Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), New York, NY, 2020. @conference{SZ:SGZaies20,
title = {Balancing the Tradeoff Between Clustering Value and Interpretability},
author = {Sandhya Saisubramanian and Sainyam Galhotra and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SGZaies20.pdf},
doi = {10.1145/3375627.3375843},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES)},
pages = {351--357},
address = {New York, NY},
abstract = {Graph clustering groups entities -- the vertices of a graph -- based on their similarity, typically using a complex distance function over a large number of features. Successful integration of clustering approaches in automated decision-support systems hinges on the interpretability of the resulting clusters. This paper addresses the problem of generating interpretable clusters, given features of interest that signify interpretability to an end-user, by optimizing interpretability in addition to common clustering objectives. We propose a β-interpretable clustering algorithm that ensures that at least β fraction of nodes in each cluster share the same feature value. The tunable parameter β is user-specified. We also present a more efficient algorithm for scenarios with β = 1 and analyze the theoretical guarantees of the two algorithms. Finally, we empirically demonstrate the benefits of our approaches in generating interpretable clusters using four real-world datasets. The interpretability of the clusters is complemented by generating simple explanations denoting the feature values of the nodes in the clusters, using frequent pattern mining.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
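The β-interpretability constraint described in the abstract above is easy to state in code. The checker below is a hypothetical verification helper, not the paper's clustering algorithm: it tests whether, in every cluster, at least a β fraction of nodes share the same value of the feature of interest.

```python
from collections import Counter

# Verify the β-interpretability constraint: in each cluster, the most
# common value of the feature of interest must cover at least a beta
# fraction of the cluster's nodes. (Checker only; the paper's algorithms
# construct such clusterings rather than merely test them.)

def is_beta_interpretable(clusters, feature, beta):
    """clusters: list of lists of nodes; feature: dict node -> value."""
    for cluster in clusters:
        if not cluster:
            continue
        counts = Counter(feature[node] for node in cluster)
        majority = counts.most_common(1)[0][1]
        if majority / len(cluster) < beta:
            return False
    return True

feature = {1: "a", 2: "a", 3: "b", 4: "b"}
is_beta_interpretable([[1, 2], [3, 4]], feature, 1.0)   # → True
is_beta_interpretable([[1, 3], [2, 4]], feature, 1.0)   # → False
```

Setting β = 1 recovers the special case the paper treats with a more efficient dedicated algorithm: every cluster is homogeneous in the feature of interest.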
Beverly Woolf; Aritra Ghosh; Andrew Lan; Shlomo Zilberstein; Tom Juravich; Andrew Cohen; Olivia Geho AI-Enabled Training in Manufacturing Workforce Development Conference AAAI Spring Symposium on Artificial Intelligence in Manufacturing, 2020. @conference{SZ:WGLZJCGspring20,
title = {AI-Enabled Training in Manufacturing Workforce Development},
author = {Beverly Woolf and Aritra Ghosh and Andrew Lan and Shlomo Zilberstein and Tom Juravich and Andrew Cohen and Olivia Geho},
year = {2020},
date = {2020-01-01},
booktitle = {AAAI Spring Symposium on Artificial Intelligence in Manufacturing},
abstract = {A highly productive workforce can evolve with the integration of digital devices, such as computer interfaces to operating machines, interconnected smart devices, and robots, in the workplace. However, this potential cannot be realized with the current state-of-the-art systems used to train workers. This problem is acute in manufacturing, where huge skills gaps are evident; most workers lack the necessary skills to operate or collaborate with autonomous systems. We propose to address this problem by using intelligent tutoring systems and worker data analysis. The worker data includes: i) fine-grained on-job performance data, ii) career path data containing the entire career paths of workers, and iii) job posting data over a long period of time indicating the required skills for each job. We will collect and analyze worker data and use it to drive new methods for training and reskilling workers. We detail ideas and tools to be developed by research in intelligent tutoring systems, data science, manufacturing, sociology, labor analysis, education, psychology, and economics. We also describe a convergent approach to developing effective, fair, and scalable software solutions and dynamic intelligent training.},
address = {Stanford, California},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Connor Basich; Justin Svegliato; Kyle Hollins Wray; Stefan J Witwicki; Joydeep Biswas; Shlomo Zilberstein Learning to Optimize Autonomy in Competence-Aware Systems Conference Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), Auckland, New Zealand, 2020. @conference{SZ:BSWWBZaamas20,
title = {Learning to Optimize Autonomy in Competence-Aware Systems},
author = {Connor Basich and Justin Svegliato and Kyle Hollins Wray and Stefan J Witwicki and Joydeep Biswas and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BSWWBZaamas20.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
address = {Auckland, New Zealand},
abstract = {Interest in semi-autonomous systems (SAS) is growing rapidly as a paradigm to deploy autonomous systems in domains that require occasional reliance on humans. This paradigm allows service robots or autonomous vehicles to operate at varying levels of autonomy and offer safety in situations that require human judgment. We propose an introspective model of autonomy that is learned and updated online through experience and dictates the extent to which the agent can act autonomously in any given situation. We define a competence-aware system (CAS) that explicitly models its own proficiency at different levels of autonomy and the available human feedback. A CAS learns to adjust its level of autonomy based on experience to maximize overall efficiency, factoring in the cost of human assistance. We analyze the convergence properties of CAS and provide experimental results for robot delivery and autonomous driving domains that demonstrate the benefits of the approach.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Shuwa Miura; Shlomo Zilberstein Maximizing Plan Legibility in Stochastic Environments (Extended Abstract) Conference Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), Auckland, New Zealand, 2020. @conference{SZ:MZaamas20,
title = {Maximizing Plan Legibility in Stochastic Environments (Extended Abstract)},
author = {Shuwa Miura and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MZaamas20.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
address = {Auckland, New Zealand},
abstract = {Legible behavior allows an observing agent to infer the intention of an observed agent. Producing legible behavior is crucial for successful multi-agent interaction in many domains. We introduce techniques for legible planning in stochastic environments. Maximizing legibility, however, presents a complex trade-off with maximizing the underlying rewards. Hence, we propose a method to balance the trade-off. In our experiments, we demonstrate that maximizing legibility results in unambiguous behaviors.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Sandhya Saisubramanian; Ece Kamar; Shlomo Zilberstein Mitigating the Negative Side Effects of Reasoning with Imperfect Models: A Multi-Objective Approach (Extended Abstract) Conference Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), Auckland, New Zealand, 2020. @conference{SZ:SKZaamas20,
title = {Mitigating the Negative Side Effects of Reasoning with Imperfect Models: A Multi-Objective Approach (Extended Abstract)},
author = {Sandhya Saisubramanian and Ece Kamar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SKZaamas20.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
address = {Auckland, New Zealand},
abstract = {Agents often operate using imperfect models of the environment that ignore certain aspects of the real world. Reasoning with such models may lead to negative side effects (NSE) when satisfying the primary objective of the available model, which are inherently difficult to identify at design time. We examine how various forms of feedback can be used to learn a penalty function associated with NSE during execution. We formulate the problem of mitigating the impact of NSE as a multi-objective Markov decision process with lexicographic reward preferences and slack. Empirical evaluation of our approach on three domains shows that the proposed framework can successfully mitigate NSE.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Justin Svegliato; Prakhar Sharma; Shlomo Zilberstein A Model-Free Approach to Meta-Level Control of Anytime Algorithms Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 2020. @conference{SZ:SSZicra20,
title = {A Model-Free Approach to Meta-Level Control of Anytime Algorithms},
author = {Justin Svegliato and Prakhar Sharma and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SSZicra20.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
address = {Paris, France},
abstract = {Anytime algorithms offer a trade-off between solution quality and computation time that has proven to be useful in autonomous systems for a wide range of real-time planning problems. In order to optimize this trade-off, an autonomous system has to solve a challenging meta-level control problem: it must decide when to interrupt the anytime algorithm and act on the current solution. Prevailing meta-level control techniques, however, make a number of unrealistic assumptions that reduce their effectiveness and usefulness in the real world. Eliminating these assumptions, we first introduce a model-free approach to meta-level control based on reinforcement learning and prove its optimality. We then offer a general meta-level control technique that can use different reinforcement learning methods. Finally, we show that our approach is effective across several common benchmark domains and a mobile robot domain.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Christabel Wayllace; Sarah Keren; Avigdor Gal; Erez Karpas; William Yeoh; Shlomo Zilberstein Accounting for Observer's Partial Observability in Stochastic Goal Recognition Design Conference Proceedings of the 24th European Conference on Artificial Intelligence (ECAI), 2020. @conference{SZ:WKGKYZecai20,
title = {Accounting for Observer's Partial Observability in Stochastic Goal Recognition Design},
author = {Christabel Wayllace and Sarah Keren and Avigdor Gal and Erez Karpas and William Yeoh and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WKGKYZecai20.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the 24th European Conference on Artificial Intelligence (ECAI)},
abstract = {Motivated by security applications, where agent intentions are unknown, actions may have stochastic outcomes, and an observer may have an obfuscated view due to low sensor resolution, we introduce partially-observable states and unobservable actions into a stochastic goal recognition design framework. The proposed model is accompanied by a method for calculating the expected maximal number of steps before the goal of an agent is revealed and a new sensor refinement modification that can be applied to enhance goal recognition. A preliminary empirical evaluation on a range of benchmark applications shows the effectiveness of our approach.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Henry Renski; Laurel Smith-Doerr; Tiamba Wilkerson; Shannon C Roberts; Shlomo Zilberstein; Enobong H Branch Racial Equity and the Future of Work Journal Article In: Technology | Architecture + Design, vol. 4, no. 1, pp. 17–22, 2020. @article{SZ:RSWRZBtad20,
title = {Racial Equity and the Future of Work},
author = {Henry Renski and Laurel Smith-Doerr and Tiamba Wilkerson and Shannon C Roberts and Shlomo Zilberstein and Enobong H Branch},
doi = {10.1080/24751448.2020.1705711},
year = {2020},
date = {2020-01-01},
journal = {Technology | Architecture + Design},
volume = {4},
number = {1},
pages = {17--22},
publisher = {Taylor & Francis},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Connor Basich; Justin Svegliato; Kyle Hollins Wray; Stefan J Witwicki; Joydeep Biswas; Shlomo Zilberstein Learning to Optimize Autonomy in Competence-Aware Systems Journal Article In: CoRR, vol. abs/2003.07745, 2020. @article{SZ:BSWWBZarXiv20a,
title = {Learning to Optimize Autonomy in Competence-Aware Systems},
author = {Connor Basich and Justin Svegliato and Kyle Hollins Wray and Stefan J Witwicki and Joydeep Biswas and Shlomo Zilberstein},
url = {https://arxiv.org/abs/2003.07745},
year = {2020},
date = {2020-01-01},
journal = {CoRR},
volume = {abs/2003.07745},
abstract = {Interest in semi-autonomous systems (SAS) is growing rapidly as a paradigm to deploy autonomous systems in domains that require occasional reliance on humans. This paradigm allows service robots or autonomous vehicles to operate at varying levels of autonomy and offer safety in situations that require human judgment. We propose an introspective model of autonomy that is learned and updated online through experience and dictates the extent to which the agent can act autonomously in any given situation. We define a competence-aware system (CAS) that explicitly models its own proficiency at different levels of autonomy and the available human feedback. A CAS learns to adjust its level of autonomy based on experience to maximize overall efficiency, factoring in the cost of human assistance. We analyze the convergence properties of CAS and provide experimental results for robot delivery and autonomous driving domains that demonstrate the benefits of the approach.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Shane Parr; Ishan Khatri; Justin Svegliato; Shlomo Zilberstein Agent-Aware State Estimation: Effective Traffic Light Classification for Autonomous Vehicles Conference ICRA 2020 Workshop on Sensing, Estimating and Understanding the Dynamic World, 2020. @conference{SZ:PKSZicra20ws,
title = {Agent-Aware State Estimation: Effective Traffic Light Classification for Autonomous Vehicles},
author = {Shane Parr and Ishan Khatri and Justin Svegliato and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PKSZicra20ws.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {ICRA 2020 Workshop on Sensing, Estimating and Understanding the Dynamic World},
abstract = {Autonomous systems often operate in environments where the behavior of all agents is mostly governed by the perception of a specific feature of the environment. When an autonomous system cannot recover this feature, there can be disastrous consequences. We introduce a novel framework for agent-aware state estimation that exploits the dependency of all agents' behavior on a feature to better indirectly observe the feature. To allow for fast and accurate inference, we provide a mapping of our framework to a dynamic Bayesian network and show that speed of inference scales favorably with the number of agents in the environment. We then apply our approach to traffic light classification, focusing on instances where direct vision of the light may be obstructed by glare, heavy rain, vehicles, or other environmental factors. Finally, we show that agent-aware state estimation outperforms prevailing methods that only use direct image data of the traffic light on a real-world autonomous vehicle data set of challenging scenarios.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Richard G Freedman; Steven J Levine; Brian C Williams; Shlomo Zilberstein Helpfulness as a Key Metric of Human-Robot Collaboration Conference AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction (AI-HRI), 2020. @conference{FLWZfall20,
title = {Helpfulness as a Key Metric of Human-Robot Collaboration},
author = {Richard G Freedman and Steven J Levine and Brian C Williams and Shlomo Zilberstein},
url = {https://arxiv.org/pdf/2010.04914.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction (AI-HRI)},
abstract = {As robotic teammates become more common in society, people will assess the robots' roles in their interactions along many dimensions. One such dimension is effectiveness: people will ask whether their robotic partners are trustworthy and effective collaborators. This begs a crucial question: how can we quantitatively measure the helpfulness of a robotic partner for a given task at hand? This paper seeks to answer this question with regards to the interactive robot's decision making. We describe a clear, concise, and task-oriented metric applicable to many different planning and execution paradigms. The proposed helpfulness metric is fundamental to assessing the benefit that a partner has on a team for a given task. In this paper, we define helpfulness, illustrate it on concrete examples from a variety of domains, discuss its properties and ramifications for planning interactions with humans, and present preliminary results.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Feng Wu; Shlomo Zilberstein; Nicholas R Jennings Multi-Agent Planning with High-Level Human Guidance Conference Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA), 2020. @conference{SZ:WZJprima20,
title = {Multi-Agent Planning with High-Level Human Guidance},
author = {Feng Wu and Shlomo Zilberstein and Nicholas R Jennings},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZJprima20.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of Principles and Practice of Multi-Agent Systems (PRIMA)},
abstract = {Planning and coordination of multiple agents in the presence of uncertainty and noisy sensors is extremely hard. A human operator who observes a multi-agent team can provide valuable guidance to the team based on her superior ability to interpret observations and assess the overall situation. We propose an extension of decentralized POMDPs that allows such human guidance to be factored into the planning and execution processes. Human guidance in our framework consists of intuitive high-level commands that the agents must translate into a suitable joint plan that is sensitive to what they know from local observations. The result is a framework that allows multi-agent systems to benefit from the complex strategic thinking of a human supervising them. We evaluate this approach on several common benchmark problems and show that it can lead to dramatic improvement in performance.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Justin Svegliato; Stefan J Witwicki; Kyle Hollins Wray; Shlomo Zilberstein Introspective Autonomous Vehicle Operational Management Miscellaneous 2020, (US Patent 10,649,453). @misc{SZ:SWWZpatent20d,
title = {Introspective Autonomous Vehicle Operational Management},
author = {Justin Svegliato and Stefan J Witwicki and Kyle Hollins Wray and Shlomo Zilberstein},
url = {https://patents.google.com/patent/US10649453B1/en},
year = {2020},
date = {2020-01-01},
publisher = {Google Patents},
abstract = {Introspective autonomous vehicle operational management includes operating an introspective autonomous vehicle operational management controller including a policy for a model of an introspective autonomous vehicle operational management domain. Operating the controller includes, in response to a determination that a current belief state of the policy indicates an exceptional condition, identifying an exception handler for controlling the autonomous vehicle. Operating the controller includes, in response to a determination that the current belief state indicates an unexceptional condition, identifying a primary handler as the active handler. Operating the controller includes controlling the autonomous vehicle to traverse a current portion of the vehicle transportation network in accordance with the active handler, receiving an indicator output by the active handler, generating an updated belief state based on the indicator, and controlling the autonomous vehicle to traverse a subsequent portion of the vehicle transportation network based on the updated belief state.},
note = {US Patent 10,649,453},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2019
Sandhya Saisubramanian; Shlomo Zilberstein Minimizing the Negative Side Effects of Planning with Reduced Models Conference AAAI Workshop on Artificial Intelligence Safety, Honolulu, Hawaii, 2019. @conference{SZ:SZaaai19ws,
title = {Minimizing the Negative Side Effects of Planning with Reduced Models},
author = {Sandhya Saisubramanian and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZaaai19ws.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {AAAI Workshop on Artificial Intelligence Safety},
address = {Honolulu, Hawaii},
abstract = {Reduced models of large Markov decision processes accelerate planning by considering a subset of outcomes for each state-action pair. This reduction in reachable states leads to replanning when the agent encounters states without a pre-computed action during plan execution. However, not all states are suitable for replanning. In the worst case, the agent may not be able to reach the goal from the newly encountered state. Agents should be better prepared to handle such risky situations and avoid replanning in risky states. Hence, we consider replanning in states that are unsafe for deliberation as a negative side effect of planning with reduced models. While the negative side effects can be minimized by always using the full model, this defeats the purpose of using reduced models. The challenge is to plan with reduced models, but somehow account for the possibility of encountering risky situations. An agent should thus only replan in states that the user has approved as safe for replanning. To that end, we propose planning using a portfolio of reduced models, a planning paradigm that minimizes the negative side effects of planning using reduced models by alternating between different outcome selection approaches. We empirically demonstrate the effectiveness of our approach on three domains: an electric vehicle charging domain using real-world data from a university campus and two benchmark planning problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Abhishek Dwaraki; Richard G Freedman; Shlomo Zilberstein; Tilman Wolf Using Natural Language Constructs and Concepts to Aid Network Management Conference Proceedings of the International Conference on Computing, Networking and Communications, Honolulu, Hawaii, 2019. @conference{SZ:DFZWiccnc19,
title = {Using Natural Language Constructs and Concepts to Aid Network Management},
author = {Abhishek Dwaraki and Richard G Freedman and Shlomo Zilberstein and Tilman Wolf},
url = {https://doi.org/10.1109/ICCNC.2019.8685639},
doi = {10.1109/ICCNC.2019.8685639},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Conference on Computing, Networking and Communications},
pages = {802--808},
address = {Honolulu, Hawaii},
abstract = {The increasing complexity of networks together with technological trends that allow for fine-grained control and programmability have made network management a pressing challenge. In this work, we propose to harness the vast amounts of network management data that are available from different sources in an automated system that can infer context and semantics. We present an argument for a Network Processing Language that is based on the ideas of natural language processing. Our approach shows how concepts, such as collocations, can be applied to network management data. We demonstrate the effectiveness of our approach to detect route prefix and sub-prefix hijacks. This work presents one step toward effectively using automated tools for network management in complex, programmable networks.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Luis Pineda; Shlomo Zilberstein Soft Labeling in Stochastic Shortest Path Problems Conference Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), Montreal, Quebec, CA, 2019. @conference{SZ:PZaamas19,
title = {Soft Labeling in Stochastic Shortest Path Problems},
author = {Luis Pineda and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZaamas19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
pages = {467--475},
address = {Montreal, Quebec, CA},
abstract = {The Stochastic Shortest Path (SSP) is an established model for goal-directed probabilistic planning. Despite its broad applicability, wide adoption of the model has been impaired by its high computational complexity. Efforts to address this challenge have produced promising algorithms that leverage two popular mechanisms: labeling and short-sightedness. The resulting algorithms can generate near-optimal solutions much faster than optimal solvers, albeit at the cost of poor theoretical guarantees. In this work, we introduce a generalization of labeling, called soft labeling, which results in a framework that encompasses a wide spectrum of efficient labeling algorithms, and offers better theoretical guarantees than existing short-sighted labeling approaches. We also propose a novel instantiation of this framework, the SOFT-FLARES algorithm, which achieves state-of-the-art performance on a diverse set of benchmarks.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Kyle Hollins Wray; Shlomo Zilberstein Policy Networks: A Framework for Scalable Integration of Multiple Decision-Making Models (Extended Abstract) Conference Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), Montreal, Quebec, CA, 2019. @conference{SZ:WZaamas19,
title = {Policy Networks: A Framework for Scalable Integration of Multiple Decision-Making Models (Extended Abstract)},
author = {Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZaamas19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
pages = {2270--2272},
address = {Montreal, Quebec, CA},
abstract = {Policy networks are graphical models that integrate decision-making models. They allow for multiple Markov decision processes (MDPs) that describe distinct focused aspects of a domain to work in harmony to solve a large-scale problem. This paper defines policy networks and shows how they are able to naturally generalize many previous models, such as options and constrained MDPs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Kyle Hollins Wray; Shlomo Zilberstein Generalized Controllers in POMDP Decision-Making Conference Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, Quebec, CA, 2019. @conference{SZ:WZicra19,
title = {Generalized Controllers in POMDP Decision-Making},
author = {Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZicra19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
pages = {7166--7172},
address = {Montreal, Quebec, CA},
abstract = {We present a general policy formulation for partially observable Markov decision processes (POMDPs) called controller family policies that may be used as a framework to facilitate the design of new policy forms. We prove how modern approximate policy forms: point-based, finite state controller (FSC), and belief compression, are instances of this family of generalized controller policies. Our analysis provides a deeper understanding of the POMDP model and suggests novel ways to design POMDP solutions that can combine the benefits of different state-of-the-art methods. We illustrate this capability by creating a new customized POMDP policy form called the belief-integrated FSC (BI-FSC) tailored to overcome the shortcomings of a state-of-the-art algorithm that uses non-linear programming (NLP). Specifically, experiments show that for NLP the BI-FSC offers improved performance over a vanilla FSC-based policy form on benchmark domains. Furthermore, we demonstrate the BI-FSC's execution on a real robot navigating in a maze environment. Results confirm the value of using the controller family policy as a framework to design customized policies in POMDP robotic solutions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present a general policy formulation for partially observable Markov decision processes (POMDPs) called controller family policies that may be used as a framework to facilitate the design of new policy forms. We prove how modern approximate policy forms: point-based, finite state controller (FSC), and belief compression, are instances of this family of generalized controller policies. Our analysis provides a deeper understanding of the POMDP model and suggests novel ways to design POMDP solutions that can combine the benefits of different state-of-the-art methods. We illustrate this capability by creating a new customized POMDP policy form called the belief-integrated FSC (BI-FSC) tailored to overcome the shortcomings of a state-of-the-art algorithm that uses non-linear programming (NLP). Specifically, experiments show that for NLP the BI-FSC offers improved performance over a vanilla FSC-based policy form on benchmark domains. Furthermore, we demonstrate the BI-FSC's execution on a real robot navigating in a maze environment. Results confirm the value of using the controller family policy as a framework to design customized policies in POMDP robotic solutions. |
Sarah Keren; Luis Enrique Pineda; Avigdor Gal; Erez Karpas; Shlomo Zilberstein Responsive Planning and Recognition for Closed-Loop Interaction Conference Proceedings of the 29th International Conference on Automated Planning and Scheduling (ICAPS), Berkeley, CA, 2019. @conference{SZ:KPGKZicaps19,
title = {Responsive Planning and Recognition for Closed-Loop Interaction},
author = {Sarah Keren and Luis Enrique Pineda and Avigdor Gal and Erez Karpas and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KPGKZicaps19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 29th International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {246--254},
address = {Berkeley, CA},
abstract = {Given an environment, the utility measure of the agents acting within it, a set of possible environment modifications, and a description of design constraints, the objective of equi-reward utility maximizing design (ER-UMD) is to find a valid sequence of modifications to apply to the environment in order to maximize agent utility. To efficiently traverse the typically large space of possible design options, we use heuristic search and propose new heuristics, which relax the design process; instead of computing the value achieved by a single modification, we use a dominating modification guaranteed to be at least as beneficial. The proposed technique enables heuristic caching for similar nodes thereby saving computational overhead. We specify sufficient conditions under which our approach is guaranteed to produce admissible estimates, and describe a range of models that comply with these requirements. Also, for models with lifted representations of environment modifications, we provide simple methods to automatically generate dominating modifications. We evaluate our approach on a range of stochastic settings for which our heuristic is admissible. We demonstrate its efficiency by comparing it to a previously suggested heuristic that employs a relaxation of the environment, and to a compilation from ER-UMD to planning.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Given an environment, the utility measure of the agents acting within it, a set of possible environment modifications, and a description of design constraints, the objective of equi-reward utility maximizing design (ER-UMD) is to find a valid sequence of modifications to apply to the environment in order to maximize agent utility. To efficiently traverse the typically large space of possible design options, we use heuristic search and propose new heuristics, which relax the design process; instead of computing the value achieved by a single modification, we use a dominating modification guaranteed to be at least as beneficial. The proposed technique enables heuristic caching for similar nodes thereby saving computational overhead. We specify sufficient conditions under which our approach is guaranteed to produce admissible estimates, and describe a range of models that comply with these requirements. Also, for models with lifted representations of environment modifications, we provide simple methods to automatically generate dominating modifications. We evaluate our approach on a range of stochastic settings for which our heuristic is admissible. We demonstrate its efficiency by comparing it to a previously suggested heuristic that employs a relaxation of the environment, and to a compilation from ER-UMD to planning. |
Sandhya Saisubramanian; Kyle Hollins Wray; Luis Enrique Pineda; Shlomo Zilberstein Planning in Stochastic Environments with Goal Uncertainty Conference ICAPS Workshop on Planning and Robotics (PlanRob), Berkeley, CA, 2019. @conference{SZ:SWPZicaps19ws1,
title = {Planning in Stochastic Environments with Goal Uncertainty},
author = {Sandhya Saisubramanian and Kyle Hollins Wray and Luis Enrique Pineda and Shlomo Zilberstein},
year = {2019},
date = {2019-01-01},
booktitle = {ICAPS Workshop on Planning and Robotics (PlanRob)},
address = {Berkeley, CA},
abstract = {TBD.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Sandhya Saisubramanian; Connor Basich; Shlomo Zilberstein; Claudia V Goldman The Value of Incorporating Social Preferences in Dynamic Ridesharing Conference ICAPS Workshop on Scheduling and Planning Applications (SPARK), Berkeley, CA, 2019. @conference{SZ:SBZGicaps19ws2,
title = {The Value of Incorporating Social Preferences in Dynamic Ridesharing},
author = {Sandhya Saisubramanian and Connor Basich and Shlomo Zilberstein and Claudia V Goldman},
year = {2019},
date = {2019-01-01},
booktitle = {ICAPS Workshop on Scheduling and Planning Applications (SPARK)},
address = {Berkeley, CA},
abstract = {TBD.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Luis Enrique Pineda; Shlomo Zilberstein Probabilistic Planning with Reduced Models Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 65, pp. 271–306, 2019. @article{SZ:PZjair19,
title = {Probabilistic Planning with Reduced Models},
author = {Luis Enrique Pineda and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZjair19.pdf},
doi = {10.1613/jair.1.11569},
year = {2019},
date = {2019-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {65},
pages = {271--306},
abstract = {Reduced models are simplified versions of a given domain, designed to accelerate the planning process. Interest in reduced models has grown since the surprising success of determinization in the first international probabilistic planning competition, leading to the development of several enhanced determinization techniques. To address the drawbacks of previous determinization methods, we introduce a family of reduced models in which probabilistic outcomes are classified as one of two types: primary and exceptional. In each model that belongs to this family of reductions, primary outcomes can occur an unbounded number of times per trajectory, while exceptions can occur at most a finite number of times, specified by a parameter. Distinct reduced models are characterized by two parameters: the maximum number of primary outcomes per action, and the maximum number of occurrences of exceptions per trajectory. This family of reductions generalizes the well-known most-likely-outcome determinization approach, which includes one primary outcome per action and zero exceptional outcomes per plan. We present a framework to determine the benefits of planning with reduced models, and develop a continual planning approach that handles situations where the number of exceptions exceeds the specified bound during plan execution. Using this framework, we compare the performance of various reduced models and consider the challenge of generating good ones automatically. We show that each one of the dimensions--allowing more than one primary outcome or planning for some limited number of exceptions--could improve performance relative to standard determinization. The results place previous work on determinization in a broader context and lay the foundation for a systematic exploration of the space of model reductions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Reduced models are simplified versions of a given domain, designed to accelerate the planning process. Interest in reduced models has grown since the surprising success of determinization in the first international probabilistic planning competition, leading to the development of several enhanced determinization techniques. To address the drawbacks of previous determinization methods, we introduce a family of reduced models in which probabilistic outcomes are classified as one of two types: primary and exceptional. In each model that belongs to this family of reductions, primary outcomes can occur an unbounded number of times per trajectory, while exceptions can occur at most a finite number of times, specified by a parameter. Distinct reduced models are characterized by two parameters: the maximum number of primary outcomes per action, and the maximum number of occurrences of exceptions per trajectory. This family of reductions generalizes the well-known most-likely-outcome determinization approach, which includes one primary outcome per action and zero exceptional outcomes per plan. We present a framework to determine the benefits of planning with reduced models, and develop a continual planning approach that handles situations where the number of exceptions exceeds the specified bound during plan execution. Using this framework, we compare the performance of various reduced models and consider the challenge of generating good ones automatically. We show that each one of the dimensions--allowing more than one primary outcome or planning for some limited number of exceptions--could improve performance relative to standard determinization. The results place previous work on determinization in a broader context and lay the foundation for a systematic exploration of the space of model reductions. |
Sandhya Saisubramanian; Kyle Hollins Wray; Luis Enrique Pineda; Shlomo Zilberstein Planning in Stochastic Environments with Goal Uncertainty Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019. @conference{SZ:SWPZiros19,
title = {Planning in Stochastic Environments with Goal Uncertainty},
author = {Sandhya Saisubramanian and Kyle Hollins Wray and Luis Enrique Pineda and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SWPZiros19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
address = {Macau, China},
abstract = {We present the Goal Uncertain Stochastic Shortest Path (GUSSP) problem -- a general framework to model path planning and decision making in stochastic environments with goal uncertainty. The framework extends the stochastic shortest path (SSP) model to dynamic environments in which it is impossible to determine the exact goal states ahead of plan execution. GUSSPs introduce flexibility in goal specification by allowing a belief over possible goal configurations. The unique observations at potential goals help the agent identify the true goal during plan execution. The partial observability is restricted to goals, facilitating the reduction to an SSP with a modified state space. We formally define a GUSSP and discuss its theoretical properties. We then propose an admissible heuristic that reduces the planning time using FLARES -- a state-of-the-art probabilistic planner. We also propose a determinization approach for solving this class of problems. Finally, we present empirical results on a search and rescue mobile robot and three other problem domains in simulation.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present the Goal Uncertain Stochastic Shortest Path (GUSSP) problem -- a general framework to model path planning and decision making in stochastic environments with goal uncertainty. The framework extends the stochastic shortest path (SSP) model to dynamic environments in which it is impossible to determine the exact goal states ahead of plan execution. GUSSPs introduce flexibility in goal specification by allowing a belief over possible goal configurations. The unique observations at potential goals help the agent identify the true goal during plan execution. The partial observability is restricted to goals, facilitating the reduction to an SSP with a modified state space. We formally define a GUSSP and discuss its theoretical properties. We then propose an admissible heuristic that reduces the planning time using FLARES -- a state-of-the-art probabilistic planner. We also propose a determinization approach for solving this class of problems. Finally, we present empirical results on a search and rescue mobile robot and three other problem domains in simulation. |
Sandhya Saisubramanian; Shlomo Zilberstein Adaptive Outcome Selection for Planning With Reduced Models Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019. @conference{SZ:SZiros19,
title = {Adaptive Outcome Selection for Planning With Reduced Models},
author = {Sandhya Saisubramanian and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZiros19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
address = {Macau, China},
abstract = {Reduced models allow autonomous robots to cope with the complexity of planning in stochastic environments by simplifying the model and reducing its accuracy. The solution quality of a reduced model depends on its fidelity. We present a 0/1 reduced model that selectively improves model fidelity in certain states by switching between using a simplified deterministic model and the full model, without significantly compromising the run time gains. We measure the reduction impact for a reduced model based on the values of the ignored outcomes and use this as a heuristic for outcome selection. Finally, we present empirical results of our approach on three different domains, including an electric vehicle charging problem using real-world data from a university campus.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Reduced models allow autonomous robots to cope with the complexity of planning in stochastic environments by simplifying the model and reducing its accuracy. The solution quality of a reduced model depends on its fidelity. We present a 0/1 reduced model that selectively improves model fidelity in certain states by switching between using a simplified deterministic model and the full model, without significantly compromising the run time gains. We measure the reduction impact for a reduced model based on the values of the ignored outcomes and use this as a heuristic for outcome selection. Finally, we present empirical results of our approach on three different domains, including an electric vehicle charging problem using real-world data from a university campus. |
Justin Svegliato; Kyle Hollins Wray; Stefan J Witwicki; Joydeep Biswas; Shlomo Zilberstein Belief Space Metareasoning for Exception Recovery Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019. @conference{SZ:SWWBZiros19,
title = {Belief Space Metareasoning for Exception Recovery},
author = {Justin Svegliato and Kyle Hollins Wray and Stefan J Witwicki and Joydeep Biswas and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SWWBZiros19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
address = {Macau, China},
abstract = {Due to the complexity of the real world, autonomous systems use decision-making models that rely on simplifying assumptions to make them computationally tractable and feasible to design. However, since these limited representations cannot fully capture the domain of operation, an autonomous system may encounter unanticipated scenarios that cannot be resolved effectively. We first formally introduce an introspective autonomous system that uses belief space metareasoning to recover from exceptions by interleaving a main decision process with a set of exception handlers. We then apply introspective autonomy to autonomous driving. Finally, we demonstrate that an introspective autonomous vehicle is effective in simulation and on a fully operational prototype.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Due to the complexity of the real world, autonomous systems use decision-making models that rely on simplifying assumptions to make them computationally tractable and feasible to design. However, since these limited representations cannot fully capture the domain of operation, an autonomous system may encounter unanticipated scenarios that cannot be resolved effectively. We first formally introduce an introspective autonomous system that uses belief space metareasoning to recover from exceptions by interleaving a main decision process with a set of exception handlers. We then apply introspective autonomy to autonomous driving. Finally, we demonstrate that an introspective autonomous vehicle is effective in simulation and on a fully operational prototype. |
Sandhya Saisubramanian; Connor Basich; Shlomo Zilberstein; Claudia V Goldman Satisfying Social Preferences in Ridesharing Services Conference Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 2019. @conference{SZ:SBZGitsc19,
title = {Satisfying Social Preferences in Ridesharing Services},
author = {Sandhya Saisubramanian and Connor Basich and Shlomo Zilberstein and Claudia V Goldman},
url = {http://rbr.cs.umass.edu/shlomo/papers/SBZGitsc19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC)},
pages = {3720--3725},
address = {Auckland, New Zealand},
abstract = {Dynamic ridesharing services (DRS) play a major role in improving the efficiency of urban transportation. User satisfaction in dynamic ridesharing is determined by multiple factors such as travel time, cost, and social compatibility with co-passengers. Existing DRS optimize profit by maximizing the operational value for service providers or minimizing the travel time for users, but they neglect the social experience of riders, which significantly influences the total value of the service to users. We propose DROPS, a dynamic ridesharing framework that factors the riders' social preferences into the matching process so as to improve the quality of the trips formed. The trip formation is a multi-objective optimization that aims to maximize the operational value for the service provider, while simultaneously maximizing the value of the trip for the users. The user value is estimated based on compatibility between co-passengers and the ride time. We also present a real-time matching algorithm for trip formation. Finally, we evaluate our approach empirically using real-world taxi trip data and a population model including social preferences based on user surveys. Our approach improves the user value and users' social compatibility, without significantly affecting the vehicle miles for the service provider and travel time for users.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Dynamic ridesharing services (DRS) play a major role in improving the efficiency of urban transportation. User satisfaction in dynamic ridesharing is determined by multiple factors such as travel time, cost, and social compatibility with co-passengers. Existing DRS optimize profit by maximizing the operational value for service providers or minimizing the travel time for users, but they neglect the social experience of riders, which significantly influences the total value of the service to users. We propose DROPS, a dynamic ridesharing framework that factors the riders' social preferences into the matching process so as to improve the quality of the trips formed. The trip formation is a multi-objective optimization that aims to maximize the operational value for the service provider, while simultaneously maximizing the value of the trip for the users. The user value is estimated based on compatibility between co-passengers and the ride time. We also present a real-time matching algorithm for trip formation. Finally, we evaluate our approach empirically using real-world taxi trip data and a population model including social preferences based on user surveys. Our approach improves the user value and users' social compatibility, without significantly affecting the vehicle miles for the service provider and travel time for users. |
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein; Liam Pedersen Autonomous Vehicle Operational Management Control Miscellaneous 2019, (US Patent App. 16/472,573). @misc{SZ:WWZPpatent19a,
title = {Autonomous Vehicle Operational Management Control},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein and Liam Pedersen},
url = {https://patentimages.storage.googleapis.com/60/6e/6e/91882eea0fc0e7/US20190329771A1.pdf},
year = {2019},
date = {2019-01-01},
publisher = {Google Patents},
abstract = {Autonomous vehicle operational management may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include receiving, from a sensor of the autonomous vehicle, sensor information corresponding to an external object within a defined distance of the autonomous vehicle, identifying a distinct vehicle operational scenario in response to receiving the sensor information, instantiating a scenario-specific operational control evaluation module instance, wherein the scenario-specific operational control evaluation module instance is an instance of a scenario-specific operational control evaluation module modeling the distinct vehicle operational scenario, receiving a candidate vehicle control action from the scenario-specific operational control evaluation module instance, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action.},
note = {US Patent App. 16/472,573},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Autonomous vehicle operational management may include traversing, by an autonomous vehicle, a vehicle transportation network. Traversing the vehicle transportation network may include receiving, from a sensor of the autonomous vehicle, sensor information corresponding to an external object within a defined distance of the autonomous vehicle, identifying a distinct vehicle operational scenario in response to receiving the sensor information, instantiating a scenario-specific operational control evaluation module instance, wherein the scenario-specific operational control evaluation module instance is an instance of a scenario-specific operational control evaluation module modeling the distinct vehicle operational scenario, receiving a candidate vehicle control action from the scenario-specific operational control evaluation module instance, and traversing a portion of the vehicle transportation network based on the candidate vehicle control action. |
Feng Wu; Shlomo Zilberstein; Nicholas R Jennings Stochastic Multi-agent Planning with Partial State Models Conference Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI), Beijing, China, 2019. @conference{SZ:WZJdai19,
title = {Stochastic Multi-agent Planning with Partial State Models},
author = {Feng Wu and Shlomo Zilberstein and Nicholas R Jennings},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZJdai19.pdf},
doi = {10.1145/3356464.3357699},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the First International Conference on Distributed Artificial Intelligence (DAI)},
pages = {1--8},
address = {Beijing, China},
abstract = {People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
People who observe a multi-agent team can often provide valuable information to the agents based on their superior cognitive abilities to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as DEC-POMDP. To deal with this, we propose an extension of the DEC-POMDP that allows states to be partially specified and benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. The results offer the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach using a disaster recovery scenario, comparing it to several baseline approaches. |
2018
|
Timothy J Wright; Ravi Agrawal; Siby Samuel; Yuhua Wang; Shlomo Zilberstein; Donald L Fisher Effective Cues for Accelerating Young Drivers' Time to Transfer Control Following a Period of Conditional Automation Journal Article In: Accident Analysis and Prevention, vol. 116, pp. 14–20, 2018. @article{SZ:WASWZFaap18,
title = {Effective Cues for Accelerating Young Drivers' Time to Transfer Control Following a Period of Conditional Automation},
author = {Timothy J Wright and Ravi Agrawal and Siby Samuel and Yuhua Wang and Shlomo Zilberstein and Donald L Fisher},
url = {https://www.sciencedirect.com/science/article/abs/pii/S0001457517303615},
doi = {10.1016/j.aap.2017.10.005},
year = {2018},
date = {2018-01-01},
journal = {Accident Analysis and Prevention},
volume = {116},
pages = {14--20},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Feng Wu; Shlomo Zilberstein; Xiaoping Chen Privacy-Preserving Policy Iteration for Decentralized POMDPs Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WZCaaai18,
title = {Privacy-Preserving Policy Iteration for Decentralized POMDPs},
author = {Feng Wu and Shlomo Zilberstein and Xiaoping Chen},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZCaaai18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 32nd Conference on Artificial Intelligence (AAAI)},
pages = {4759--4766},
address = {New Orleans, Louisiana},
abstract = {We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach. |
Kyle Hollins Wray; Akshat Kumar; Shlomo Zilberstein Integrated Cooperation and Competition in Multi-Agent Decision-Making Conference Proceedings of the 32nd Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, 2018. @conference{SZ:WKZaaai18,
title = {Integrated Cooperation and Competition in Multi-Agent Decision-Making},
author = {Kyle Hollins Wray and Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WKZaaai18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 32nd Conference on Artificial Intelligence (AAAI)},
pages = {4751--4758},
address = {New Orleans, Louisiana},
abstract = {Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model--cooperative-competitive process (CCP)--that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:FFGZpair18,
title = {Towards Quicker Probabilistic Recognition with Multiple Goal Heuristic Search},
author = {Richard G Freedman and Yi Ren Fung and Roman Ganchin and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FFGZpair18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {AAAI Workshop on Plan, Activity, and Intent Recognition (PAIR)},
pages = {601--606},
address = {New Orleans, Louisiana},
abstract = {Referred to as an approach for either plan or goal recognition, the original method proposed by Ramirez and Geffner introduced a domain-based approach that did not need a library containing specific plan instances. This introduced a more generalizable means of representing tasks to be recognized, but was also very slow due to its need to run simulations via multiple executions of an off-the-shelf classical planner. Several variations have since been proposed for quicker recognition, but each one uses a drastically different approach that must sacrifice other qualities useful for processing the recognition results in more complex systems. We present work in progress that takes advantage of the shared state space between planner executions to perform multiple goal heuristic search. This single execution of a planner will potentially speed up the recognition process using the original method, which also maintains the sacrificed properties and improves some of the assumptions made by Ramirez and Geffner.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:FZkeg18,
title = {Roles that Plan, Activity, and Intent Recognition with Planning Can Play in Games},
author = {Richard G Freedman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FZkeg18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {AAAI Workshop on Knowledge Extraction from Games (KEG)},
address = {New Orleans, Louisiana},
abstract = {Planning is one of the oldest areas of research within artificial intelligence, studying the selection of actions for accomplishing goals. The more recently established areas of plan, activity, and intent recognition instead study an agent's behavior and task(s) given observations of its chosen actions. While these areas have been independently studied and applied to games in the past for both understanding player behavior and developing game characters, the potential for their integration presents even more opportunities via adaptive interaction with the player. In this manuscript, we discuss recent research on the integration of these areas and investigate potential uses for such integrated systems in games.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:SDFZicaps18ws1,
title = {An Anytime Algorithm for Task and Motion MDPs},
author = {Siddharth Srivastava and Nishant Desai and Richard G Freedman and Shlomo Zilberstein},
url = {http://arxiv.org/abs/1802.05835},
year = {2018},
date = {2018-01-01},
booktitle = {ICAPS Workshop on Planning and Robotics (PlanRob)},
address = {Delft, The Netherlands},
abstract = {Integrated task and motion planning has emerged as a challenging problem in sequential decision making, where a robot needs to compute high-level strategy and low-level motion plans for solving complex tasks. While high-level strategies require decision making over longer time-horizons and scales, their feasibility depends on low-level constraints based upon the geometries and continuous dynamics of the environment. The hybrid nature of this problem makes it difficult to scale; most existing approaches focus on deterministic, fully observable scenarios. We present a new approach where the high-level decision problem occurs in a stochastic setting and can be modeled as a Markov decision process. In contrast to prior efforts, we show that complete MDP policies, or contingent behaviors, can be computed effectively in an anytime fashion. Our algorithm continuously improves the quality of the solution and is guaranteed to be probabilistically complete. We evaluate the performance of our approach on a challenging, realistic test problem: autonomous aircraft inspection. Our results show that we can effectively compute consistent task and motion policies for the most likely execution-time outcomes using only a fraction of the computation required to develop the complete task and motion policy.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:SDFZicaps18ws2,
title = {Relaxed Modification Heuristics for Equi-Reward Utility Maximizing Design},
author = {Sarah Keren and Luis Enrique Pineda and Avigdor Gal and Erez Karpas and Shlomo Zilberstein},
year = {2018},
date = {2018-01-01},
booktitle = {ICAPS Workshop on Heuristics and Search for Domain-Independent Planning (HSDIP)},
address = {Delft, The Netherlands},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:SZSaamas18,
title = {Planning Using a Portfolio of Reduced Models},
author = {Sandhya Saisubramanian and Shlomo Zilberstein and Prashant J Shenoy},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZSaamas18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS)},
pages = {2057--2059},
address = {Stockholm, Sweden},
abstract = {Existing reduced model techniques simplify a problem by applying a uniform principle to reduce the number of considered outcomes for all state-action pairs. It is non-trivial to identify which outcome selection principle will work well across all problem instances in a domain. We aim to create reduced models that yield near-optimal solutions, without compromising the run time gains of using a reduced model. First, we introduce planning using a portfolio of reduced models, a framework that provides flexibility in the reduced model formulation by using a portfolio of outcome selection principles. Second, we propose planning using cost adjustment, a technique that improves the solution quality by accounting for the outcomes ignored in the reduced model. Empirical evaluation of these techniques confirms their effectiveness in several domains.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:SZijcaiPLW18,
title = {Safe Reduced Models for Probabilistic Planning},
author = {Sandhya Saisubramanian and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZijcaiPLW18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {ICML/IJCAI/AAMAS Workshop on Planning and Learning (PAL)},
address = {Stockholm, Sweden},
abstract = {Reduced models allow autonomous agents to cope with the complexity of planning under uncertainty by reducing the accuracy of the model. However, the solution quality of a reduced model varies as the model fidelity changes. We present planning using a portfolio of reduced models with cost adjustments, a framework to increase the safety of a reduced model by selectively improving its fidelity in certain states, without significantly compromising runtime. Our framework provides the flexibility to create reduced models with different levels of detail using a portfolio, and a means to account for the ignored details by adjusting the action costs in the reduced model. We show the conditions under which cost adjustments achieve optimal action selection and describe how to use cost adjustments as a heuristic for choosing outcome selection principles in a portfolio. Finally, we present empirical results of our approach on three domains that include an electric vehicle charging problem using real-world data from a university campus.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:SWZijcai18,
title = {Meta-Level Control of Anytime Algorithms with Online Performance Prediction},
author = {Justin Svegliato and Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SWZijcai18.pdf},
doi = {10.24963/ijcai.2018/208},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the 27th International Joint Conference on Artificial Intelligence},
pages = {1499--1505},
address = {Stockholm, Sweden},
abstract = {Anytime algorithms enable intelligent systems to trade computation time with solution quality. To exploit this crucial ability in real-time decision-making, the system must decide when to interrupt the anytime algorithm and act on the current solution. Existing meta-level control techniques, however, address this problem by relying on significant offline work that diminishes their practical utility and accuracy. We formally introduce an online performance prediction framework that enables meta-level control to adapt to each instance of a problem without any preprocessing. Using this framework, we then present a meta-level control technique and two stopping conditions. Finally, we show that our approach outperforms existing techniques that require substantial offline work. The result is efficient nonmyopic meta-level control that reduces the overhead and increases the benefits of using anytime algorithms in intelligent systems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:SWHZijcaiAI4IoT18,
title = {Belief-Space Planning for Automated Malware Defense},
author = {Justin Svegliato and Sam Witty and Amir Houmansadr and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SWHZijcaiAI4IoT18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {IJCAI/ECAI Workshop on AI for Internet of Things (AI4IoT)},
address = {Stockholm, Sweden},
abstract = {Malware detection and response is critical to ensuring information security across a wide range of devices. There have been few attempts, however, to develop security systems that exploit the benefits of different malware detection techniques. We formally introduce an automated malware defense framework and represent it as a belief-space planning problem that optimally reduces the impact on the performance of a system. Using the framework, we then provide an example automated malware defense system for email worm detection and response. Finally, we show in simulation that the system outperforms standard security techniques that have been used in practice. The result is a novel belief-space planning approach to automated malware defense designed for robust, accurate, and efficient use in large networks of resource-constrained devices.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{SZ:SZijcaiAEGAP18,
title = {Adaptive Metareasoning for Bounded Rational Agents},
author = {Justin Svegliato and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZijcaiAEGAP18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {IJCAI/ECAI Workshop on Architectures and Evaluation for Generality, Autonomy and Progress in AI (AEGAP)},
address = {Stockholm, Sweden},
abstract = {In computational approaches to bounded rationality, metareasoning enables intelligent agents to optimize their own decision-making process in order to produce effective action in a timely manner. While there have been substantial efforts to develop effective meta-level control for anytime algorithms, existing techniques rely on extensive offline work, imposing several critical assumptions that diminish their effectiveness and limit their practical utility in the real world. In order to eliminate these assumptions, adaptive metareasoning enables intelligent agents to adapt to each individual instance of the problem at hand without the need for significant offline preprocessing. Building on our recent work, we first introduce a model-free approach to meta-level control based on reinforcement learning. We then present a meta-level control technique that uses temporal difference learning. Finally, we show empirically that our approach is effective on a common benchmark in meta-level control.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@proceedings{SZ:WSSWZlta18,
title = {Proceedings of the AAAI Fall Symposium on Reasoning and Learning in Real-World Systems for Long-Term Autonomy},
editor = {Kyle Hollins Wray and Julie A. Shaw and Peter Stone and Stefan J. Witwicki and Shlomo Zilberstein},
url = {https://web.cs.umass.edu/publication/details.php?id=2462},
year = {2018},
date = {2018-01-01},
address = {Arlington, VA},
abstract = {Over the past decade, decision-making agents have been increasingly deployed in industrial settings, consumer products, healthcare, education, and entertainment. The development of drone delivery services, virtual assistants, and autonomous vehicles has highlighted numerous challenges surrounding the operation of autonomous systems in unstructured environments. This includes mechanisms to support autonomous operations over extended periods of time, techniques that facilitate the use of human assistance in learning and decision-making, learning to reduce the reliance on humans over time, addressing the practical scalability of existing methods, relaxing unrealistic assumptions, and alleviating safety concerns about deploying these systems.},
keywords = {},
pubstate = {published},
tppubtype = {proceedings}
}
@conference{SZ:WZlta18,
title = {Policy Networks for Reasoning in Long-Term Autonomy},
author = {Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZlta18.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {AAAI Fall Symposium on Reasoning and Learning in Real-World Systems for Long-Term Autonomy (LTA)},
address = {Arlington, Virginia},
abstract = {Policy networks are graphical models that integrate decision-making models. They allow for multiple Markov decision processes (MDPs) that describe distinct focused aspects of a domain to work in harmony to solve a large-scale problem. This paper presents the formalization of policy networks and their use in modeling reasoning tasks necessary for scalable long-term autonomy. We prove that policy networks generalize a wide array of previous models, such as options and constrained MDPs, which can be equivalently viewed as the integration of multiple models. To illustrate the approach, we apply policy networks to the challenging real world domain of robotic home health care. We demonstrate the benefits of policy networks on a real robot and show how they facilitate scalable integration of multiple decision-making models.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2017
@conference{SZ:WKSZaaai17,
title = {Robust Optimization for Tree-Structured Stochastic Network Design},
author = {Xiaojian Wu and Akshat Kumar and Daniel Sheldon and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WKSZaaai17.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 31st Conference on Artificial Intelligence (AAAI)},
pages = {4545--4551},
address = {San Francisco, California},
abstract = {Stochastic network design is a general framework for optimizing network connectivity. It has several applications in computational sustainability including spatial conservation planning, pre-disaster network preparation, and river network optimization. A common assumption in previous work is that network parameters (e.g., probability of species colonization) are precisely known, which is unrealistic in real-world settings. We therefore address the robust river network design problem where the goal is to optimize river connectivity for fish movement by removing barriers. We assume that fish passability probabilities are known only imprecisely, but are within some interval bounds. We then develop a planning approach that computes the policies with either high robust ratio or low regret. Empirically, our approach scales well to large river networks. We also provide insights into the solutions generated by our robust approach, which has significantly higher robust ratio than the baseline solution with mean parameter estimates.},
note = {Best Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@article{SZ:WASWZFtrb17b,
title = {Effects of Alert Cue Specificity on Situation Awareness in Transfer of Control in Level 3 Automation},
author = {Timothy J Wright and Ravi Agrawal and Siby Samuel and Yuhua Wang and Shlomo Zilberstein and Donald L Fisher},
url = {http://dx.doi.org/10.3141/2663-04},
year = {2017},
date = {2017-01-01},
journal = {Journal of the Transportation Research Board},
volume = {2663},
pages = {27--33},
abstract = {Drivers in a Level 3 automation environment typically need at least 8 s following a manual takeover request to achieve appropriate levels of situation awareness. Studies that have derived this time estimate use general audio alerts that suggest a transfer of control from the automation to the driver might be required. The current experiment examined if improvements in younger drivers' situation awareness might be observed in as little as 4 s before a latent hazard might materialize and a transfer of control occurs if more specific audio alerts are used. Younger drivers either drove manually with no cue or in one of four automation conditions: (a) a general cue condition, (b) a condition that described the risky features of the roadway and the location of those features, (c) a condition that contained information about the actual identity of the threat and the required behavior, and (d) a combination cue condition (both environment and threat cue). Eye movements were recorded as drivers completed six scenarios in a simulated automated driving experiment. The results showed that audio cues that contained information about risky roadway features increased the detection of latent hazards by almost 40% compared with when a general cue or a threat cue was used. Performance with the combined cue was no better than performance with the environment cue. The environment cue gives drivers the critical seconds needed to mitigate a potential crash. Results are informative about which types of alerts to use to inform drivers of upcoming hazards.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{SZ:WASWZFtrb17a,
title = {Effects of a Change in Environment on the Minimum Time to Situation Awareness in Transfer of Control Scenarios},
author = {Ravi Agrawal and Timothy J Wright and Siby Samuel and Shlomo Zilberstein and Donald L Fisher},
url = {http://dx.doi.org/10.3141/2663-16},
year = {2017},
date = {2017-01-01},
journal = {Journal of the Transportation Research Board},
volume = {2663},
pages = {126--133},
abstract = {From previous experiments, it is known that control must be transferred to the driver in a Level 3 vehicle at least 8 s before the driver passes a latent hazard for the driver to be as aware of the latent hazard as the driver is when glancing continuously on the forward roadway. In these experiments, the driving environment remained consistent throughout the time the automated driving suite (ADS) was engaged, and immediately after control was transferred to the driver. Considering that drivers expect different categories of hazards in different driving environments, a transition to a different environment while the ADS is engaged may impair a driver's ability to both achieve situation awareness and successfully mitigate hazards. The current experiment examined if 8 s was enough time for drivers to achieve situation awareness and appropriately mitigate hazards when the roadway environment changes while the driver is engaged in a secondary activity that takes his or her eyes away from the forward roadway. Drivers' eye movements and vehicle metrics were recorded as they completed one of three conditions in a driving simulator: an automation condition where the driving environment remained consistent throughout; an automation condition that contained some transitions to a new environment while the driver engaged the ADS; and a manual driving condition that also contained the same transitions as the latter automation condition. Results suggest that even 8 s is not enough time for drivers to achieve situation awareness and mitigate hazards when the hazards are unexpected.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Richard G Freedman; Shlomo Zilberstein Integration of Planning with Recognition for Responsive Interaction Using Classical Planners Conference Proceedings of the 31st Conference on Artificial Intelligence (AAAI), San Francisco, California, 2017. @conference{SZ:FZaaai17,
title = {Integration of Planning with Recognition for Responsive Interaction Using Classical Planners},
author = {Richard G Freedman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FZaaai17.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 31st Conference on Artificial Intelligence (AAAI)},
pages = {4581--4588},
address = {San Francisco, California},
abstract = {Interaction between multiple agents requires some form of coordination and a level of mutual awareness. When computers and robots interact with people, they need to recognize human plans and react appropriately. Plan and goal recognition techniques have focused on identifying an agent's task given a sufficiently long action sequence. However, by the time the plan and/or goal are recognized, it may be too late for computing an interactive response. We propose an integration of planning with probabilistic recognition where each method uses intermediate results from the other as a guiding heuristic for recognition of the plan/goal in-progress as well as the interactive response. We show that, like the used recognition method, these interaction problems can be compiled into classical planning problems and solved using off-the-shelf methods. In addition to the methodology, this paper introduces problem categories for different forms of interaction, an evaluation metric for the benefits from the interaction, and extensions to the recognition algorithm that make its intermediate results more practical while the plan is in progress.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Luis Enrique Pineda; Kyle Hollins Wray; Shlomo Zilberstein Fast SSP Solvers Using Short-Sighted Labeling Conference Proceedings of the 31st Conference on Artificial Intelligence (AAAI), San Francisco, California, 2017. @conference{SZ:PWZaaai17,
title = {Fast SSP Solvers Using Short-Sighted Labeling},
author = {Luis Enrique Pineda and Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PWZaaai17.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 31st Conference on Artificial Intelligence (AAAI)},
pages = {3629--3635},
address = {San Francisco, California},
abstract = {State-of-the-art methods for solving SSPs often work by limiting planning to restricted regions of the state space. The resulting problems can then be solved quickly, and the process is repeated during execution when states outside the restricted region are encountered. Typically, these approaches focus on states that are within some distance measure of the start state (e.g., number of actions or probability of being reached). However, these short-sighted approaches make it difficult to propagate information from states that are closer to a goal than to the start state, thus missing opportunities to improve planning. We present an alternative approach in which short-sightedness is used only to determine whether a state should be labeled as solved or not, but otherwise the set of states that can be accounted for during planning is unrestricted. Based on this idea, we propose the FLARES algorithm and show that it performs consistently well on a wide range of benchmark problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Kyle Hollins Wray; Shlomo Zilberstein Approximating Reachable Belief Points in POMDPs with Applications to Robotic Navigation and Localization Conference ICAPS Workshop on Planning and Robotics (PlanRob), Pittsburgh, Pennsylvania, 2017. @conference{SZ:WZplanrob17,
title = {Approximating Reachable Belief Points in POMDPs with Applications to Robotic Navigation and Localization},
author = {Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZplanrob17.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {ICAPS Workshop on Planning and Robotics (PlanRob)},
address = {Pittsburgh, Pennsylvania},
abstract = {We propose an algorithm called σ-approximation that compresses the non-zero values of beliefs for partially observable Markov decision processes (POMDPs) in order to improve performance and reduce memory usage. Specifically, we approximate individual belief vectors with a fixed bound on the number of non-zero values they may contain. We prove the correctness and a strong error bound when the σ-approximation is used with the point-based value iteration (PBVI) family of algorithms. Results clearly demonstrate that when the algorithm is used with PBVI (σ-PBVI), we can achieve over an order of magnitude improvement. We ground our claims with a full robotic implementation for simultaneous navigation and localization using POMDPs with σ-PBVI.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Sarah Keren; Luis Enrique Pineda; Avigdor Gal; Erez Karpas; Shlomo Zilberstein Equi-Reward Utility Maximizing Design in Stochastic Environments Conference ICAPS Workshop on Heuristics and Search for Domain-independent Planning (HSDIP), Pittsburgh, Pennsylvania, 2017. @conference{SZ:KPGKZhsdip17,
title = {Equi-Reward Utility Maximizing Design in Stochastic Environments},
author = {Sarah Keren and Luis Enrique Pineda and Avigdor Gal and Erez Karpas and Shlomo Zilberstein},
year = {2017},
date = {2017-01-01},
booktitle = {ICAPS Workshop on Heuristics and Search for Domain-independent Planning (HSDIP)},
address = {Pittsburgh, Pennsylvania},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Sandhya Saisubramanian; Shlomo Zilberstein; Prashant Shenoy Optimizing Electric Vehicle Charging Through Determinization Conference ICAPS Workshop on Scheduling and Planning Applications (SPARK), Pittsburgh, Pennsylvania, 2017. @conference{SZ:SZSspark17,
title = {Optimizing Electric Vehicle Charging Through Determinization},
author = {Sandhya Saisubramanian and Shlomo Zilberstein and Prashant Shenoy},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZSspark17.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {ICAPS Workshop on Scheduling and Planning Applications (SPARK)},
address = {Pittsburgh, Pennsylvania},
abstract = {We propose a determinization based approach to optimize the charging policies of an electric vehicle (EV) operating in a vehicle-to-grid (V2G) setting. By planning when to charge or discharge electricity from the vehicle, the long-term cost of operating the EV can be minimized, while being consistent with the owner's preferences. For an EV operating under price uncertainty caused by the dynamic pricing of electricity, this problem needs to be solved on-the-fly. Therefore, we model this problem as a Stochastic Shortest Path (SSP) problem and employ a determinization technique to solve it. Since it is hard to predict a priori the performance of a determinization method on a given problem, we introduce the notion of Lossless Determinization (LLD) that produces optimal action selection via determinization and present an approach that achieves lossless determinization by adjusting the cost of actions to account for the ignored outcomes. We also present Approximate Lossless Determinization (ALLD)--an effective method for approximating the cost of actions based on state features. We evaluate the performance of ALLD and demonstrate its effectiveness on a range of settings for the electric vehicle charging problem.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Richard G Freedman; Shlomo Zilberstein A PDDL Representation for Contradance Composition Conference ICAPS Workshop on Knowledge Engineering for Planning and Scheduling (KEPS), Pittsburgh, Pennsylvania, 2017. @conference{SZ:FZkeps17,
title = {A PDDL Representation for Contradance Composition},
author = {Richard G Freedman and Shlomo Zilberstein},
year = {2017},
date = {2017-01-01},
booktitle = {ICAPS Workshop on Knowledge Engineering for Planning and Scheduling (KEPS)},
address = {Pittsburgh, Pennsylvania},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Kyle Hollins Wray; Stefan J Witwicki; Shlomo Zilberstein Online Decision Making for Scalable Autonomous Systems Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WWZijcai17,
title = {Online Decision Making for Scalable Autonomous Systems},
author = {Kyle Hollins Wray and Stefan J Witwicki and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WWZijcai17.pdf},
doi = {10.24963/ijcai.2017/664},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {4768--4774},
abstract = {We present a general formal model called MODIA that can tackle a central challenge for autonomous vehicles (AVs), namely the ability to interact with an unspecified, large number of world entities. In MODIA, a collection of possible decision-problems (DPs), known a priori, are instantiated online and executed as decision-components (DCs), unknown a priori. To combine the individual action recommendations of the DCs into a single action, we propose the lexicographic executor action function (LEAF) mechanism. We analyze the complexity of MODIA and establish LEAF's relation to regret minimization. Finally, we implement MODIA and LEAF using collections of partially observable Markov decision process (POMDP) DPs, and use them for complex AV intersection decision-making. We evaluate the approach in six scenarios within a realistic vehicle simulator and present its use on an AV prototype.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Feng Wu; Shlomo Zilberstein; Xiaoping Chen Multi-Agent Planning with Baseline Regret Minimization Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. @conference{SZ:WZCijcai17,
title = {Multi-Agent Planning with Baseline Regret Minimization},
author = {Feng Wu and Shlomo Zilberstein and Xiaoping Chen},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZCijcai17.pdf},
doi = {10.24963/ijcai.2017/63},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {444--450},
abstract = {We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It guarantees to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Haochong Zhang; Rongyun Cao; Shlomo Zilberstein; Feng Wu; Xiaoping Chen Toward Effective Soft Robot Control via Reinforcement Learning Conference Proceedings of the 10th International Conference on Intelligent Robotics and Applications, 2017. @conference{SZ:ZCZWCicira17,
title = {Toward Effective Soft Robot Control via Reinforcement Learning},
author = {Haochong Zhang and Rongyun Cao and Shlomo Zilberstein and Feng Wu and Xiaoping Chen},
url = {https://doi.org/10.1007/978-3-319-65289-4_17},
doi = {10.1007/978-3-319-65289-4_17},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 10th International Conference on Intelligent Robotics and Applications},
pages = {173--184},
abstract = {A soft robot is a kind of robot that is constructed with soft, deformable and elastic materials. Control of soft robots presents complex modeling and planning challenges. We introduce a new approach to accomplish that, making two key contributions: designing an abstract representation of the state of soft robots, and developing a reinforcement learning method to derive effective control policies. The reinforcement learning process can be trained quickly by ignoring the specific materials and structural properties of the soft robot. We apply the approach to the Honeycomb PneuNets Soft Robot and demonstrate the effectiveness of the training method and its ability to produce good control policies under different conditions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Kyle Hollins Wray; Shlomo Zilberstein Approximating reachable belief points in POMDPs Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 2017. @conference{SZ:WZiros17,
title = {Approximating reachable belief points in POMDPs},
author = {Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZiros17.pdf},
doi = {10.1109/IROS.2017.8202146},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {117--122},
address = {Vancouver, BC, Canada},
abstract = {We propose an algorithm called σ-approximation that compresses the non-zero values of beliefs for partially observable Markov decision processes (POMDPs) in order to improve performance and reduce memory usage. Specifically, we approximate individual belief vectors with a fixed bound on the number of non-zero values they may contain. We prove the correctness and a strong error bound when the σ-approximation is used with the point-based value iteration (PBVI) family of algorithms. An analysis compares the algorithm on six larger domains, varying the number of non-zero values for the σ-approximation. Results clearly demonstrate that when the algorithm is used with PBVI (σ-PBVI), we can achieve over an order of magnitude improvement. We ground our claims with a full robotic implementation for simultaneous navigation and localization using POMDPs with σ-PBVI.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Luis Pineda; Shlomo Zilberstein Generalizing the Role of Determinization in Probabilistic Planning Technical Report College of Information and Computer Sciences, University of Massachussetts Amherst no. 2017-06, 2017. @techreport{SZ:PZtr17,
title = {Generalizing the Role of Determinization in Probabilistic Planning},
author = {Luis Pineda and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZtr17.pdf},
year = {2017},
date = {2017-01-01},
number = {2017-06},
institution = {College of Information and Computer Sciences, University of Massachusetts Amherst},
abstract = {The stochastic shortest path problem (SSP) is a highly expressive model for probabilistic planning. The computational hardness of SSPs has sparked interest in determinization-based planners that can quickly solve large problems. However, existing methods employ a simplistic approach to determinization. In particular, they ignore the possibility of tailoring the determinization to the specific characteristics of the target domain. In this work we examine this question, by showing that learning a good determinization for a planning domain can be done efficiently and can improve performance. Moreover, we show how to directly incorporate probabilistic reasoning into the planning problem when a good determinization is not sufficient by itself. Based on these insights, we introduce a planner, FF-LAO*, that outperforms state-of-the-art probabilistic planners on several well-known competition benchmarks.},
keywords = {},
pubstate = {published},
tppubtype = {techreport}
}
Sarah Keren; Luis Enrique Pineda; Avigdor Gal; Erez Karpas; Shlomo Zilberstein Equi-Reward Utility Maximizing Design in Stochastic Environments Conference Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 2017. @conference{SZ:KPGKZijcai17,
title = {Equi-Reward Utility Maximizing Design in Stochastic Environments},
author = {Sarah Keren and Luis Enrique Pineda and Avigdor Gal and Erez Karpas and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KPGKZijcai17.pdf},
doi = {10.24963/ijcai.2017/608},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {4353--4360},
address = {Melbourne, Australia},
abstract = {We present the Equi-Reward Utility Maximizing Design (ER-UMD) problem for redesigning stochastic environments to maximize agent performance. ER-UMD fits well contemporary applications that require offline design of environments where robots and humans act and cooperate. To find an optimal modification sequence we present two novel solution techniques: a compilation that embeds design into a planning problem, allowing use of off-the-shelf solvers to find a solution, and a heuristic search in the modifications space, for which we present an admissible heuristic. Evaluation shows the feasibility of the approach using standard benchmarks from the probabilistic planning competition and a benchmark we created for a vacuum cleaning robot setting.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2016
Siby Samuel; Avinoam Borowsky; Shlomo Zilberstein; Donald L Fisher Minimum Time to Situation Awareness in Scenarios Involving Transfer of Control from an Automated Driving Suite Journal Article In: Journal of the Transportation Research Board, vol. 2602, pp. 115–120, 2016. @article{SZ:SBZFtrb16,
title = {Minimum Time to Situation Awareness in Scenarios Involving Transfer of Control from an Automated Driving Suite},
author = {Siby Samuel and Avinoam Borowsky and Shlomo Zilberstein and Donald L Fisher},
url = {https://journals.sagepub.com/doi/abs/10.3141/2602-14},
doi = {10.3141/2602-14},
year = {2016},
date = {2016-01-01},
journal = {Journal of the Transportation Research Board},
volume = {2602},
pages = {115--120},
abstract = {This research assessed the impact of vehicle automation on a driver's ability to anticipate latent threats and to detect materialized hazards on the forward roadway. In particular, the minimum alert time before transfer of control was determined. This was the minimum time required after an autonomous driving suite (ADS) had been in full control of a vehicle for the driver to reacquire the same level of situation awareness that he or she had when in full control of the vehicle. This simulator study included five treatment conditions during which drivers either were always in complete control of their own vehicle (control) or were required to resume control at 4 s, 6 s, 8 s, or 12 s before the appearance of a latent hazard (transfer). While the vehicle was in autonomous mode, the drivers performed an in-vehicle task for more than a minute and were told not to glance at the forward roadway. Analysis of eye movements showed that drivers in the control condition detected nearly 40% more hazards compared with drivers in the shortest transfer condition. The results indicated how long before control was transferred from the ADS back to a driver that the driver should be told that a transfer would occur, if the driver were to have full situation awareness. Unlike previous studies, this study both ensured that the driver was not watching for hazards while the ADS was in control and used a measure of situation awareness (hazard anticipation) that was closely linked to the actual understanding a driver had of the threats present in a given scenario.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Kyle Hollins Wray; Shlomo Zilberstein A POMDP Formulation of Proactive Learning Conference Proceedings of the 30th Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, 2016. @conference{SZ:WZaaai16,
title = {A POMDP Formulation of Proactive Learning},
author = {Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZaaai16.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 30th Conference on Artificial Intelligence (AAAI)},
pages = {3202--3208},
address = {Phoenix, Arizona},
abstract = {We cast the Proactive Learning (PAL) problem--Active Learning (AL) with multiple reluctant, fallible, cost-varying oracles--as a Partially Observable Markov Decision Process (POMDP). The agent selects an oracle at each time step to label a data point while it maintains a belief over the true underlying correctness of its current dataset's labels. The goal is to minimize labeling costs while considering the value of obtaining correct labels, thus maximizing final resultant classifier accuracy. We prove three properties that show our particular formulation leads to a structured and bounded-size set of belief points, enabling strong performance of point-based methods to solve the POMDP. Our method is compared with the original three algorithms proposed by Donmez and Carbonell and a simple baseline. We demonstrate that our approach matches or improves upon the original approach within five different oracle scenarios, each on two datasets. Finally, our algorithm provides a general, well-defined mathematical foundation to build upon.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Xiaojian Wu; Daniel Sheldon; Shlomo Zilberstein Optimizing Resilience in Large Scale Networks Conference Proceedings of the 30th Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, 2016. @conference{SZ:WSZaaai16,
title = {Optimizing Resilience in Large Scale Networks},
author = {Xiaojian Wu and Daniel Sheldon and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WSZaaai16.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 30th Conference on Artificial Intelligence (AAAI)},
pages = {3922--3928},
address = {Phoenix, Arizona},
abstract = {We propose a decision making framework to optimize the resilience of road networks to natural disasters such as floods. Our model generalizes an existing one for this problem by allowing roads with a broad class of stochastic delay models. We then present a fast algorithm based on the sample average approximation (SAA) method and network design techniques to solve this problem approximately. On a small existing benchmark, our algorithm produces near-optimal solutions and the SAA method converges quickly with a small number of samples. We then apply our algorithm to a large real-world problem to optimize the resilience of a road network to failures of stream crossing structures to minimize travel times of emergency medical service vehicles. On medium-sized networks, our algorithm obtains solutions of comparable quality to a greedy baseline method but is 30--60 times faster. Our algorithm is the only existing algorithm that can scale to the full network, which has many thousands of edges.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Kyle Hollins Wray; Luis Enrique Pineda; Shlomo Zilberstein Hierarchical Approach to Transfer of Control in Semi-Autonomous Systems (Extended Abstract) Conference Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Singapore, 2016. @conference{SZ:WPZaamas16,
title = {Hierarchical Approach to Transfer of Control in Semi-Autonomous Systems (Extended Abstract)},
author = {Kyle Hollins Wray and Luis Enrique Pineda and Shlomo Zilberstein},
url = {http://dl.acm.org/citation.cfm?id=2937122},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {1285--1286},
address = {Singapore},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Akshat Kumar; Hala Mostafa; Shlomo Zilberstein Dual Formulations for Optimizing Dec-POMDP Controllers Conference Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS), London, UK, 2016. @conference{SZ:KMZicaps16,
title = {Dual Formulations for Optimizing Dec-POMDP Controllers},
author = {Akshat Kumar and Hala Mostafa and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KMZicaps16.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {202--210},
address = {London, UK},
abstract = {Decentralized POMDP is an expressive model for multiagent planning. Finite-state controllers (FSCs)--often used to represent policies for infinite-horizon problems--offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Kyle Hollins Wray; Luis Enrique Pineda; Shlomo Zilberstein Hierarchical Approach to Transfer of Control in Semi-Autonomous Systems Conference Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, 2016. @conference{SZ:WPZijcai16,
title = {Hierarchical Approach to Transfer of Control in Semi-Autonomous Systems},
author = {Kyle Hollins Wray and Luis Enrique Pineda and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WPZijcai16.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {517--523},
address = {New York, NY},
abstract = {Semi-Autonomous Systems (SAS) encapsulate a stochastic decision process explicitly controlled by both an agent and a human, in order to leverage the distinct capabilities of each actor. Planning in SAS must address the challenge of transferring control quickly, safely, and smoothly back-and-forth between the agent and the human. We formally define SAS and the requirements to guarantee that the controlling entities are always able to act competently. We then consider applying the model to Semi-Autonomous VEhicles (SAVE), using a hierarchical approach in which micro-level transfer-of-control actions are governed by a high-fidelity POMDP model. Macro-level path planning in our hierarchical approach is performed by solving a Stochastic Shortest Path (SSP) problem. We analyze the integrated model and show that it provides the required guarantees. Finally, we test the SAVE model using real-world road data from Open Street Map (OSM) within 10 cities, showing the benefits of the collaboration between the agent and human.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Luis Enrique Pineda; Kyle Hollins Wray; Shlomo Zilberstein Fast SSP Solvers Using Short-Sighted Labeling Conference IJCAI Workshop on Goal Reasoning, New York, NY, 2016. @conference{SZ:PWZijcai16ws,
title = {Fast SSP Solvers Using Short-Sighted Labeling},
author = {Luis Enrique Pineda and Kyle Hollins Wray and Shlomo Zilberstein},
year = {2016},
date = {2016-01-01},
booktitle = {IJCAI Workshop on Goal Reasoning},
address = {New York, NY},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Richard G Freedman; Shlomo Zilberstein Using Metadata to Automate Interpretations of Unsupervised Learning-Derived Clusters Conference IJCAI Workshop on Human is More Than a Labeler, New York, NY, 2016. @conference{SZ:FZijcai16ws2,
title = {Using Metadata to Automate Interpretations of Unsupervised Learning-Derived Clusters},
author = {Richard G Freedman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FZijcai16ws2.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {IJCAI Workshop on Human is More Than a Labeler},
address = {New York, NY},
abstract = {Unsupervised machine learning methods are useful for identifying clusters of similar inputs with respect to some criteria and giving the inputs within each cluster the same label. However, the results of many such methods rely on parameter choices that can alter the derived classification labels for each input. Verification methods for determining the quality of clusters often rely on human intuition, but this is not always an easy task depending on the format of the inputs and finding the correct relationship that the algorithm used. We present an approach to assist human verification of the unsupervised learning algorithms' classification choices through the use of metadata describing the inputs to be clustered. When the metadata measures the relevance of each input to human-interpretable features, we show how a similar measurement of relevance to human-interpretable features can be derived to describe the unsupervised learning algorithm's choices of clusters. An example demonstrating how it evaluates previous work with activity recognition via topic models is provided in addition to propositions of other uses for the metadata.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Kyle Hollins Wray; Dirk Ruiken; Roderic A Grupen; Shlomo Zilberstein Log-Space Harmonic Function Path Planning Conference Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, South Korea, 2016. @conference{SZ:WRGZiros16,
title = {Log-Space Harmonic Function Path Planning},
author = {Kyle Hollins Wray and Dirk Ruiken and Roderic A Grupen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WRGZiros16.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {1511--1516},
address = {Daejeon, South Korea},
abstract = {We propose a log-space solution for robotic path planning with harmonic functions that solves the long-standing numerical precision problem. We prove that this algorithm: (1) performs the correct computations in log-space, (2) returns the true equivalent path using the log-space mapping, and (3) has a strong error bound given its convergence criterion. We evaluate the algorithm on 7 problem domains. A Graphics Processing Unit (GPU) implementation is also shown to greatly improve performance. We also provide an open source library entitled epic with extensive ROS support and demonstrate this method on a real humanoid robot: the uBot-6. Experiments demonstrate that the log-space solution rapidly produces smooth obstacle-avoiding trajectories, and supports planning in exponentially larger real-world robotic applications.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Richard G Freedman; Shlomo Zilberstein Safety in AI-HRI: Challenges Complementing User Experience Quality Conference AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction (AI-HRI), Arlington, Virginia, 2016. @conference{SZ:FZfall16,
title = {Safety in AI-HRI: Challenges Complementing User Experience Quality},
author = {Richard G Freedman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FZfall16.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction (AI-HRI)},
address = {Arlington, Virginia},
abstract = {Contemporary research in human-robot interaction (HRI) predominantly focuses on the user's experience while controlling a robot. However, with the increased deployment of artificial intelligence (AI) techniques, robots are quickly becoming more autonomous in both academic and industrial experimental settings. In addition to improving the user's interactive experience with AI-operated robots through personalization, dialogue, emotions, and dynamic behavior, there is also a growing need to consider the safety of the interaction. AI may not account for the user's less likely responses, making it possible for an unaware user to be injured by the robot if they have a collision. Issues of trust and acceptance may also come into play if users cannot always understand the robot's thought process, creating a potential for emotional harm. We identify challenges that will need to be addressed in safe AI-HRI and provide an overview of approaches to consider for them, many stemming from the contemporary research.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2015
Ronen Brafman; Carmel Domshlak; Patrik Haslum; Shlomo Zilberstein (Ed.) Proceedings of the 25th International Conference on Automated Planning and Scheduling Proceedings AAAI, Jerusalem, Israel, 2015, ISBN: 978-1-57735-731-5. @proceedings{SZ:BDHZicaps15,
title = {Proceedings of the 25th International Conference on Automated Planning and Scheduling},
editor = {Ronen Brafman and Carmel Domshlak and Patrik Haslum and Shlomo Zilberstein},
url = {http://www.aaai.org/Library/ICAPS/icaps15contents.php},
isbn = {978-1-57735-731-5},
year = {2015},
date = {2015-01-01},
publisher = {AAAI},
address = {Jerusalem, Israel},
keywords = {},
pubstate = {published},
tppubtype = {proceedings}
}
Akshat Kumar; Shlomo Zilberstein; Marc Toussaint Probabilistic Inference Techniques for Scalable Multiagent Decision Making Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 53, pp. 223–270, 2015. @article{SZ:KZTjair15,
title = {Probabilistic Inference Techniques for Scalable Multiagent Decision Making},
author = {Akshat Kumar and Shlomo Zilberstein and Marc Toussaint},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZTjair15.pdf},
doi = {10.1613/jair.4649},
year = {2015},
date = {2015-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {53},
pages = {223--270},
abstract = {Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited their scalability. We present a promising new class of approximation algorithms by developing novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference approach paves the way for the application of efficient inference techniques in DBNs to multiagent decision making. To further improve scalability, we identify certain conditions that are sufficient to extend the approach to multiagent systems with dozens of agents. Specifically, we show that the necessary inference within the expectation-maximization framework can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We further show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the benefits of our approach in terms of runtime and scalability with respect to existing techniques.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Kyle Hollins Wray; Shlomo Zilberstein; Abdel-Illah Mouaddib Multi-Objective MDPs with Conditional Lexicographic Reward Preferences Conference Proceedings of the 29th Conference on Artificial Intelligence (AAAI), Austin, Texas, 2015. @conference{SZ:WZMaaai15,
title = {Multi-Objective MDPs with Conditional Lexicographic Reward Preferences},
author = {Kyle Hollins Wray and Shlomo Zilberstein and Abdel-Illah Mouaddib},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZMaaai15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 29th Conference on Artificial Intelligence (AAAI)},
pages = {3418--3424},
address = {Austin, Texas},
abstract = {Sequential decision problems that involve multiple objectives are prevalent. Consider for example a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish its game theoretic properties. The performance of LVI in practice is tested within a realistic benchmark problem in the domain of semi-autonomous driving. Finally, we demonstrate how GPU-based optimization can improve the scalability of LVI and other value iteration algorithms for MDPs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Shlomo Zilberstein Building Strong Semi-Autonomous Systems Conference Proceedings of the 29th Conference on Artificial Intelligence (AAAI), Austin, Texas, 2015. @conference{SZ:Zaaai15,
title = {Building Strong Semi-Autonomous Systems},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zaaai15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 29th Conference on Artificial Intelligence (AAAI)},
pages = {4088--4092},
address = {Austin, Texas},
abstract = {The vision of populating the world with autonomous systems that reduce human labor and improve safety is gradually becoming a reality. Autonomous systems have changed the way space exploration is conducted and are beginning to transform everyday life with a range of household products. In many areas, however, there are considerable barriers to the deployment of fully autonomous systems. We refer to systems that require some degree of human intervention in order to complete a task as semi-autonomous systems. We examine the broad rationale for semi-autonomy and define basic properties of such systems. Accounting for the human in the loop presents a considerable challenge for current planning techniques. We examine various design choices in the development of semi-autonomous systems and their implications on planning and execution. Finally, we discuss fruitful research directions for advancing the science of semi-autonomy.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Siddharth Srivastava; Shlomo Zilberstein; Abhishek Gupta; Pieter Abbeel; Stuart J Russell Tractability of Planning with Loops Conference Proceedings of the 29th Conference on Artificial Intelligence (AAAI), Austin, Texas, 2015. @conference{SZ:SZGARaaai15,
title = {Tractability of Planning with Loops},
author = {Siddharth Srivastava and Shlomo Zilberstein and Abhishek Gupta and Pieter Abbeel and Stuart J Russell},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZGARaaai15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 29th Conference on Artificial Intelligence (AAAI)},
pages = {3393--3401},
address = {Austin, Texas},
abstract = {We create a unified framework for analyzing and synthesizing plans with loops for solving problems with non-deterministic numeric effects and a limited form of partial observability. Three different action models -- with deterministic, qualitative non-deterministic and Boolean non-deterministic semantics -- are handled using a single abstract representation. We establish the conditions under which the correctness and termination of solutions, represented as abstract policies, can be verified. We also examine the feasibility of learning abstract policies from examples. We demonstrate our techniques on several planning problems and show that they apply to challenging real-world tasks such as doing the laundry with a PR2 robot. These results resolve a number of open questions about planning with loops and facilitate the development of new algorithms and applications.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We create a unified framework for analyzing and synthesizing plans with loops for solving problems with non-deterministic numeric effects and a limited form of partial observability. Three different action models -- with deterministic, qualitative non-deterministic and Boolean non-deterministic semantics -- are handled using a single abstract representation. We establish the conditions under which the correctness and termination of solutions, represented as abstract policies, can be verified. We also examine the feasibility of learning abstract policies from examples. We demonstrate our techniques on several planning problems and show that they apply to challenging real-world tasks such as doing the laundry with a PR2 robot. These results resolve a number of open questions about planning with loops and facilitate the development of new algorithms and applications. |
Hee-Tae Jung; Richard G Freedman; Tammie Foster; Yu-Kyong Choe; Shlomo Zilberstein; Roderic A Grupen Learning Therapy Strategies from Demonstration Using Latent Dirichlet Allocation Conference Proceedings of the 20th ACM Conference on Intelligent User Interfaces (IUI), Atlanta, Georgia, 2015. @conference{SZ:JFFCZGiui15,
title = {Learning Therapy Strategies from Demonstration Using Latent Dirichlet Allocation},
author = {Hee-Tae Jung and Richard G Freedman and Tammie Foster and Yu-Kyong Choe and Shlomo Zilberstein and Roderic A Grupen},
url = {http://rbr.cs.umass.edu/shlomo/papers/JFFCZGiui15.pdf},
doi = {10.1145/2678025.2701403},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 20th ACM Conference on Intelligent User Interfaces (IUI)},
pages = {432--436},
address = {Atlanta, Georgia},
abstract = {The use of robots in stroke rehabilitation has become a popular trend in rehabilitation robotics. However, despite the acknowledged value of customized service for individual patients, research on programming adaptive therapy for individual patients has received little attention. The goal of the current study is to model tele-therapy sessions in the form of a generative process for autonomous therapy that approximates the demonstrations of the therapist. The resulting autonomous programs for therapy may imitate the strategy that the therapist might have employed and reinforce therapeutic exercises between tele-therapy sessions. We propose to encode the therapist's decision criteria in terms of the patient's motor performance features. Specifically, in this work, we apply Latent Dirichlet Allocation on the batch data collected during tele-therapy sessions between a single stroke patient and a single therapist. Using the resulting models, the therapeutic exercise targets are generated and are verified with the same therapist who generated the data.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
The use of robots in stroke rehabilitation has become a popular trend in rehabilitation robotics. However, despite the acknowledged value of customized service for individual patients, research on programming adaptive therapy for individual patients has received little attention. The goal of the current study is to model tele-therapy sessions in the form of a generative process for autonomous therapy that approximates the demonstrations of the therapist. The resulting autonomous programs for therapy may imitate the strategy that the therapist might have employed and reinforce therapeutic exercises between tele-therapy sessions. We propose to encode the therapist's decision criteria in terms of the patient's motor performance features. Specifically, in this work, we apply Latent Dirichlet Allocation on the batch data collected during tele-therapy sessions between a single stroke patient and a single therapist. Using the resulting models, the therapeutic exercise targets are generated and are verified with the same therapist who generated the data. |
Akshat Kumar; Shlomo Zilberstein History-Based Controller Design and Optimization for Partially Observable MDPs Conference Proceedings of the 25th International Conference on Automated Planning and Scheduling (ICAPS), Jerusalem, Israel, 2015. @conference{SZ:KZicaps15,
title = {History-Based Controller Design and Optimization for Partially Observable MDPs},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZicaps15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 25th International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {156--164},
address = {Jerusalem, Israel},
abstract = {Partially observable MDPs provide an elegant framework for sequential decision making. Finite-state controllers (FSCs) are often used to represent policies for infinite-horizon problems as they offer a compact representation, simple-to-execute plans, and adjustable tradeoff between computational complexity and policy size. We develop novel connections between optimizing FSCs for POMDPs and the dual linear program for MDPs. Building on that, we present a dual mixed integer linear program (MIP) for optimizing FSCs. To assign well-defined meaning to FSC nodes as well as aid in policy search, we show how to associate history-based features with each FSC node. Using this representation, we address another challenging problem, that of iteratively deciding which nodes to add to FSC to get a better policy. Using an efficient off-the-shelf MIP solver, we show that this new approach can find compact near-optimal FSCs for several large benchmark domains, and is competitive with previous best approaches.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Partially observable MDPs provide an elegant framework for sequential decision making. Finite-state controllers (FSCs) are often used to represent policies for infinite-horizon problems as they offer a compact representation, simple-to-execute plans, and adjustable tradeoff between computational complexity and policy size. We develop novel connections between optimizing FSCs for POMDPs and the dual linear program for MDPs. Building on that, we present a dual mixed integer linear program (MIP) for optimizing FSCs. To assign well-defined meaning to FSC nodes as well as aid in policy search, we show how to associate history-based features with each FSC node. Using this representation, we address another challenging problem, that of iteratively deciding which nodes to add to FSC to get a better policy. Using an efficient off-the-shelf MIP solver, we show that this new approach can find compact near-optimal FSCs for several large benchmark domains, and is competitive with previous best approaches. |
Abdel-Illah Mouaddib; Laurent Jeanpierre; Shlomo Zilberstein Handling Advice in MDPs for Semi-Autonomous Systems Conference ICAPS Workshop on Planning and Robotics (PlanRob), Jerusalem, Israel, 2015. @conference{SZ:MJZplanrob15,
title = {Handling Advice in MDPs for Semi-Autonomous Systems},
author = {Abdel-Illah Mouaddib and Laurent Jeanpierre and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MJZplanrob15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {ICAPS Workshop on Planning and Robotics (PlanRob)},
address = {Jerusalem, Israel},
abstract = {This paper proposes an effective new model for decision making in situations where full autonomy is not feasible due to the inability to fully model and reason about the domain. To overcome this limitation, we consider a human operator who can supervise the system and guide its operation by providing high-level advice. We define a rich representation for advice and describe an effective algorithm for generating a new policy that conforms to the given advice. Advice is designed to improve the efficiency and safety of the system by imposing constraints on state visitation (either encouraging or discouraging the system to visit certain states). Coupled with the standard reward maximization criterion for MDPs, advice poses a complex multi-criteria decision problem. We present and analyze an effective algorithm for optimizing the policy in the presence of advice.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
This paper proposes an effective new model for decision making in situations where full autonomy is not feasible due to the inability to fully model and reason about the domain. To overcome this limitation, we consider a human operator who can supervise the system and guide its operation by providing high-level advice. We define a rich representation for advice and describe an effective algorithm for generating a new policy that conforms to the given advice. Advice is designed to improve the efficiency and safety of the system by imposing constraints on state visitation (either encouraging or discouraging the system to visit certain states). Coupled with the standard reward maximization criterion for MDPs, advice poses a complex multi-criteria decision problem. We present and analyze an effective algorithm for optimizing the policy in the presence of advice. |
Kyle Hollins Wray; Shlomo Zilberstein Multi-Objective POMDPs with Lexicographic Reward Preferences Conference Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 2015. @conference{SZ:KZijcai15,
title = {Multi-Objective POMDPs with Lexicographic Reward Preferences},
author = {Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZijcai15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1719--1725},
address = {Buenos Aires, Argentina},
abstract = {We propose a model, Lexicographic Partially Observable Markov Decision Process (LPOMDP), which extends POMDPs with lexicographic preferences over multiple value functions. It allows for slack--slightly less-than-optimal values--for higher-priority preferences to facilitate improvement in lower-priority value functions. Many real life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which time spent on the road is minimized, while maximizing time spent driving autonomously. We propose two solutions to LPOMDPs--Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), establishing convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data provided by Open Street Map (OSM) within 10 major cities. Finally, we present GPU-based optimizations for point-based solvers, demonstrating that their application enables us to quickly solve vastly larger LPOMDPs and other variations of POMDPs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We propose a model, Lexicographic Partially Observable Markov Decision Process (LPOMDP), which extends POMDPs with lexicographic preferences over multiple value functions. It allows for slack--slightly less-than-optimal values--for higher-priority preferences to facilitate improvement in lower-priority value functions. Many real life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which time spent on the road is minimized, while maximizing time spent driving autonomously. We propose two solutions to LPOMDPs--Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), establishing convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data provided by Open Street Map (OSM) within 10 major cities. Finally, we present GPU-based optimizations for point-based solvers, demonstrating that their application enables us to quickly solve vastly larger LPOMDPs and other variations of POMDPs. |
Xiaojian Wu; Daniel Sheldon; Shlomo Zilberstein Fast Combinatorial Algorithm for Optimizing the Spread of Cascades Conference Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 2015. @conference{SZ:WSZijcai15,
title = {Fast Combinatorial Algorithm for Optimizing the Spread of Cascades},
author = {Xiaojian Wu and Daniel Sheldon and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WSZijcai15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {2655--2661},
address = {Buenos Aires, Argentina},
abstract = {We address a spatial conservation planning problem in which the planner purchases a budget-constrained set of land parcels in order to maximize the expected spread of a population of an endangered species. Existing techniques based on the sample average approximation scheme and standard integer programming methods have high complexity and limited scalability. We propose a fast combinatorial optimization algorithm using Lagrangian relaxation and primal-dual techniques to solve the problem approximately. The algorithm provides a new way to address a range of conservation planning and scheduling problems. On the Red-cockaded Woodpecker data, our algorithm produces near optimal solutions and runs significantly faster than a standard mixed integer program solver. Compared with a greedy baseline, the solution quality is comparable or better, but our algorithm is 10--30 times faster. On synthetic problems that do not exhibit submodularity, our algorithm significantly outperforms the greedy baseline.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We address a spatial conservation planning problem in which the planner purchases a budget-constrained set of land parcels in order to maximize the expected spread of a population of an endangered species. Existing techniques based on the sample average approximation scheme and standard integer programming methods have high complexity and limited scalability. We propose a fast combinatorial optimization algorithm using Lagrangian relaxation and primal-dual techniques to solve the problem approximately. The algorithm provides a new way to address a range of conservation planning and scheduling problems. On the Red-cockaded Woodpecker data, our algorithm produces near optimal solutions and runs significantly faster than a standard mixed integer program solver. Compared with a greedy baseline, the solution quality is comparable or better, but our algorithm is 10--30 times faster. On synthetic problems that do not exhibit submodularity, our algorithm significantly outperforms the greedy baseline. |
Hee-Tae Jung; Richard G Freedman; Takeshi Takahashi; Jay Ming Wong; Shlomo Zilberstein; Roderic A Grupen; Yu-Kyong Choe Adaptive Therapy Strategies: Efficacy and Learning Framework Conference Proceedings of the IEEE/RAS-EMBS International Conference on Rehabilitation Robotics, Singapore, 2015. @conference{SZ:JFTWZGCicorr15,
title = {Adaptive Therapy Strategies: Efficacy and Learning Framework},
author = {Hee-Tae Jung and Richard G Freedman and Takeshi Takahashi and Jay Ming Wong and Shlomo Zilberstein and Roderic A Grupen and Yu-Kyong Choe},
url = {http://rbr.cs.umass.edu/shlomo/papers/JFTWZGCicorr15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the IEEE/RAS-EMBS International Conference on Rehabilitation Robotics},
address = {Singapore},
abstract = {This paper considers a data-driven framework to model target selection strategies using runtime kinematic parameters of individual patients. These models can be used to select new exercise targets that conform with the decision criteria of the therapist. We present the results from a single-subject case study with a manually written target selection function. Motivated by promising results, we propose a framework for learning customized/adaptive therapy models for individual patients. Through the data collected from a normally functioning adult, we demonstrate that it is feasible to model varying strategies from the demonstration of target selection.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
This paper considers a data-driven framework to model target selection strategies using runtime kinematic parameters of individual patients. These models can be used to select new exercise targets that conform with the decision criteria of the therapist. We present the results from a single-subject case study with a manually written target selection function. Motivated by promising results, we propose a framework for learning customized/adaptive therapy models for individual patients. Through the data collected from a normally functioning adult, we demonstrate that it is feasible to model varying strategies from the demonstration of target selection. |
Richard G Freedman; Shlomo Zilberstein Automated Interpretations of Unsupervised Learning-Derived Clusters for Activity Recognition Conference Ro-Man Workshop on Learning for Human-Robot Collaboration, Kobe, Japan, 2015. @conference{SZ:FZromanW15,
title = {Automated Interpretations of Unsupervised Learning-Derived Clusters for Activity Recognition},
author = {Richard G Freedman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FZromanW15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Ro-Man Workshop on Learning for Human-Robot Collaboration},
address = {Kobe, Japan},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Luis Pineda; Takeshi Takahashi; Hee-Tae Jung; Shlomo Zilberstein; Roderic A Grupen Continual Planning for Search and Rescue Robots Conference Proceedings of the IEEE-RAS 15th International Conference on Humanoid Robots, Seoul, Korea, 2015. @conference{SZ:PTJZGhuman15,
title = {Continual Planning for Search and Rescue Robots},
author = {Luis Pineda and Takeshi Takahashi and Hee-Tae Jung and Shlomo Zilberstein and Roderic A Grupen},
url = {http://rbr.cs.umass.edu/shlomo/papers/PTJZGhuman15.pdf},
doi = {10.1109/HUMANOIDS.2015.7363542},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the IEEE-RAS 15th International Conference on Humanoid Robots},
pages = {243--248},
address = {Seoul, Korea},
abstract = {The deployment of robots for emergency response tasks such as search and rescue is a promising application of robotics with growing importance. Given the perilous nature of these tasks, autonomous robot operation is highly desirable in order to reduce the risk imposed on the human rescue team. While much work has been done on creating robotic systems that can be deployed for search and rescue, limited work has been devoted to devising efficient real-time automated planning algorithms for these tasks. In this work, we present REDHI, an efficient algorithm for solving probabilistic models of complex problems such as search and rescue. We evaluate our algorithm on the search and rescue problem using both an abstract domain representation and a semi-realistic simulator of an existing robot system. The results show that REDHI can obtain near optimal performance with negligible planning time.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
The deployment of robots for emergency response tasks such as search and rescue is a promising application of robotics with growing importance. Given the perilous nature of these tasks, autonomous robot operation is highly desirable in order to reduce the risk imposed on the human rescue team. While much work has been done on creating robotic systems that can be deployed for search and rescue, limited work has been devoted to devising efficient real-time automated planning algorithms for these tasks. In this work, we present REDHI, an efficient algorithm for solving probabilistic models of complex problems such as search and rescue. We evaluate our algorithm on the search and rescue problem using both an abstract domain representation and a semi-realistic simulator of an existing robot system. The results show that REDHI can obtain near optimal performance with negligible planning time. |
Richard G Freedman; Hee-Tae Jung; Shlomo Zilberstein Temporal and Object Relations in Unsupervised Plan and Activity Recognition Conference AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction (AI-HRI), Arlington, Virginia, 2015. @conference{SZ:FJZfall15,
title = {Temporal and Object Relations in Unsupervised Plan and Activity Recognition},
author = {Richard G Freedman and Hee-Tae Jung and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FJZfall15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction (AI-HRI)},
address = {Arlington, Virginia},
abstract = {We consider ways to improve the performance of unsupervised plan and activity recognition techniques by considering temporal and object relations in addition to postural data. Temporal relationships can help recognize activities with cyclic structure and are often implicit because plans have degrees of ordering actions. Relations with objects can help disambiguate observed activities that otherwise share a user's posture and position. We develop and investigate graphical models that extend the popular latent Dirichlet allocation approach with temporal and object relations, examine the relative performance and runtime trade-offs using a standard dataset, and consider the cost/benefit trade-offs these extensions offer in the context of human-robot and human-computer interaction.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We consider ways to improve the performance of unsupervised plan and activity recognition techniques by considering temporal and object relations in addition to postural data. Temporal relationships can help recognize activities with cyclic structure and are often implicit because plans have degrees of ordering actions. Relations with objects can help disambiguate observed activities that otherwise share a user's posture and position. We develop and investigate graphical models that extend the popular latent Dirichlet allocation approach with temporal and object relations, examine the relative performance and runtime trade-offs using a standard dataset, and consider the cost/benefit trade-offs these extensions offer in the context of human-robot and human-computer interaction. |
Luis Pineda; Kyle Hollins Wray; Shlomo Zilberstein Revisiting Multi-Objective MDPs with Relaxed Lexicographic Preferences Conference AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), Arlington, Virginia, 2015. @conference{SZ:PWZfall15,
title = {Revisiting Multi-Objective MDPs with Relaxed Lexicographic Preferences},
author = {Luis Pineda and Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PWZfall15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA)},
address = {Arlington, Virginia},
abstract = {We consider stochastic planning problems that involve multiple objectives such as minimizing task completion time and energy consumption. These problems can be modeled as multi-objective Markov decision processes (MOMDPs), an extension of the widely-used MDP model to handle problems involving multiple value functions. We focus on a subclass of MOMDPs in which the objectives have a relaxed lexicographic structure, allowing an agent to seek improvement in a lower-priority objective when the impact on a higher-priority objective is within some small given tolerance. We examine the relationship between this class of problems and constrained MDPs, showing that the latter offer an alternative solution method with strong guarantees. We show empirically that a recently introduced algorithm for MOMDPs may not offer the same strong guarantees, but it does perform well in practice.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We consider stochastic planning problems that involve multiple objectives such as minimizing task completion time and energy consumption. These problems can be modeled as multi-objective Markov decision processes (MOMDPs), an extension of the widely-used MDP model to handle problems involving multiple value functions. We focus on a subclass of MOMDPs in which the objectives have a relaxed lexicographic structure, allowing an agent to seek improvement in a lower-priority objective when the impact on a higher-priority objective is within some small given tolerance. We examine the relationship between this class of problems and constrained MDPs, showing that the latter offer an alternative solution method with strong guarantees. We show empirically that a recently introduced algorithm for MOMDPs may not offer the same strong guarantees, but it does perform well in practice. |
Kyle Hollins Wray; Shlomo Zilberstein A Parallel Point-Based POMDP Algorithm Leveraging GPUs Conference AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), Arlington, Virginia, 2015. @conference{SZ:WZfall15,
title = {A Parallel Point-Based POMDP Algorithm Leveraging GPUs},
author = {Kyle Hollins Wray and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZfall15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA)},
address = {Arlington, Virginia},
abstract = {We parallelize the Point-Based Value Iteration (PBVI) algorithm, which approximates the solution to Partially Observable Markov Decision Processes (POMDPs), using a Graphics Processing Unit (GPU). We detail additional optimizations, such as leveraging the bounded size of non-zero values over all belief point vectors, usable by serial and parallel algorithms. We compare serial (CPU) and parallel (GPU) implementations on 10 distinct problem domains, and demonstrate that our approach provides an order of magnitude improvement.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We parallelize the Point-Based Value Iteration (PBVI) algorithm, which approximates the solution to Partially Observable Markov Decision Processes (POMDPs), using a Graphics Processing Unit (GPU). We detail additional optimizations, such as leveraging the bounded size of non-zero values over all belief point vectors, usable by serial and parallel algorithms. We compare serial (CPU) and parallel (GPU) implementations on 10 distinct problem domains, and demonstrate that our approach provides an order of magnitude improvement. |
Xiaojian Wu; Daniel Sheldon; Shlomo Zilberstein Efficient Algorithms to Optimize Diffusion Processes under the Independent Cascade Model Conference NIPS Workshop on Networks in the Social and Information Sciences, Montreal, Quebec, 2015. @conference{SZ:WSZnipsW15,
title = {Efficient Algorithms to Optimize Diffusion Processes under the Independent Cascade Model},
author = {Xiaojian Wu and Daniel Sheldon and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WSZnipsW15.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {NIPS Workshop on Networks in the Social and Information Sciences},
address = {Montreal, Quebec},
abstract = {We study scalable algorithms to optimize diffusion processes under the Independent Cascade model. We consider a broad class of intervention actions, including selecting sources, raising the probability that the diffusion propagates from one node to another and changing the topology of networks to facilitate the diffusion. Optimizing the selection of such actions with a limited budget tends to be NP-hard and is neither submodular nor supermodular. We provide scalable algorithms for three different problem settings that range in terms of the strength of the assumptions we make about the model. The algorithms are very efficient (faster than a baseline greedy algorithm), producing high-quality solutions in several diffusion maximization problems in the area of computational sustainability and in some cases also have provable approximation guarantees. These techniques offer promising results that may be applied to diffusion optimization problems in social and information networks.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We study scalable algorithms to optimize diffusion processes under the Independent Cascade model. We consider a broad class of intervention actions, including selecting sources, raising the probability that the diffusion propagates from one node to another and changing the topology of networks to facilitate the diffusion. Optimizing the selection of such actions with a limited budget tends to be NP-hard and is neither submodular nor supermodular. We provide scalable algorithms for three different problem settings that range in terms of the strength of the assumptions we make about the model. The algorithms are very efficient (faster than a baseline greedy algorithm), producing high-quality solutions in several diffusion maximization problems in the area of computational sustainability and in some cases also have provable approximation guarantees. These techniques offer promising results that may be applied to diffusion optimization problems in social and information networks. |
2014
|
Duc Thien Nguyen; William Yeoh; Hoong Chuin Lau; Shlomo Zilberstein; Chongjie Zhang Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Paris, France, 2014. @conference{SZ:NYLZZaamas14,
title = {Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs},
author = {Duc Thien Nguyen and William Yeoh and Hoong Chuin Lau and Shlomo Zilberstein and Chongjie Zhang},
url = {http://rbr.cs.umass.edu/shlomo/papers/NYLZZaamas14.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {1341--1342},
address = {Paris, France},
abstract = {Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. In this paper, we introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where a DCOP is a function of the value assignments in the preceding DCOP. We also introduce a distributed reinforcement learning algorithm that balances exploration and exploitation to solve MD-DCOPs in an online manner.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. In this paper, we introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where a DCOP is a function of the value assignments in the preceding DCOP. We also introduce a distributed reinforcement learning algorithm that balances exploration and exploitation to solve MD-DCOPs in an online manner. |
Richard G Freedman; Hee-Tae Jung; Shlomo Zilberstein Plan and Activity Recognition from a Topic Modeling Perspective Conference Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS), Portsmouth, New Hampshire, 2014. @conference{SZ:FJZicaps14,
title = {Plan and Activity Recognition from a Topic Modeling Perspective},
author = {Richard G Freedman and Hee-Tae Jung and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FJZicaps14.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {360--364},
address = {Portsmouth, New Hampshire},
abstract = {We examine new ways to perform plan recognition (PR) using natural language processing (NLP) techniques. PR often focuses on the structural relationships between consecutive observations and ordered activities that comprise plans. However, NLP commonly treats text as a bag-of-words, omitting such structural relationships and using topic models to break down the distribution of concepts discussed in documents. In this paper, we examine an analogous treatment of plans as distributions of activities. We explore the application of Latent Dirichlet Allocation topic models to human skeletal data of plan execution traces obtained from an RGB-D sensor. This investigation focuses on representing the data as text and interpreting learned activities as a form of activity recognition (AR). Additionally, we explain how the system may perform PR. The initial empirical results suggest that such NLP methods can be useful in complex PR and AR tasks.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Luis Pineda; Shlomo Zilberstein Planning Under Uncertainty Using Reduced Models: Revisiting Determinization Conference Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS), Portsmouth, New Hampshire, 2014. @conference{SZ:PZicaps14,
title = {Planning Under Uncertainty Using Reduced Models: Revisiting Determinization},
author = {Luis Pineda and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZicaps14.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {217--225},
address = {Portsmouth, New Hampshire},
abstract = {We introduce a family of MDP reduced models characterized by two parameters: the maximum number of primary outcomes per action that are fully accounted for and the maximum number of occurrences of the remaining exceptional outcomes that are planned for in advance. Reduced models can be solved much faster using heuristic search algorithms such as LAO*, benefiting from the dramatic reduction in the number of reachable states. A commonly used determinization approach is a special case of this family of reductions, with one primary outcome per action and zero exceptional outcomes per plan. We present a framework to compute the benefits of planning with reduced models, relying on online planning when the number of exceptional outcomes exceeds the bound. Using this framework, we compare the performance of various reduced models and consider the challenge of generating good ones automatically. We show that each one of the dimensions--allowing more than one primary outcome or planning for some limited number of exceptions--could improve performance relative to standard determinization. The results place recent work on determinization in a broader context and lay the foundation for efficient and systematic exploration of the space of MDP model reductions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Duc Thien Nguyen; William Yeoh; Hoong Chuin Lau; Shlomo Zilberstein; Chongjie Zhang Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:NYLZZaaai14,
title = {Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs},
author = {Duc Thien Nguyen and William Yeoh and Hoong Chuin Lau and Shlomo Zilberstein and Chongjie Zhang},
url = {http://rbr.cs.umass.edu/shlomo/papers/NYLZZaaai14.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 28th Conference on Artificial Intelligence (AAAI)},
pages = {1447--1455},
address = {Quebec City, Canada},
abstract = {Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multiarm bandit DCOP algorithm on dynamic DCOPs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Xiaojian Wu; Daniel Sheldon; Shlomo Zilberstein Rounded Dynamic Programming for Tree-Structured Stochastic Network Design Conference Proceedings of the 28th Conference on Artificial Intelligence (AAAI), Quebec City, Canada, 2014. @conference{SZ:WSZaaai14,
title = {Rounded Dynamic Programming for Tree-Structured Stochastic Network Design},
author = {Xiaojian Wu and Daniel Sheldon and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WSZaaai14.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 28th Conference on Artificial Intelligence (AAAI)},
pages = {479--485},
address = {Quebec City, Canada},
abstract = {We develop a fast approximation algorithm called rounded dynamic programming (RDP) for stochastic network design problems on directed trees. The underlying model describes phenomena that spread away from the root of a tree, for example, the spread of influence in a hierarchical organization or fish in a river network. Actions can be taken to intervene in the network---for some cost---to increase the probability of propagation along an edge. Our algorithm selects a set of actions to maximize the overall spread in the network under a limited budget. We prove that the algorithm is a fully polynomial-time approximation scheme (FPTAS), that is, it finds (1-ε)-optimal solutions in time polynomial in the input size and 1/ε. We apply the algorithm to the problem of allocating funds efficiently to remove barriers in a river network so fish can reach greater portions of their native range. Our experiments show that the algorithm is able to produce near-optimal solutions much faster than an existing technique.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Xiaojian Wu; Daniel Sheldon; Shlomo Zilberstein Stochastic Network Design in Bidirected Trees Conference Proceedings of the 28th Neural Information Processing Systems Conference (NIPS), Montreal, Canada, 2014. @conference{SZ:WSZnips14,
title = {Stochastic Network Design in Bidirected Trees},
author = {Xiaojian Wu and Daniel Sheldon and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WSZnips14.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 28th Neural Information Processing Systems Conference (NIPS)},
pages = {882--890},
address = {Montreal, Canada},
abstract = {We investigate the problem of stochastic network design in bidirected trees. In this problem, an underlying phenomenon (e.g., a behavior, rumor, or disease) starts at multiple sources in a tree and spreads in both directions along its edges. Actions can be taken to increase the probability of propagation on edges, and the goal is to maximize the total amount of spread away from all sources. Our main result is a rounded dynamic programming approach that leads to a fully polynomial-time approximation scheme (FPTAS), that is, an algorithm that can find (1-ε)-optimal solutions for any problem instance in time polynomial in the input size and 1/ε. Our algorithm outperforms competing approaches on a motivating problem from computational sustainability to remove barriers in river networks to restore the health of aquatic ecosystems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Luis Pineda; Shlomo Zilberstein Realtime Concurrent Planning and Plan Execution in Stochastic Domains Technical Report School of Computer Science, University of Massachusetts Amherst no. 2014-21, 2014. @techreport{SZ:PZtr1421,
title = {Realtime Concurrent Planning and Plan Execution in Stochastic Domains},
author = {Luis Pineda and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZtr1421.pdf},
year = {2014},
date = {2014-01-01},
number = {2014-21},
institution = {School of Computer Science, University of Massachusetts Amherst},
abstract = {In realtime planning domains, such as service robot control, an agent receives a task and must minimize the combined cost of planning and plan execution necessary to complete the task. To reduce the total cost, we examine the feasibility of performing planning continuously, while parts of the intermediate plan are being executed. The main challenges are to guarantee the completeness of the approach and make sure that planning does concentrate on regions of the state space that are most crucial given the state of execution. Surprisingly, simple modifications of existing stochastic planners yield an efficient approach for concurrent planning and plan execution. We formalize this approach and analyze its characteristics. Experimental results show that such a continuous planning paradigm offers significant benefits, most notably a significant cost reduction relative to existing realtime planning and execution strategies.},
keywords = {},
pubstate = {published},
tppubtype = {techreport}
}
2013
Ronen I Brafman; Guy Shani; Shlomo Zilberstein Qualitative Planning under Partial Observability in Multi-Agent Domains Conference Proceedings of the 27th Conference on Artificial Intelligence (AAAI), Bellevue, Washington, 2013. @conference{SZ:BSZaaai13,
title = {Qualitative Planning under Partial Observability in Multi-Agent Domains},
author = {Ronen I Brafman and Guy Shani and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BSZaaai13.pdf},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 27th Conference on Artificial Intelligence (AAAI)},
pages = {130--137},
address = {Bellevue, Washington},
abstract = {Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more "classical" in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Luis Pineda; Yi Lu; Shlomo Zilberstein; Claudia V Goldman Fault-Tolerant Planning Under Uncertainty Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:PLZGijcai13,
title = {Fault-Tolerant Planning Under Uncertainty},
author = {Luis Pineda and Yi Lu and Shlomo Zilberstein and Claudia V Goldman},
url = {http://rbr.cs.umass.edu/shlomo/papers/PLZGijcai13.pdf},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {2350--2356},
address = {Beijing, China},
abstract = {A fault represents some erroneous operation of a system that could result from an action selection error or some abnormal condition. We formally define error models that characterize the likelihood of various faults and consider the problem of fault-tolerant planning, which optimizes performance given an error model. We show that factoring the possibility of errors significantly degrades the performance of stochastic planning algorithms such as LAO*, because the number of reachable states grows dramatically. We introduce an approach to plan for a bounded number of faults and analyze its theoretical properties. When combined with a continual planning paradigm, the k-fault-tolerant planning method can produce near-optimal performance, even when the number of faults exceeds the bound. Empirical results in two challenging domains confirm the effectiveness of the approach in handling different types of runtime errors.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Feng Wu; Shlomo Zilberstein; Nicholas R Jennings Monte-Carlo Expectation Maximization for Decentralized POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WZJijcai13,
title = {Monte-Carlo Expectation Maximization for Decentralized POMDPs},
author = {Feng Wu and Shlomo Zilberstein and Nicholas R Jennings},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZJijcai13.pdf},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {397--403},
address = {Beijing, China},
abstract = {We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). While a naive implementation of MCEM is inadequate in multi-agent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Xiaojian Wu; Akshat Kumar; Daniel Sheldon; Shlomo Zilberstein Parameter Learning for Latent Network Diffusion Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:WKSZijcai13,
title = {Parameter Learning for Latent Network Diffusion},
author = {Xiaojian Wu and Akshat Kumar and Daniel Sheldon and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WKSZijcai13.pdf},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {2923--2930},
address = {Beijing, China},
abstract = {Diffusion processes in networks are increasingly used to model dynamic phenomena such as the spread of information, wildlife, or social influence. Our work addresses the problem of learning the underlying parameters that govern such a diffusion process by observing the time at which nodes become active. A key advantage of our approach is that, unlike previous work, it can tolerate missing observations for some nodes in the diffusion process. Having incomplete observations is characteristic of offline networks used to model the spread of wildlife. We develop an EM algorithm to address parameter learning in such settings. Since both the E and M steps are computationally challenging, we employ a number of optimization methods such as nonlinear and difference-of-convex programming to address these challenges. Evaluation of the approach on the Red-cockaded Woodpecker conservation problem shows that it is highly robust and accurately learns parameters in various settings, even with more than 80% missing data.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
William Yeoh; Akshat Kumar; Shlomo Zilberstein Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs Conference Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. @conference{SZ:YKZijcai13,
title = {Automated Generation of Interaction Graphs for Value-Factored Dec-POMDPs},
author = {William Yeoh and Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/YKZijcai13.pdf},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {411--417},
address = {Beijing, China},
abstract = {The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity -- solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions. This framework has been shown to be a generalization of several models that leverage sparse agent interactions such as TI-Dec-MDPs, ND-POMDPs and TD-POMDPs. Existing algorithms for these models assume that the interaction graph of the problem is given. In this paper, we introduce three algorithms to automatically generate interaction graphs for models within the VF framework and establish lower and upper bounds on the expected reward of an optimal joint policy. We illustrate experimentally the benefits of these techniques for sensor placement in a decentralized tracking application.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Xiaojian Wu; Daniel Sheldon; Shlomo Zilberstein Stochastic Network Design for River Networks Conference NIPS Workshop on Machine Learning for Sustainability, Lake Tahoe, Nevada, 2013. @conference{SZ:WSZnips13ws,
title = {Stochastic Network Design for River Networks},
author = {Xiaojian Wu and Daniel Sheldon and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WSZnips13ws.pdf},
year = {2013},
date = {2013-01-01},
booktitle = {NIPS Workshop on Machine Learning for Sustainability},
address = {Lake Tahoe, Nevada},
abstract = {Stochastic network design techniques can be used effectively to solve a wide range of planning problems in ecological sustainability. We propose a novel approximate algorithm based on the sample average approximation (SAA) and mixed integer programming (MIP) to efficiently address the problem of using a limited budget to remove instream barriers, which prevent fish from accessing their natural habitat. In comparison with a dynamic programming (DP) benchmark algorithm, the advantage of our algorithm is the ability to produce a near optimal solution much faster, particularly when the budget is large and the DP based algorithm becomes intractable. Furthermore, while the DP based algorithm can only solve tree-structured stream networks, our algorithm is applicable to networks with a more general directed acyclic graph structure.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Edmund Durfee; Shlomo Zilberstein Multiagent Planning, Control, and Execution Book Section In: Weiss, G (Ed.): Multiagent Systems, Second Edition, pp. 485–546, MIT Press, Cambridge, MA, USA, 2013. @incollection{SZ:DZmultiagent13,
title = {Multiagent Planning, Control, and Execution},
author = {Edmund Durfee and Shlomo Zilberstein},
editor = {G Weiss},
url = {https://mitpress.mit.edu/books/multiagent-systems-second-edition},
year = {2013},
date = {2013-01-01},
booktitle = {Multiagent Systems, Second Edition},
pages = {485--546},
publisher = {MIT Press},
address = {Cambridge, MA, USA},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
2012
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Applicability Conditions for Plans with Loops: Computability Results and Algorithms Journal Article In: Artificial Intelligence (AIJ), vol. 191, pp. 1–19, 2012. @article{SZ:SIZaij12,
title = {Applicability Conditions for Plans with Loops: Computability Results and Algorithms},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZaij12.pdf},
doi = {10.1016/j.artint.2012.07.005},
year = {2012},
date = {2012-01-01},
journal = {Artificial Intelligence (AIJ)},
volume = {191},
pages = {1--19},
abstract = {The utility of including loops in plans has long been recognized by the planning community. Loops in a plan help increase both its applicability and the compactness of its representation. However, progress in finding such plans has been limited largely due to a lack of methods for reasoning about the correctness and safety properties of loops of actions. We present novel algorithms for determining the applicability and progress made by a general class of loops of actions. These methods can be used for directing the search for plans with loops towards greater applicability while guaranteeing termination, as well as in post-processing of computed plans to precisely characterize their applicability. Experimental results demonstrate the efficiency of these algorithms. We also discuss the factors which can make the problem of determining applicability conditions for plans with loops incomputable.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
The utility of including loops in plans has long been recognized by the planning community. Loops in a plan help increase both its applicability and the compactness of its representation. However, progress in finding such plans has been limited largely due to a lack of methods for reasoning about the correctness and safety properties of loops of actions. We present novel algorithms for determining the applicability and progress made by a general class of loops of actions. These methods can be used for directing the search for plans with loops towards greater applicability while guaranteeing termination, as well as in post-processing of computed plans to precisely characterize their applicability. Experimental results demonstrate the efficiency of these algorithms. We also discuss the factors which can make the problem of determining applicability conditions for plans with loops incomputable. |
Akshat Kumar; Shlomo Zilberstein; Marc Toussaint Message-Passing Algorithms for MAP Estimation Using DC Programming Conference Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), La Palma, Canary Islands, 2012. @conference{SZ:KZTaistats12,
title = {Message-Passing Algorithms for MAP Estimation Using DC Programming},
author = {Akshat Kumar and Shlomo Zilberstein and Marc Toussaint},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZTaistats12.pdf},
year = {2012},
date = {2012-01-01},
booktitle = {Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {656--664},
address = {La Palma, Canary Islands},
abstract = {We address the problem of finding the most likely assignment or MAP estimation in a Markov random field. We analyze the linear programming formulation of MAP through the lens of difference of convex functions (DC) programming, and use the concave-convex procedure (CCCP) to develop efficient message-passing solvers. The resulting algorithms are guaranteed to converge to a global optimum of the well-studied local polytope, an outer bound on the MAP marginal polytope. To tighten the outer bound, we show how to combine it with the mean-field based inner bound and, again, solve it using CCCP. We also identify a useful relationship between the DC formulations and some recently proposed algorithms based on Bregman divergence. Experimentally, this hybrid approach produces optimal solutions for a range of hard OR problems and near-optimal solutions for standard benchmarks.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We address the problem of finding the most likely assignment or MAP estimation in a Markov random field. We analyze the linear programming formulation of MAP through the lens of difference of convex functions (DC) programming, and use the concave-convex procedure (CCCP) to develop efficient message-passing solvers. The resulting algorithms are guaranteed to converge to a global optimum of the well-studied local polytope, an outer bound on the MAP marginal polytope. To tighten the outer bound, we show how to combine it with the mean-field based inner bound and, again, solve it using CCCP. We also identify a useful relationship between the DC formulations and some recently proposed algorithms based on Bregman divergence. Experimentally, this hybrid approach produces optimal solutions for a range of hard OR problems and near-optimal solutions for standard benchmarks. |
Akshat Kumar; Xiaojian Wu; Shlomo Zilberstein Lagrangian Relaxation Techniques for Scalable Spatial Conservation Planning Conference Proceedings of the 26th Conference on Artificial Intelligence (AAAI), Toronto, Canada, 2012. @conference{SZ:KWZaaai12,
title = {Lagrangian Relaxation Techniques for Scalable Spatial Conservation Planning},
author = {Akshat Kumar and Xiaojian Wu and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KWZaaai12.pdf},
year = {2012},
date = {2012-01-01},
booktitle = {Proceedings of the 26th Conference on Artificial Intelligence (AAAI)},
pages = {309--315},
address = {Toronto, Canada},
abstract = {We address the problem of spatial conservation planning in which the goal is to maximize the expected spread of cascades of an endangered species by strategically purchasing land parcels within a given budget. This problem can be solved by standard integer programming methods using the sample average approximation (SAA) scheme. Our main contribution lies in exploiting the separable structure present in this problem and using Lagrangian relaxation techniques to gain scalability over the flat representation. We also generalize the approach to allow the application of the SAA scheme to a range of stochastic optimization problems. Our iterative approach is highly efficient in terms of space requirements and it provides an upper bound over the optimal solution at each iteration. We apply our approach to the Red-cockaded Woodpecker conservation problem. The results show that it can find the optimal solution significantly faster -- sometimes by an order-of-magnitude -- than using the flat representation for a range of budget sizes.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We address the problem of spatial conservation planning in which the goal is to maximize the expected spread of cascades of an endangered species by strategically purchasing land parcels within a given budget. This problem can be solved by standard integer programming methods using the sample average approximation (SAA) scheme. Our main contribution lies in exploiting the separable structure present in this problem and using Lagrangian relaxation techniques to gain scalability over the flat representation. We also generalize the approach to allow the application of the SAA scheme to a range of stochastic optimization problems. Our iterative approach is highly efficient in terms of space requirements and it provides an upper bound over the optimal solution at each iteration. We apply our approach to the Red-cockaded Woodpecker conservation problem. The results show that it can find the optimal solution significantly faster -- sometimes by an order-of-magnitude -- than using the flat representation for a range of budget sizes. |
Marek Petrik; Shlomo Zilberstein Learning Feature-Based Heuristic Functions Book Section In: Hamadi, Youssef; Monfroy, Eric; Saubion, Frederic (Ed.): Autonomous Search, pp. 269–305, Springer, Berlin, Heidelberg, 2012. @incollection{SZ:PZautonomous12,
title = {Learning Feature-Based Heuristic Functions},
author = {Marek Petrik and Shlomo Zilberstein},
editor = {Youssef Hamadi and Eric Monfroy and Frederic Saubion},
url = {https://doi.org/10.1007/978-3-642-21434-9_11},
doi = {10.1007/978-3-642-21434-9_11},
year = {2012},
date = {2012-01-01},
booktitle = {Autonomous Search},
pages = {269--305},
publisher = {Springer},
address = {Berlin, Heidelberg},
abstract = {Planning is the process of creating a sequence of actions that achieve some desired goals. Automated planning arguably plays a key role in both developing intelligent systems and solving many practical industrial problems. Typical planning problems are characterized by a structured state space, a set of possible actions, a description of the effects of each action, and an objective measure. In this chapter, we consider planning as an optimization problem, seeking plans that minimize the cost of reaching the goals or some other performance measure.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
Planning is the process of creating a sequence of actions that achieve some desired goals. Automated planning arguably plays a key role in both developing intelligent systems and solving many practical industrial problems. Typical planning problems are characterized by a structured state space, a set of possible actions, a description of the effects of each action, and an objective measure. In this chapter, we consider planning as an optimization problem, seeking plans that minimize the cost of reaching the goals or some other performance measure. |
2011
|
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein A New Representation and Associated Algorithms for Generalized Planning Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 615–647, 2011. @article{SZ:SIZaij11,
title = {A New Representation and Associated Algorithms for Generalized Planning},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZaij11.pdf},
doi = {10.1016/j.artint.2010.10.006},
year = {2011},
date = {2011-01-01},
journal = {Artificial Intelligence (AIJ)},
volume = {175},
number = {2},
pages = {615--647},
abstract = {Constructing plans that can handle multiple problem instances is a longstanding open problem in AI. We present a framework for generalized planning that captures the notion of algorithm-like plans and unifies various approaches developed for addressing this problem. Using this framework, and building on the TVLA system for static analysis of programs, we develop a novel approach for computing generalizations of classical plans by identifying sequences of actions that will make measurable progress when placed in a loop. In a wide class of problems that we characterize formally in the paper, these methods allow us to find generalized plans with loops for solving problem instances of unbounded sizes and also to determine the correctness and applicability of the computed generalized plans. We demonstrate the scope and scalability of the proposed approach on a wide range of planning problems.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Constructing plans that can handle multiple problem instances is a longstanding open problem in AI. We present a framework for generalized planning that captures the notion of algorithm-like plans and unifies various approaches developed for addressing this problem. Using this framework, and building on the TVLA system for static analysis of programs, we develop a novel approach for computing generalizations of classical plans by identifying sequences of actions that will make measurable progress when placed in a loop. In a wide class of problems that we characterize formally in the paper, these methods allow us to find generalized plans with loops for solving problem instances of unbounded sizes and also to determine the correctness and applicability of the computed generalized plans. We demonstrate the scope and scalability of the proposed approach on a wide range of planning problems. |
Feng Wu; Shlomo Zilberstein; Xiaoping Chen Online Planning for Multi-Agent Systems with Bounded Communication Journal Article In: Artificial Intelligence (AIJ), vol. 175, no. 2, pp. 487–511, 2011. @article{SZ:WZCaij11,
title = {Online Planning for Multi-Agent Systems with Bounded Communication},
author = {Feng Wu and Shlomo Zilberstein and Xiaoping Chen},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZCaij11.pdf},
doi = {10.1016/j.artint.2010.09.008},
year = {2011},
date = {2011-01-01},
journal = {Artificial Intelligence (AIJ)},
volume = {175},
number = {2},
pages = {487--511},
abstract = {We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to optimize value with minimal communication. The algorithm addresses these challenges by generating identical conditional plans based on common knowledge and communicating only when history inconsistency is detected, allowing communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies using a new and fast local search method implemented using linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing much higher value with much less communication in most cases. The algorithm also proves to be effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings. |
Marek Petrik; Shlomo Zilberstein Robust Approximate Bilinear Programming for Value Function Approximation Journal Article In: Journal of Machine Learning Research (JMLR), vol. 12, pp. 3027–3063, 2011. @article{SZ:PZjmlr11,
title = {Robust Approximate Bilinear Programming for Value Function Approximation},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZjmlr11.pdf},
year = {2011},
date = {2011-01-01},
journal = {Journal of Machine Learning Research (JMLR)},
volume = {12},
pages = {3027--3063},
abstract = {Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this worst-case complexity is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze the formulation as well as a simple approximate algorithm for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this worst-case complexity is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze the formulation as well as a simple approximate algorithm for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems. |
Alan Carlin; Shlomo Zilberstein Decentralized Monitoring of Distributed Anytime Algorithms Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:CZaamas11,
title = {Decentralized Monitoring of Distributed Anytime Algorithms},
author = {Alan Carlin and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/CZaamas11.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {157--164},
address = {Taipei, Taiwan},
abstract = {Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the "right" time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the "right" time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems. |
Akshat Kumar; Shlomo Zilberstein Message-Passing Algorithms for Large Structured Decentralized POMDPs Conference Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Taipei, Taiwan, 2011. @conference{SZ:KZaamas11,
title = {Message-Passing Algorithms for Large Structured Decentralized POMDPs},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZaamas11.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {1087--1088},
address = {Taipei, Taiwan},
abstract = {},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein; Tianjiao Zhang Directed Search for Generalized Plans Using Classical Planners Conference Proceedings of the 21st International Conference on Automated Planning and Scheduling (ICAPS), Freiburg, Germany, 2011. @conference{SZ:SIZZicaps11,
title = {Directed Search for Generalized Plans Using Classical Planners},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein and Tianjiao Zhang},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZZicaps11.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 21st International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {226--233},
address = {Freiburg, Germany},
abstract = {We consider the problem of finding generalized plans for situations where the number of objects may be unknown and unbounded during planning. The input is a domain specification, a goal condition, and a class of concrete problem instances or initial states to be solved, expressed in an abstract first-order representation. Starting with an empty generalized plan, our overall approach is to incrementally increase the applicability of the plan by identifying a problem instance that it cannot solve, invoking a classical planner to solve that problem, generalizing the obtained solution and merging it back into the generalized plan. The main contributions of this paper are methods for (a) generating and solving small problem instances not yet covered by an existing generalized plan, (b) translating between concrete classical plans and abstract plan representations, and (c) extending partial generalized plans and increasing their applicability. We analyze the theoretical properties of these methods, prove their correctness, and illustrate experimentally their scalability. The resulting hybrid approach shows that solving only a few, small, classical planning problems can be sufficient to produce a generalized plan that applies to infinitely many problems with unknown numbers of objects.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We consider the problem of finding generalized plans for situations where the number of objects may be unknown and unbounded during planning. The input is a domain specification, a goal condition, and a class of concrete problem instances or initial states to be solved, expressed in an abstract first-order representation. Starting with an empty generalized plan, our overall approach is to incrementally increase the applicability of the plan by identifying a problem instance that it cannot solve, invoking a classical planner to solve that problem, generalizing the obtained solution and merging it back into the generalized plan. The main contributions of this paper are methods for (a) generating and solving small problem instances not yet covered by an existing generalized plan, (b) translating between concrete classical plans and abstract plan representations, and (c) extending partial generalized plans and increasing their applicability. We analyze the theoretical properties of these methods, prove their correctness, and illustrate experimentally their scalability. The resulting hybrid approach shows that solving only a few, small, classical planning problems can be sufficient to produce a generalized plan that applies to infinitely many problems with unknown numbers of objects. |
Akshat Kumar; Shlomo Zilberstein Message-Passing Algorithms for Quadratic Programming Formulations of MAP Estimation Conference Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, 2011. @conference{SZ:KZuai11,
title = {Message-Passing Algorithms for Quadratic Programming Formulations of MAP Estimation},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZuai11.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {428--435},
address = {Barcelona, Spain},
abstract = {Computing maximum a posteriori (MAP) estimation in graphical models is an important inference problem with many applications. We present message-passing algorithms for quadratic programming (QP) formulations of MAP estimation for pairwise Markov random fields. In particular, we use the concave-convex procedure (CCCP) to obtain a locally optimal algorithm for the non-convex QP formulation. A similar technique is used to derive a globally convergent algorithm for the convex QP relaxation of MAP. We also show that a recently developed expectation-maximization (EM) algorithm for the QP formulation of MAP can be derived from the CCCP perspective. Experiments on synthetic and real-world problems confirm that our new approach is competitive with max-product and its variations. Compared with CPLEX, we achieve more than an order-of-magnitude speedup in solving optimally the convex QP relaxation.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Computing maximum a posteriori (MAP) estimation in graphical models is an important inference problem with many applications. We present message-passing algorithms for quadratic programming (QP) formulations of MAP estimation for pairwise Markov random fields. In particular, we use the concave-convex procedure (CCCP) to obtain a locally optimal algorithm for the non-convex QP formulation. A similar technique is used to derive a globally convergent algorithm for the convex QP relaxation of MAP. We also show that a recently developed expectation-maximization (EM) algorithm for the QP formulation of MAP can be derived from the CCCP perspective. Experiments on synthetic and real-world problems confirm that our new approach is competitive with max-product and its variations. Compared with CPLEX, we achieve more than an order-of-magnitude speedup in solving optimally the convex QP relaxation. |
Akshat Kumar; Shlomo Zilberstein; Marc Toussaint Scalable Multiagent Planning Using Probabilistic Inference Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:KZTijcai11,
title = {Scalable Multiagent Planning Using Probabilistic Inference},
author = {Akshat Kumar and Shlomo Zilberstein and Marc Toussaint},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZTijcai11.pdf},
doi = {10.5591/978-1-57735-516-8/IJCAI11-357},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {2140--2146},
address = {Barcelona, Spain},
abstract = {Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Multiagent planning has seen much progress with the development of formal models such as Dec-POMDPs. However, the complexity of these models -- NEXP-Complete even for two agents -- has limited scalability. We identify certain mild conditions that are sufficient to make multiagent planning amenable to a scalable approximation w.r.t. the number of agents. This is achieved by constructing a graphical model in which likelihood maximization is equivalent to plan optimization. Using the Expectation-Maximization framework for likelihood maximization, we show that the necessary inference can be decomposed into processes that often involve a small subset of agents, thereby facilitating scalability. We derive a global update rule that combines these local inferences to monotonically increase the overall solution quality. Experiments on a large multiagent planning benchmark confirm the benefits of the new approach in terms of runtime and scalability. |
Feng Wu; Shlomo Zilberstein; Xiaoping Chen Online Planning for Ad Hoc Autonomous Agent Teams Conference Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011. @conference{SZ:WZCijcai11,
title = {Online Planning for Ad Hoc Autonomous Agent Teams},
author = {Feng Wu and Shlomo Zilberstein and Xiaoping Chen},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZCijcai11.pdf},
doi = {10.5591/978-1-57735-516-8/IJCAI11-081},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {439--445},
address = {Barcelona, Spain},
abstract = {We propose a novel online planning algorithm for ad hoc team settings -- challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Akshat Kumar; Shlomo Zilberstein On Message-Passing, MAP Estimation in Graphical Models and DCOPs Conference International Workshop on Distributed Constraint Reasoning (DCR), Barcelona, Spain, 2011. @conference{SZ:KYZdcr11,
title = {On Message-Passing, MAP Estimation in Graphical Models and DCOPs},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KYZdcr11.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {International Workshop on Distributed Constraint Reasoning (DCR)},
pages = {57--70},
address = {Barcelona, Spain},
abstract = {The maximum a posteriori (MAP) estimation problem in graphical models is a problem common in many applications such as computer vision and bioinformatics. For example, they are used to identify the most likely orientation of proteins in protein design problems. As such, researchers in the machine learning community have developed a variety of approximate algorithms to solve them. On the other hand, distributed constraint optimization problems (DCOPs) are well-suited for modeling many multi-agent coordination problems such as the coordination of sensors in a network and the coordination of power plants. In this paper, we show that MAP estimation problems and DCOPs bear strong similarities and, as such, some approximate MAP algorithms such as iterative message passing algorithms can be easily tailored to solve DCOPs as well.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Marek Petrik; Shlomo Zilberstein Linear Dynamic Programs for Resource Management Conference Proceedings of the 25th Conference on Artificial Intelligence (AAAI), San Francisco, California, 2011. @conference{SZ:PZaaai11,
title = {Linear Dynamic Programs for Resource Management},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZaaai11.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 25th Conference on Artificial Intelligence (AAAI)},
pages = {1377--1383},
address = {San Francisco, California},
abstract = {Sustainable resource management in many domains presents large continuous stochastic optimization problems, which can often be modeled as Markov decision processes (MDPs). To solve such large MDPs, we identify and leverage linearity in state and action sets that is common in resource management. In particular, we introduce linear dynamic programs (LDPs) that generalize resource management problems and partially observable MDPs (POMDPs). We show that the LDP framework makes it possible to adapt point-based methods -- the state of the art in solving POMDPs -- to solving LDPs. The experimental results demonstrate the efficiency of this approach in managing the water level of a river reservoir. Finally, we discuss the relationship with dual dynamic programming, a method used to optimize hydroelectric systems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Siddharth Srivastava; Shlomo Zilberstein; Neil Immerman; Hector Geffner Qualitative Numeric Planning Conference Proceedings of the 25th Conference on Artificial Intelligence (AAAI), San Francisco, California, 2011. @conference{SZ:SZIGaaai11,
title = {Qualitative Numeric Planning},
author = {Siddharth Srivastava and Shlomo Zilberstein and Neil Immerman and Hector Geffner},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZIGaaai11.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 25th Conference on Artificial Intelligence (AAAI)},
pages = {1010--1016},
address = {San Francisco, California},
abstract = {We consider a new class of planning problems involving a set of non-negative real variables, and a set of non-deterministic actions that increase or decrease the values of these variables by some arbitrary amount. The formulas specifying the initial state, goal state, or action preconditions can only assert whether certain variables are equal to zero or not. Assuming that the state of the variables is fully observable, we obtain two results. First, the solution to the problem can be expressed as a policy mapping qualitative states into actions, where a qualitative state includes a Boolean variable for each original variable, indicating whether its value is zero or not. Second, testing whether any such policy, that may express nested loops of actions, is a solution to the problem, can be determined in time that is polynomial in the qualitative state space, which is much smaller than the original infinite state space. We also report experimental results using a simple generate-and-test planner to illustrate these findings.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Termination and Correctness Analysis of Cyclic Control Conference Proceedings of the 25th Conference on Artificial Intelligence (AAAI Nectar Track), San Francisco, California, 2011. @conference{SZ:SIZaaai11,
title = {Termination and Correctness Analysis of Cyclic Control},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZaaai11.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 25th Conference on Artificial Intelligence (AAAI Nectar Track)},
pages = {1567--1570},
address = {San Francisco, California},
abstract = {We consider a new class of planning problems involving a set of non-negative real variables, and a set of non-deterministic actions that increase or decrease the values of these variables by some arbitrary amount. The formulas specifying the initial state, goal state, or action preconditions can only assert whether certain variables are equal to zero or not. Assuming that the state of the variables is fully observable, we obtain two results. First, the solution to the problem can be expressed as a policy mapping qualitative states into actions, where a qualitative state includes a Boolean variable for each original variable, indicating whether its value is zero or not. Second, testing whether any such policy, that may express nested loops of actions, is a solution to the problem, can be determined in time that is polynomial in the qualitative state space, which is much smaller than the original infinite state space. We also report experimental results using a simple generate-and-test planner to illustrate these findings.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Xiaojian Wu; Akshat Kumar; Shlomo Zilberstein Influence Diagrams with Memory States: Representation and Algorithms Conference Proceedings of the 2nd International Conference on Algorithmic Decision Theory (ADT), Piscataway, New Jersey, 2011. @conference{SZ:WKZadt11,
title = {Influence Diagrams with Memory States: Representation and Algorithms},
author = {Xiaojian Wu and Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/WKZadt11.pdf},
doi = {10.1007/978-3-642-24873-3_23},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 2nd International Conference on Algorithmic Decision Theory (ADT)},
pages = {306--319},
address = {Piscataway, New Jersey},
abstract = {Influence diagrams (IDs) offer a powerful framework for decision making under uncertainty, but their applicability has been hindered by the exponential growth of runtime and memory usage--largely due to the no-forgetting assumption. We present a novel way to maintain a limited amount of memory to inform each decision and still obtain near-optimal policies. The approach is based on augmenting the graphical model with memory states that represent key aspects of previous observations--a method that has proved useful in POMDP solvers. We also derive an efficient EM-based message-passing algorithm to compute the policy. Experimental results show that this approach produces high-quality approximate policies and offers better scalability than existing methods.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Shlomo Zilberstein Metareasoning and Bounded Rationality Book Section In: Cox, M; Raja, A (Ed.): Metareasoning: Thinking about Thinking, pp. 27–40, MIT Press, Cambridge, MA, USA, 2011. @incollection{SZ:Zmetareasoning11,
title = {Metareasoning and Bounded Rationality},
author = {Shlomo Zilberstein},
editor = {M Cox and A Raja},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZCh3-2011.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Metareasoning: Thinking about Thinking},
pages = {27--40},
publisher = {MIT Press},
address = {Cambridge, MA, USA},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
Alan Carlin; Shlomo Zilberstein Bounded Rationality in Multiagent Systems Using Decentralized Metareasoning Book Section In: Guy, T; Karny, M; Wolpert, D (Ed.): Decision Making with Imperfect Decision Makers, pp. 1–28, Springer, Berlin, Heidelberg, 2011. @incollection{SZ:CZdecisionmaking11,
title = {Bounded Rationality in Multiagent Systems Using Decentralized Metareasoning},
author = {Alan Carlin and Shlomo Zilberstein},
editor = {T Guy and M Karny and D Wolpert},
url = {http://www.springerlink.com/content/g136745180478228/},
doi = {10.1007/978-3-642-24647-0},
year = {2011},
date = {2011-01-01},
booktitle = {Decision Making with Imperfect Decision Makers},
pages = {1--28},
publisher = {Springer},
address = {Berlin, Heidelberg},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
2010
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Computing Applicability Conditions for Plans with Loops Conference Proceedings of the 20th International Conference on Automated Planning and Scheduling (ICAPS), Toronto, Canada, 2010, (Best Paper Award). @conference{SZ:SIZicaps10,
title = {Computing Applicability Conditions for Plans with Loops},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZicaps10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 20th International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {161--168},
address = {Toronto, Canada},
abstract = {The utility of including loops in plans has been long recognized by the planning community. Loops in a plan help increase both its applicability and the compactness of representation. However, progress in finding such plans has been limited largely due to lack of methods for reasoning about the correctness and safety properties of loops of actions. We present novel algorithms for determining the applicability and progress made by a general class of loops of actions. These methods can be used for directing the search for plans with loops towards greater applicability while guaranteeing termination, as well as in post-processing of computed plans to precisely characterize their applicability. Experimental results demonstrate the efficiency of these algorithms.},
note = {Best Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Christopher Amato; Daniel S Bernstein; Shlomo Zilberstein Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293–320, 2010. @article{SZ:ABZjaamas10,
title = {Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs},
author = {Christopher Amato and Daniel S Bernstein and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ABZjaamas10.pdf},
doi = {10.1007/s10458-009-9103-z},
year = {2010},
date = {2010-01-01},
journal = {Autonomous Agents and Multi-Agent Systems (JAAMAS)},
volume = {21},
number = {3},
pages = {293--320},
abstract = {Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Akshat Kumar; Shlomo Zilberstein Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:KZaamas10,
title = {Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZaamas10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {1315--1322},
address = {Toronto, Canada},
abstract = {Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Merging Example Plans into Generalized Plans for Non-deterministic Environments Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SIZaamas10,
title = {Merging Example Plans into Generalized Plans for Non-deterministic Environments},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZaamas10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {1341--1348},
address = {Toronto, Canada},
abstract = {We present a new approach for finding generalized contingent plans with loops and branches in situations where there is uncertainty in state properties and object quantities, but lack of probabilistic information about these uncertainties. We use a state abstraction technique from static analysis of programs, which uses 3-valued logic to compactly represent belief states with unbounded numbers of objects. Our approach for finding plans is to incrementally generalize and merge input example plans which can be generated by classical planners. The expressiveness and scope of this approach are demonstrated using experimental results on common benchmark domains.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Feng Wu; Shlomo Zilberstein; Xiaoping Chen Point-Based Policy Generation for Decentralized POMDPs Conference Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010. @conference{SZ:WZCaamas10,
title = {Point-Based Policy Generation for Decentralized POMDPs},
author = {Feng Wu and Shlomo Zilberstein and Xiaoping Chen},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZCaamas10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {1307--1314},
address = {Toronto, Canada},
abstract = {Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of producing first a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves the performance significantly in terms of both runtime and solution quality.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Marek Petrik; Gavin Taylor; Ron Parr; Shlomo Zilberstein Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes Conference Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 2010. @conference{SZ:PTPZicml10,
title = {Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes},
author = {Marek Petrik and Gavin Taylor and Ron Parr and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PTPZicml10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 27th International Conference on Machine Learning (ICML)},
pages = {871--878},
address = {Haifa, Israel},
abstract = {Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions reliably. Large and rich sets of features can cause existing algorithms to overfit because of a limited number of samples. We address this shortcoming using L1 regularization in approximate linear programming. Because the proposed method can automatically select the appropriate richness of features, its performance does not degrade with an increasing number of features. These results rely on new and stronger sampling bounds for regularized approximate linear programs. We also propose a computationally efficient homotopy method. The empirical evaluation of the approach shows that the proposed method performs well on simple MDPs and standard benchmark problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Akshat Kumar; Shlomo Zilberstein Anytime Planning for Decentralized POMDPs using Expectation Maximization Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:KZuai10,
title = {Anytime Planning for Decentralized POMDPs using Expectation Maximization},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZuai10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {294--301},
address = {Catalina Island, California},
abstract = {Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Feng Wu; Shlomo Zilberstein; Xiaoping Chen Rollout Sampling Policy Iteration for Decentralized POMDPs Conference Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010. @conference{SZ:WZCuai10,
title = {Rollout Sampling Policy Iteration for Decentralized POMDPs},
author = {Feng Wu and Shlomo Zilberstein and Xiaoping Chen},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZCuai10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {666--673},
address = {Catalina Island, California},
abstract = {We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Abdel-Illah Mouaddib; Shlomo Zilberstein; Aurelie Beynier; Laurent Jeanpierre A Decision-Theoretic Approach to Cooperative Control and Adjustable Autonomy Conference Proceedings of the 9th European Conference on Artificial Intelligence (ECAI), Lisbon, Portugal, 2010. @conference{SZ:MZBJecai10,
title = {A Decision-Theoretic Approach to Cooperative Control and Adjustable Autonomy},
author = {Abdel-Illah Mouaddib and Shlomo Zilberstein and Aurelie Beynier and Laurent Jeanpierre},
url = {https://doi.org/10.3233/978-1-60750-606-5-971},
doi = {10.3233/978-1-60750-606-5-971},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 9th European Conference on Artificial Intelligence (ECAI)},
pages = {971--972},
address = {Lisbon, Portugal},
abstract = {Cooperative control can help overcome the limitations of autonomous systems (AS) by introducing a supervision unit (SU) (human or another system) into the control loop and creating adjustable autonomy. We present a decision-theoretic approach to accomplish this using Mixed Markov Decision Processes (MI-MDPs). The solution is an optimal plan that tells the AS what actions to perform as well as when to request SU attention or transfer control to the SU. This provides a varying degree of autonomy, particularly suitable for robots exploring a domain with regions that are too complex or risky for autonomous operation, or intelligent vehicles operating in heavy traffic.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Christopher Amato; Blai Bonet; Shlomo Zilberstein Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:ABZaaai10,
title = {Finite-State Controllers Based on Mealy Machines for Centralized and Decentralized POMDPs},
author = {Christopher Amato and Blai Bonet and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ABZaaai10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 24th Conference on Artificial Intelligence (AAAI)},
pages = {1052--1058},
address = {Atlanta, Georgia},
abstract = {Existing controller-based approaches for centralized and decentralized POMDPs are based on automata with output known as Moore machines. In this paper, we show that several advantages can be gained by utilizing another type of automata, the Mealy machine. Mealy machines are more powerful than Moore machines, provide a richer structure that can be exploited by solution methods, and can be easily incorporated into current controller-based approaches. To demonstrate this, we adapted some existing controller-based algorithms to use Mealy machines and obtained results on a set of benchmark domains. The Mealy-based approach always outperformed the Moore-based approach and often outperformed the state-of-the-art algorithms for both centralized and decentralized POMDPs. These findings provide fresh and general insights for the improvement of existing algorithms and the development of new ones.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Feng Wu; Shlomo Zilberstein; Xiaoping Chen Trial-Based Dynamic Programming for Multi-Agent Planning Conference Proceedings of the 24th Conference on Artificial Intelligence (AAAI), Atlanta, Georgia, 2010. @conference{SZ:WZCaaai10,
title = {Trial-Based Dynamic Programming for Multi-Agent Planning},
author = {Feng Wu and Shlomo Zilberstein and Xiaoping Chen},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZCaaai10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 24th Conference on Artificial Intelligence (AAAI)},
pages = {908--914},
address = {Atlanta, Georgia},
abstract = {Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Akshat Kumar; Shlomo Zilberstein MAP Estimation for Graphical Models by Likelihood Maximization Conference Proceedings of the 24th Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2010. @conference{SZ:KZnips10,
title = {MAP Estimation for Graphical Models by Likelihood Maximization},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZnips10.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 24th Neural Information Processing Systems Conference (NIPS)},
pages = {1180--1188},
address = {Vancouver, British Columbia, Canada},
abstract = {Computing a maximum a posteriori (MAP) assignment in graphical models is a crucial inference problem for many practical applications. Several provably convergent approaches have been successfully developed using linear programming (LP) relaxation of the MAP problem. We present an alternative approach, which transforms the MAP problem into that of inference in a mixture of simple Bayes nets. We then derive the Expectation Maximization (EM) algorithm for this mixture that also monotonically increases a lower bound on the MAP assignment until convergence. The update equations for the EM algorithm are remarkably simple, both conceptually and computationally, and can be implemented using a graph-based message passing paradigm similar to max-product computation. Experiments on the real-world protein design dataset show that EM's convergence rate is significantly higher than the previous LP relaxation based approach MPLP. EM also achieves a solution quality within 95% of optimal for most instances.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2009
Daniel S Bernstein; Christopher Amato; Eric A Hansen; Shlomo Zilberstein Policy Iteration for Decentralized Control of Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 89–132, 2009. @article{SZ:BAHZjair09,
title = {Policy Iteration for Decentralized Control of Markov Decision Processes},
author = {Daniel S Bernstein and Christopher Amato and Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BAHZjair09.pdf},
doi = {10.1613/jair.2667},
year = {2009},
date = {2009-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {34},
pages = {89--132},
abstract = {Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Marek Petrik; Shlomo Zilberstein A Bilinear Programming Approach for Multiagent Planning Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 35, pp. 235–274, 2009. @article{SZ:PZjair09,
title = {A Bilinear Programming Approach for Multiagent Planning},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZjair09.pdf},
doi = {10.1613/jair.2673},
year = {2009},
date = {2009-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {35},
pages = {235--274},
abstract = {Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and--unlike the coverage set algorithm--it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Raphen Becker; Alan Carlin; Victor Lesser; Shlomo Zilberstein Analyzing Myopic Approaches for Multi-Agent Communication Journal Article In: Computational Intelligence, vol. 25, no. 1, pp. 31–50, 2009. @article{SZ:BCLZci09,
title = {Analyzing Myopic Approaches for Multi-Agent Communication},
author = {Raphen Becker and Alan Carlin and Victor Lesser and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BCLZci09.pdf},
doi = {10.1111/j.1467-8640.2008.01329.x},
year = {2009},
date = {2009-01-01},
journal = {Computational Intelligence},
volume = {25},
number = {1},
pages = {31--50},
abstract = {Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption--that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multi-agent systems.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Christopher Amato; Shlomo Zilberstein Achieving Goals in Decentralized POMDPs Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:AZaamas09,
title = {Achieving Goals in Decentralized POMDPs},
author = {Christopher Amato and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZaamas09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {593--600},
address = {Budapest, Hungary},
abstract = {Coordination of multiple agents under uncertainty in the decentralized POMDP model is known to be NEXP-complete, even when the agents have a joint set of goals. Nevertheless, we show that the existence of goals can help develop effective planning algorithms. We examine an approach to model these problems as indefinite-horizon decentralized POMDPs, suitable for many practical problems that terminate after some unspecified number of steps. Our algorithm for solving these problems is optimal under some common assumptions--that terminal actions exist for each agent and rewards for non-terminal actions are negative. We also propose an infinite-horizon approximation method that allows us to relax these assumptions while maintaining goal conditions. An optimality bound is developed for this sample-based approach and experimental results show that it is able to exploit the goal structure effectively. Compared with the state-of-the-art, our approach can solve larger problems and produce significantly better solutions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Akshat Kumar; Shlomo Zilberstein Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions Conference Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Budapest, Hungary, 2009. @conference{SZ:KZaamas09,
title = {Constraint-Based Dynamic Programming for Decentralized POMDPs with Structured Interactions},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZaamas09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {561--568},
address = {Budapest, Hungary},
abstract = {Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides magnitudes of speedup in the policy computation and generates better quality solution for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Alan Carlin; Shlomo Zilberstein Value of Communication in Decentralized POMDPs Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Budapest, Hungary, 2009. @conference{SZ:CZmsdm09,
title = {Value of Communication in Decentralized POMDPs},
author = {Alan Carlin and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/CZmsdm09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM)},
address = {Budapest, Hungary},
abstract = {In decentralized settings with partial observability, agents can often benefit from communicating, but communication resources may be limited and costly. Current approaches tend to dismiss or underestimate this cost, resulting in over-communication. This paper presents a general framework to compute the value of communicating from each agent's local perspective, by comparing the expected reward with and without communication. In order to obtain these expectations, each agent must reason about the state and belief states of the other agents, both before and after communication. We show how this can be done in the context of decentralized POMDPs and discuss ways to mitigate a common myopic assumption, where agents tend to over-communicate because they overlook the possibility that communication can be deferred or initiated by the other agents. The paper presents a theoretical framework to precisely quantify the value of communication and an effective algorithm to manage communication. Experimental results show that our approach performs well compared to other techniques suggested in the literature.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Akshat Kumar; Shlomo Zilberstein Dynamic Programming Approximations for Partially Observable Stochastic Games Conference Proceedings of the 22nd International FLAIRS Conference, Sanibel Island, Florida, 2009. @conference{SZ:KZflairs09,
title = {Dynamic Programming Approximations for Partially Observable Stochastic Games},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZflairs09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 22nd International FLAIRS Conference},
pages = {547--552},
address = {Sanibel Island, Florida},
abstract = {Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies. |
Daniel Sadoc Menasche; Giovanni Neglia; Don Towsley; Shlomo Zilberstein Strategic Reasoning About Bundling in Swarming Systems Conference Proceedings of the 1st International Conference on Game Theory for Networks, Istanbul, Turkey, 2009. @conference{SZ:MNTZgamenets09,
title = {Strategic Reasoning About Bundling in Swarming Systems},
author = {Daniel Sadoc Menasche and Giovanni Neglia and Don Towsley and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MNTZgamenets09.pdf},
doi = {10.1109/GAMENETS.2009.5137451},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 1st International Conference on Game Theory for Networks},
pages = {611--620},
address = {Istanbul, Turkey},
abstract = {The objects of study of this paper are swarming systems, a special kind of peer-to-peer system where users interested in the same content at the same time cooperate with each other. In particular, we consider the problem of how to combine files into bundles in such systems. First, we analyze the case of a monopoly where a single publisher decides how to aggregate its files so as to satisfy user demands while mitigating its serving costs. We establish conditions for the existence and uniqueness of an equilibrium and how the publisher's bundling strategy affects its profit. Then, we consider the competitive case where bundling decisions of one publisher affect the outcome of other publishers. Using normal form games we analyze the impact of different system parameters on the Nash equilibrium.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
The objects of study of this paper are swarming systems, a special kind of peer-to-peer system where users interested in the same content at the same time cooperate with each other. In particular, we consider the problem of how to combine files into bundles in such systems. First, we analyze the case of a monopoly where a single publisher decides how to aggregate its files so as to satisfy user demands while mitigating its serving costs. We establish conditions for the existence and uniqueness of an equilibrium and how the publisher's bundling strategy affects its profit. Then, we consider the competitive case where bundling decisions of one publisher affect the outcome of other publishers. Using normal form games we analyze the impact of different system parameters on the Nash equilibrium. |
Marek Petrik; Shlomo Zilberstein Constraint Relaxation in Approximate Linear Programs Conference Proceedings of the 26th International Conference on Machine Learning (ICML), Montreal, Canada, 2009. @conference{SZ:PZicml09,
title = {Constraint Relaxation in Approximate Linear Programs},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZicml09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 26th International Conference on Machine Learning (ICML)},
pages = {809--816},
address = {Montreal, Canada},
abstract = {Approximate Linear Programming (ALP) is a reinforcement learning technique with nice theoretical properties, but it often performs poorly in practice. We identify some reasons for the poor quality of ALP solutions in problems where the approximation induces virtual loops. We then introduce two methods for improving solution quality. One method rolls out selected constraints of the ALP, guided by the dual information. The second method is a relaxation of the ALP, based on external penalty methods. The latter method is applicable in domains in which rolling out constraints is impractical. Both approaches show promising empirical results for simple benchmark problems as well as for a realistic blood inventory management problem.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Approximate Linear Programming (ALP) is a reinforcement learning technique with nice theoretical properties, but it often performs poorly in practice. We identify some reasons for the poor quality of ALP solutions in problems where the approximation induces virtual loops. We then introduce two methods for improving solution quality. One method rolls out selected constraints of the ALP, guided by the dual information. The second method is a relaxation of the ALP, based on external penalty methods. The latter method is applicable in domains in which rolling out constraints is impractical. Both approaches show promising empirical results for simple benchmark problems as well as for a realistic blood inventory management problem. |
Akshat Kumar; Shlomo Zilberstein Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation Conference Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, California, 2009. @conference{SZ:KZijcai09,
title = {Event-Detecting Multi-Agent MDPs: Complexity and Constant-Factor Approximation},
author = {Akshat Kumar and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/KZijcai09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {201--207},
address = {Pasadena, California},
abstract = {Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Planning under uncertainty for multiple agents has grown rapidly with the development of formal models such as multi-agent MDPs and decentralized MDPs. But despite their richness, the applicability of these models remains limited due to their computational complexity. We present the class of event-detecting multi-agent MDPs (eMMDPs), designed to detect multiple mobile targets by a team of sensor agents. We show that eMMDPs are NP-hard and present a scalable 2-approximation algorithm for solving them using matroid theory and constraint optimization. The complexity of the algorithm is linear in the state-space and number of agents, quadratic in the horizon, and exponential only in a small parameter that depends on the interaction among the agents. Despite the worst-case approximation ratio of 2, experimental results show that the algorithm produces near-optimal policies for a range of test problems. |
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Abstract Planning with Unknown Object Quantities and Properties Conference Proceedings of the 8th Symposium on Abstraction, Reformulation, and Approximation (SARA), Lake Arrowhead, California, 2009. @conference{SZ:SIZsara09,
title = {Abstract Planning with Unknown Object Quantities and Properties},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZsara09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 8th Symposium on Abstraction, Reformulation, and Approximation (SARA)},
pages = {143--150},
address = {Lake Arrowhead, California},
abstract = {State abstraction has been widely used for state aggregation in approaches to AI search and planning. In this paper we use a powerful abstraction technique from software model checking for representing collections of states with different object quantities and properties. We exploit this method to develop precise abstractions and action operators for use in AI. This enables us to find scalable, algorithm-like plans with branches and loops which can solve problems of unbounded sizes. We describe how this method of abstraction can be effectively used in AI, with compelling results from implementations of two planning algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
State abstraction has been widely used for state aggregation in approaches to AI search and planning. In this paper we use a powerful abstraction technique from software model checking for representing collections of states with different object quantities and properties. We exploit this method to develop precise abstractions and action operators for use in AI. This enables us to find scalable, algorithm-like plans with branches and loops which can solve problems of unbounded sizes. We describe how this method of abstraction can be effectively used in AI, with compelling results from implementations of two planning algorithms. |
Christopher Amato; Jilles Steeve Dibangoye; Shlomo Zilberstein Incremental Policy Generation for Finite-Horizon DEC-POMDPs Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:ADZicaps09,
title = {Incremental Policy Generation for Finite-Horizon DEC-POMDPs},
author = {Christopher Amato and Jilles Steeve Dibangoye and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ADZicaps09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {2--9},
address = {Thessaloniki, Greece},
abstract = {Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude speedup in the policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited scalability of solution techniques has restricted the applicability of the model. To overcome this computational barrier, research has focused on restricted classes of DEC-POMDPs, which are easier to solve yet rich enough to capture many practical problems. We present CBDP, an efficient and scalable point-based dynamic programming algorithm for one such model called ND-POMDP (Network Distributed POMDP). Specifically, CBDP provides orders of magnitude speedup in the policy computation and generates better quality solutions for all test instances. It has linear complexity in the number of agents and horizon length. Furthermore, the complexity per horizon for the examined class of problems is exponential only in a small parameter that depends upon the interaction among the agents, achieving significant scalability for large, loosely coupled multi-agent systems. The efficiency of CBDP lies in exploiting the structure of interactions using constraint networks. These results extend significantly the effectiveness of decision-theoretic planning in multi-agent settings. |
Feng Wu; Shlomo Zilberstein; Xiaoping Chen Multi-Agent Online Planning with Communication Conference Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009. @conference{SZ:WZCicaps09,
title = {Multi-Agent Online Planning with Communication},
author = {Feng Wu and Shlomo Zilberstein and Xiaoping Chen},
url = {http://rbr.cs.umass.edu/shlomo/papers/WZCicaps09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS)},
pages = {321--329},
address = {Thessaloniki, Greece},
abstract = {We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems off-line. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing off-line planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases. |
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Challenges in Finding Generalized Plans Conference ICAPS Workshop on Generalized Planning: Macros, Loops, Domain Control, Thessaloniki, Greece, 2009. @conference{SZ:SIZicaps09ws1,
title = {Challenges in Finding Generalized Plans},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZicaps09ws1.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {ICAPS Workshop on Generalized Planning: Macros, Loops, Domain Control},
address = {Thessaloniki, Greece},
abstract = {We present a simple and precise definition of generalized planning together with five natural dimensions of quality for measuring any generalized plan. We argue that no existing approach excels in all these dimensions. In the remainder of the paper we present a new approach to generalized planning that addresses all five of these dimensions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present a simple and precise definition of generalized planning together with five natural dimensions of quality for measuring any generalized plan. We argue that no existing approach excels in all these dimensions. In the remainder of the paper we present a new approach to generalized planning that addresses all five of these dimensions. |
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Finding Plans with Branches, Loops and Preconditions Conference ICAPS Workshop on Verification and Validation of Planning and Scheduling Systems, Thessaloniki, Greece, 2009. @conference{SZ:SIZicaps09ws2,
title = {Finding Plans with Branches, Loops and Preconditions},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZicaps09ws2.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {ICAPS Workshop on Verification and Validation of Planning and Scheduling Systems},
address = {Thessaloniki, Greece},
abstract = {We present a new approach for finding conditional plans with loops and branches for planning in situations with uncertainty in state properties as well as in object quantities. We use a state abstraction technique from static analysis of programs to build such plans incrementally using generalizations of input example plans generated by classical planners. Preconditions of the resulting plans with loops are computed by analyzing the changes in the counts of objects of different types across each loop. The scope and scalability of this approach are demonstrated using experimental results on common benchmark domains.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present a new approach for finding conditional plans with loops and branches for planning in situations with uncertainty in state properties as well as in object quantities. We use a state abstraction technique from static analysis of programs to build such plans incrementally using generalizations of input example plans generated by classical planners. Preconditions of the resulting plans with loops are computed by analyzing the changes in the counts of objects of different types across each loop. The scope and scalability of this approach are demonstrated using experimental results on common benchmark domains. |
Alan Carlin; Shlomo Zilberstein Myopic and Non-Myopic Communication Under Partial Observability Conference Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Milan, Italy, 2009. @conference{SZ:CZiat09,
title = {Myopic and Non-Myopic Communication Under Partial Observability},
author = {Alan Carlin and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/CZiat09.pdf},
doi = {10.1109/WI-IAT.2009.174},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology},
pages = {331--338},
address = {Milan, Italy},
abstract = {In decentralized settings with partial observability, agents can often benefit from communicating, but communication resources may be limited and costly. Current approaches tend to dismiss or underestimate this cost, resulting in over-communication. This paper presents a general framework to compute the value of communicating from each agent's local perspective, by comparing the expected reward with and without communication. In order to obtain these expectations, each agent must reason about the state and belief states of the other agents, both before and after communication. We show how this can be done in the context of decentralized POMDPs and discuss ways to mitigate a common myopic assumption, where agents tend to over-communicate because they overlook the possibility that communication can be deferred or initiated by the other agents. The paper presents a theoretical framework to precisely quantify the value of communication and an effective algorithm to manage communication. Experimental results show that our approach performs well compared to other techniques suggested in the literature.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In decentralized settings with partial observability, agents can often benefit from communicating, but communication resources may be limited and costly. Current approaches tend to dismiss or underestimate this cost, resulting in over-communication. This paper presents a general framework to compute the value of communicating from each agent's local perspective, by comparing the expected reward with and without communication. In order to obtain these expectations, each agent must reason about the state and belief states of the other agents, both before and after communication. We show how this can be done in the context of decentralized POMDPs and discuss ways to mitigate a common myopic assumption, where agents tend to over-communicate because they overlook the possibility that communication can be deferred or initiated by the other agents. The paper presents a theoretical framework to precisely quantify the value of communication and an effective algorithm to manage communication. Experimental results show that our approach performs well compared to other techniques suggested in the literature. |
Martin Allen; Shlomo Zilberstein Complexity of Decentralized Control: Special Cases Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:AZnips09,
title = {Complexity of Decentralized Control: Special Cases},
author = {Martin Allen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZnips09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS)},
pages = {19--27},
address = {Vancouver, British Columbia, Canada},
abstract = {The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. |
Marek Petrik; Shlomo Zilberstein Robust Value Function Approximation Using Bilinear Programming Conference Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS), Vancouver, British Columbia, Canada, 2009. @conference{SZ:PZnips09,
title = {Robust Value Function Approximation Using Bilinear Programming},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZnips09.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 23rd Neural Information Processing Systems Conference (NIPS)},
pages = {1446--1454},
address = {Vancouver, British Columbia, Canada},
abstract = {Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We therefore employ and analyze a common approximate algorithm for bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on a simple benchmark problem.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We therefore employ and analyze a common approximate algorithm for bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on a simple benchmark problem. |
2008
|
Claudia V Goldman; Shlomo Zilberstein Communication-Based Decomposition Mechanisms for Decentralized MDPs Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 32, pp. 169–202, 2008. @article{SZ:GZjair08,
title = {Communication-Based Decomposition Mechanisms for Decentralized MDPs},
author = {Claudia V Goldman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GZjair08.pdf},
doi = {10.1613/jair.2466},
year = {2008},
date = {2008-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {32},
pages = {169--202},
abstract = {Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions. |
Sven Seuken; Shlomo Zilberstein Formal Models and Algorithms for Decentralized Decision Making under Uncertainty Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 17, no. 2, pp. 190–250, 2008. @article{SZ:SZjaamas08,
title = {Formal Models and Algorithms for Decentralized Decision Making under Uncertainty},
author = {Sven Seuken and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZjaamas08.pdf},
doi = {10.1007/s10458-007-9026-5},
year = {2008},
date = {2008-01-01},
journal = {Autonomous Agents and Multi-Agent Systems (JAAMAS)},
volume = {17},
number = {2},
pages = {190--250},
abstract = {Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Marek Petrik; Shlomo Zilberstein A Successive Approximation Algorithm for Coordination Problems Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:PZisaim08,
title = {A Successive Approximation Algorithm for Coordination Problems},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZisaim08.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM)},
address = {Ft. Lauderdale, Florida},
abstract = {Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive an online error bound for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Using Abstraction for Generalized Planning Conference Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2008. @conference{SZ:SIZisaim08,
title = {Using Abstraction for Generalized Planning},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZisaim08.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM)},
address = {Ft. Lauderdale, Florida},
abstract = {Given the complexity of planning, it is often beneficial to create plans that work for a wide class of problems. This facilitates reuse of existing plans for different instances of the same problem or even for other problems that are somehow similar. We present novel approaches for finding such plans through search and for learning them from examples. We use state representation and abstraction techniques originally developed for static analysis of programs. The generalized plans that we compute include loops and work for classes of problems having varying numbers of objects that must be manipulated to reach the goal. Our algorithm for learning generalized plans takes as its input an example plan for a certain problem instance and a finite 3-valued first-order structure representing a set of initial states from different problem instances. It learns a generalized plan along with a classification of the problem instances where it works. The algorithm for finding plans takes as input a similar 3-valued structure and a goal test. Its output is a set of generalized plans and conditions describing the problem instances for which they work.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Alan Carlin; Shlomo Zilberstein Value-Based Observation Compression for DEC-POMDPs Conference Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Estoril, Portugal, 2008. @conference{SZ:CZaamas08,
title = {Value-Based Observation Compression for DEC-POMDPs},
author = {Alan Carlin and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/CZaamas08.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)},
pages = {501--508},
address = {Estoril, Portugal},
abstract = {Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations within agent policies, while minimizing loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Christopher Amato; Shlomo Zilberstein Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008. @conference{SZ:AZmsdm08,
title = {Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs},
author = {Christopher Amato and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZmsdm08.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM)},
pages = {1--15},
address = {Estoril, Portugal},
abstract = {Decentralized POMDPs (DEC-POMDPs) offer a rich model for planning under uncertainty in multiagent settings. Improving the scalability of solution techniques is an important challenge. While an optimal algorithm has been developed for infinite-horizon DEC-POMDPs, it often requires an intractable amount of time and memory. To address this problem, we present a heuristic version of this algorithm. Our approach is able to use initial state information to decrease solution size and often increases solution quality over what is achievable by the optimal algorithm before resources are exhausted. Experimental results demonstrate that this heuristic approach is effective, producing higher values and more concise solutions in all three test domains.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Alan Carlin; Shlomo Zilberstein Observation Compression in DEC-POMDP Policy Trees Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Estoril, Portugal, 2008, (Best Paper Award). @conference{SZ:CZmsdm08,
title = {Observation Compression in DEC-POMDP Policy Trees},
author = {Alan Carlin and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/CZmsdm08.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM)},
pages = {31--45},
address = {Estoril, Portugal},
abstract = {Representing agent policies compactly is essential for improving the scalability of multi-agent planning algorithms. In this paper, we focus on developing a pruning technique that allows us to merge certain observations from agent policies, while minimizing the loss of value. This is particularly important for solving finite-horizon decentralized POMDPs, where agent policies are represented as trees, and where the size of policy trees grows exponentially with the number of observations. We introduce a value-based observation compression technique that prunes the least valuable observations while maintaining an error bound on the value lost as a result of pruning. We analyze the characteristics of this pruning strategy and show empirically that it is effective. Thus, we use compact policies to obtain significantly higher values compared with the best existing DEC-POMDP algorithm.},
note = {Best Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Learning Generalized Plans Using Abstract Counting Conference Proceedings of the 23rd Conference on Artificial Intelligence (AAAI), Chicago, Illinois, 2008. @conference{SZ:SIZaaai08,
title = {Learning Generalized Plans Using Abstract Counting},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZaaai08.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the 23rd Conference on Artificial Intelligence (AAAI)},
pages = {991--997},
address = {Chicago, Illinois},
abstract = {Given the complexity of planning, it is often beneficial to create plans that work for a wide class of problems. This facilitates reuse of existing plans for different instances drawn from the same problem or from an infinite family of similar problems. We define a class of such planning problems called generalized planning problems and present a novel approach for transforming classical plans into generalized plans. These algorithm-like plans include loops and work for problem instances having varying numbers of objects that must be manipulated to reach the goal. Our approach takes as input a classical plan for a certain problem instance. It outputs a generalized plan along with a classification of the problem instances where it is guaranteed to work. We illustrate the utility of our approach through results of a working implementation on various practical examples.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Shlomo Zilberstein Metareasoning and Bounded Rationality Conference AAAI Workshop on Metareasoning: Thinking about Thinking, Chicago, Illinois, 2008. @conference{SZ:Zaaai08ws1,
title = {Metareasoning and Bounded Rationality},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zaaai08ws1.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {AAAI Workshop on Metareasoning: Thinking about Thinking},
address = {Chicago, Illinois},
abstract = {What role does metareasoning play in models of bounded rationality? We examine the various existing computational approaches to bounded rationality and divide them into three classes. Only one of these classes significantly relies on a metareasoning component. We explore the characteristics of this class of models and argue that it offers desirable properties. In fact, many of the effective approaches to bounded rationality that have been developed since the early 1980's match this particular paradigm. We conclude with some open research problems and challenges.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Christopher Amato; Daniel S Bernstein; Shlomo Zilberstein Optimizing Fixed-Size Stochastic Controllers for POMDPs Conference AAAI Workshop on Advancements in POMDP Solvers, Chicago, Illinois, 2008. @conference{SZ:ABZaaai08ws,
title = {Optimizing Fixed-Size Stochastic Controllers for POMDPs},
author = {Christopher Amato and Daniel S Bernstein and Shlomo Zilberstein},
year = {2008},
date = {2008-01-01},
booktitle = {AAAI Workshop on Advancements in POMDP Solvers},
address = {Chicago, Illinois},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Alan Carlin; Shlomo Zilberstein POMDP and DEC-POMDP Point-Based Observation Aggregation Conference AAAI Workshop on Advancements in POMDP Solvers, Chicago, Illinois, 2008. @conference{SZ:CZaaai08ws,
title = {POMDP and DEC-POMDP Point-Based Observation Aggregation},
author = {Alan Carlin and Shlomo Zilberstein},
year = {2008},
date = {2008-01-01},
booktitle = {AAAI Workshop on Advancements in POMDP Solvers},
address = {Chicago, Illinois},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Marek Petrik; Shlomo Zilberstein Learning Heuristic Functions Through Approximate Linear Programming Conference Proceedings of the 18th International Conference on Automated Planning and Scheduling, Sydney, Australia, 2008. @conference{SZ:PZicaps08,
title = {Learning Heuristic Functions Through Approximate Linear Programming},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZicaps08.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the 18th International Conference on Automated Planning and Scheduling},
pages = {248--255},
address = {Sydney, Australia},
abstract = {Planning problems are often formulated as heuristic search. The choice of the heuristic function plays a significant role in the performance of planning systems, but a good heuristic is not always available. We propose a new approach to learning heuristic functions from previously solved problem instances in a given domain. Our approach is based on approximate linear programming, commonly used in reinforcement learning. We show that our approach can be used effectively to learn admissible heuristic estimates and provide an analysis of the accuracy of the heuristic. When applied to common heuristic search problems, this approach reliably produces good heuristic functions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Christopher Amato; Shlomo Zilberstein What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs Conference ICAPS Workshop on Multiagent Planning, Sydney, Australia, 2008. @conference{SZ:AZmasplan08,
title = {What's Worth Memorizing: Attribute-based Planning for DEC-POMDPs},
author = {Christopher Amato and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZmasplan08.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {ICAPS Workshop on Multiagent Planning},
address = {Sydney, Australia},
abstract = {Current algorithms for decentralized partially observable Markov decision processes (DEC-POMDPs) require a large amount of memory to produce high quality plans. To combat this, existing methods optimize a set of finite-state controllers with an arbitrary amount of fixed memory. While this works well for some problems, in general, scalability and solution quality remain limited. As an alternative, we propose remembering some attributes that summarize key aspects of an agent's action and observation history. These attributes are often simple to determine, provide a well-motivated choice of controller size and focus the solution search on important components of agent histories. We show that for a range of DEC-POMDPs such attribute-based representation improves plan quality and scalability.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Martin Allen; Marek Petrik; Shlomo Zilberstein Interaction Structure and Dimensionality in Decentralized Problem Solving Technical Report Computer Science Department, University of Massachusetts Amherst no. 08-11, 2008. @techreport{SZ:APZtr0811,
title = {Interaction Structure and Dimensionality in Decentralized Problem Solving},
author = {Martin Allen and Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/APZtr0811.pdf},
year = {2008},
date = {2008-01-01},
number = {08-11},
institution = {Computer Science Department, University of Massachusetts Amherst},
abstract = {Decentralized Markov Decision Processes are a powerful general model of decentralized, cooperative multi-agent problem solving. The high complexity of the general problem leads to a focus on restricted models. While the worst-case hardness of such reduced problems is often better, less is known about the actual expected difficulty of given instances. We show tight connections between the structure of agent interactions and the essential dimensionality of various problems. Bounds can be placed on the difficulty of solving problems, based upon restrictions on the type and number of interactions between agents. These bounds arise from a bilinear programming formulation of the problem; from such a formulation, a more compact reduced form can be automatically generated, and the original problem can be rewritten to take advantage of the reduction. These results are of theoretical and practical importance, improving our understanding of multi-agent problem domains, and paving the way for methods that reduce the complexity of such problems by limiting the degree of interaction between agents.},
keywords = {},
pubstate = {published},
tppubtype = {techreport}
}
2007
Claudia V Goldman; Martin Allen; Shlomo Zilberstein Learning to Communicate in a Decentralized Environment Journal Article In: Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 15, no. 1, pp. 47–90, 2007. @article{SZ:GAZjaamas07,
title = {Learning to Communicate in a Decentralized Environment},
author = {Claudia V Goldman and Martin Allen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GAZjaamas07.pdf},
doi = {10.1007/s10458-006-0008-9},
year = {2007},
date = {2007-01-01},
journal = {Autonomous Agents and Multi-Agent Systems (JAAMAS)},
volume = {15},
number = {1},
pages = {47--90},
abstract = {Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multi-agent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. We establish a formal framework for the problem, and identify a collection of necessary and sufficient properties for decision problems that allow agents to employ probabilistic updating schemes in order to learn how to interpret what others are communicating. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. Our experimental work establishes how these methods perform when applied to problems of varying complexity.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Daniel Szer; Francois Charpillet; Shlomo Zilberstein Optimal Resolution of DEC-POMDPs by Heuristic Search (Résolution optimale de DEC-POMDPs par recherche heuristique; in French) Journal Article In: Revue d'Intelligence Artificielle, vol. 21, no. 1, pp. 107–128, 2007. @article{SZ:SCZria07,
title = {Résolution optimale de DEC-POMDPs par recherche heuristique},
author = {Daniel Szer and Francois Charpillet and Shlomo Zilberstein},
url = {https://doi.org/10.3166/ria.21.107-128},
doi = {10.3166/ria.21.107-128},
year = {2007},
date = {2007-01-01},
journal = {Revue d'Intelligence Artificielle},
volume = {21},
number = {1},
pages = {107--128},
abstract = {We present the first generalized heuristic search formalism that is able to solve decentralized POMDPs of both finite and infinite horizon. Our algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment. These problems arise in domains such as multi-robot coordination, or network traffic control. We present a framework that is based on classical heuristic search on the one hand, and on decentralized control theory on the other hand. We prove that our approach is able to generate optimal deterministic controllers, and we study its performance on examples from the literature. (in French)},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Christopher Amato; Daniel S Bernstein; Shlomo Zilberstein Solving POMDPs Using Quadratically Constrained Linear Programs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:ABZijcai07,
title = {Solving POMDPs Using Quadratically Constrained Linear Programs},
author = {Christopher Amato and Daniel S Bernstein and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ABZijcai07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {2418--2424},
address = {Hyderabad, India},
abstract = {Developing scalable algorithms for solving partially observable Markov decision processes (POMDPs) is an important challenge. One approach that effectively addresses the intractable memory requirements of POMDP algorithms is based on representing POMDP policies as finite-state controllers. In this paper, we illustrate some fundamental disadvantages of existing techniques that use controllers. We then propose a new approach that formulates the problem as a quadratically constrained linear program (QCLP), which defines an optimal controller of a desired size. This representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs. Although QCLP optimization techniques guarantee only local optimality, the results we obtain using an existing optimization method show significant solution improvement over the state-of-the-art techniques. The results open up promising research directions for solving large POMDPs using nonlinear programming methods.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Developing scalable algorithms for solving partially observable Markov decision processes (POMDPs) is an important challenge. One approach that effectively addresses the intractable memory requirements of POMDP algorithms is based on representing POMDP policies as finite-state controllers. In this paper, we illustrate some fundamental disadvantages of existing techniques that use controllers. We then propose a new approach that formulates the problem as a quadratically constrained linear program (QCLP), which defines an optimal controller of a desired size. This representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs. Although QCLP optimization techniques guarantee only local optimality, the results we obtain using an existing optimization method show significant solution improvement over the state-of-the-art techniques. The results open up promising research directions for solving large POMDPs using nonlinear programming methods. |
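The QCLP named in this abstract has a schematic shape worth recording. In the sketch below (a reconstruction, with variable names mine rather than quoted from the paper), x(q',a,q,o) plays the role of the joint controller parameter P(q',a | q,o), y(q,s) is the value of controller node q in state s, and q_0 is the start node. The objective is linear and each Bellman constraint is quadratic in (x, y), which is what makes this a quadratically constrained linear program:

```latex
% Schematic QCLP for a fixed-size POMDP controller (reconstruction, not quoted).
% Variables: x(q',a,q,o) ~ P(q',a | q,o);  y(q,s) = value of node q in state s.
\begin{align*}
\max_{x,\,y}\quad & \sum_{s} b_0(s)\, y(q_0, s) \\
\text{s.t.}\quad
 & y(q,s) = \sum_{a} \Big( \sum_{q'} x(q',a,q,o_1) \Big) R(s,a) \\
 & \qquad {} + \gamma \sum_{a} \sum_{s'} P(s' \mid s,a) \sum_{o} O(o \mid s',a)
   \sum_{q'} x(q',a,q,o)\, y(q',s')
   && \forall q, s \\
 & \sum_{q',\,a} x(q',a,q,o) = 1, \qquad x \ge 0 && \forall q, o \\
 & \sum_{q'} x(q',a,q,o) = \sum_{q'} x(q',a,q,o_1) && \forall q, a, o
\end{align*}
```

The last constraint enforces that the action distribution at a node does not depend on the observation that has not yet been received; dropping the variable-coupling would recover an ordinary (degree-3) nonlinear program over separate action and transition parameters.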
Ron Bekkerman; Shlomo Zilberstein; James Allan Web Page Clustering using Heuristic Search in the Web Graph Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:BZAijcai07,
title = {Web Page Clustering using Heuristic Search in the Web Graph},
author = {Ron Bekkerman and Shlomo Zilberstein and James Allan},
url = {http://rbr.cs.umass.edu/shlomo/papers/BZAijcai07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {2280--2285},
address = {Hyderabad, India},
abstract = {Effective representation of Web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but it is possible for documents to be totally dissimilar and still correspond to the same meaning of the query. To overcome this problem, we exploit the thematic locality of the Web--relevant Web pages are often located close to each other in the Web graph of hyperlinks. We estimate the level of relevance between each pair of retrieved pages by the length of a path between them. The path is constructed using multi-agent beam search: each agent starts with one Web page and attempts to meet as many other agents as possible with some bounded resources. We test the system on two types of queries: ambiguous English words and people names. The Web appears to be tightly connected; about 70% of the agents meet with each other after only three iterations of exhaustive breadth-first search. However, when heuristics are applied, the search becomes more focused and the obtained results are substantially more accurate. Combined with a content-driven Web page clustering technique, our heuristic search system significantly improves the clustering results.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Effective representation of Web search results remains an open problem in the Information Retrieval community. For ambiguous queries, a traditional approach is to organize search results into groups (clusters), one for each meaning of the query. These groups are usually constructed according to the topical similarity of the retrieved documents, but it is possible for documents to be totally dissimilar and still correspond to the same meaning of the query. To overcome this problem, we exploit the thematic locality of the Web--relevant Web pages are often located close to each other in the Web graph of hyperlinks. We estimate the level of relevance between each pair of retrieved pages by the length of a path between them. The path is constructed using multi-agent beam search: each agent starts with one Web page and attempts to meet as many other agents as possible with some bounded resources. We test the system on two types of queries: ambiguous English words and people names. The Web appears to be tightly connected; about 70% of the agents meet with each other after only three iterations of exhaustive breadth-first search. However, when heuristics are applied, the search becomes more focused and the obtained results are substantially more accurate. Combined with a content-driven Web page clustering technique, our heuristic search system significantly improves the clustering results. |
Marek Petrik; Shlomo Zilberstein Average-Reward Decentralized Markov Decision Processes Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:PZijcai07,
title = {Average-Reward Decentralized Markov Decision Processes},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZijcai07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1997--2002},
address = {Hyderabad, India},
abstract = {Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multi-agent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, showing that it is NP-complete and that optimal policies are deterministic. Our analysis lays the foundation for designing two optimal algorithms. Experimental results with a standard problem from the literature illustrate the applicability of these solution techniques.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multi-agent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, showing that it is NP-complete and that optimal policies are deterministic. Our analysis lays the foundation for designing two optimal algorithms. Experimental results with a standard problem from the literature illustrate the applicability of these solution techniques. |
Sven Seuken; Shlomo Zilberstein Memory-Bounded Dynamic Programming for DEC-POMDPs Conference Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007. @conference{SZ:SZijcai07,
title = {Memory-Bounded Dynamic Programming for DEC-POMDPs},
author = {Sven Seuken and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZijcai07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {2009--2015},
address = {Hyderabad, India},
abstract = {Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques. |
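The linear-time claim in this abstract follows from a simple counting argument, reconstructed here: because at most a fixed number of policy trees (maxTrees) is kept per agent after each backup, the cost of every horizon step is a constant with respect to the horizon length T:

```latex
% Per-step cost of a memory-bounded backup (counting reconstruction).
% Each agent extends its maxTrees kept subtrees into full one-step backups:
\underbrace{|A| \cdot \mathit{maxTrees}^{|O|}}_{\text{candidate trees per agent per step}}
\quad\Longrightarrow\quad
\text{total cost over horizon } T:\;
O\!\left(T \cdot |A| \cdot \mathit{maxTrees}^{|O|}\right)
```

Note that the per-step term is still exponential in the number of observations |O|; reducing that dependence to polynomial is exactly the improvement reported in the Improved MBDP paper (UAI 2007) below.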
Christopher Amato; Alan Carlin; Shlomo Zilberstein Bounded Dynamic Programming for Decentralized POMDPs Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Honolulu, Hawaii, 2007. @conference{SZ:ACZmsdm07,
title = {Bounded Dynamic Programming for Decentralized POMDPs},
author = {Christopher Amato and Alan Carlin and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ACZmsdm07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM)},
address = {Honolulu, Hawaii},
abstract = {Solving decentralized POMDPs (DEC-POMDPs) optimally is a very hard problem. As a result, several approximate algorithms have been developed, but these do not have satisfactory error bounds. In this paper, we first discuss optimal dynamic programming and some approximate finite horizon DEC-POMDP algorithms. We then present a bounded dynamic programming algorithm. Given a problem and an error bound, the algorithm will return a solution within that bound when it is able to solve the problem. We give a proof of this bound and provide some experimental results showing high quality solutions to large DEC-POMDPs for large horizons.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Solving decentralized POMDPs (DEC-POMDPs) optimally is a very hard problem. As a result, several approximate algorithms have been developed, but these do not have satisfactory error bounds. In this paper, we first discuss optimal dynamic programming and some approximate finite horizon DEC-POMDP algorithms. We then present a bounded dynamic programming algorithm. Given a problem and an error bound, the algorithm will return a solution within that bound when it is able to solve the problem. We give a proof of this bound and provide some experimental results showing high quality solutions to large DEC-POMDPs for large horizons. |
Christopher Amato; Daniel S Bernstein; Shlomo Zilberstein Optimizing Memory-Bounded Controllers for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:ABZuai07,
title = {Optimizing Memory-Bounded Controllers for Decentralized POMDPs},
author = {Christopher Amato and Daniel S Bernstein and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ABZuai07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {1--8},
address = {Vancouver, British Columbia},
abstract = {We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. |
Sven Seuken; Shlomo Zilberstein Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007. @conference{SZ:SZuai07,
title = {Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs},
author = {Sven Seuken and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SZuai07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {344--351},
address = {Vancouver, British Columbia},
abstract = {Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. |
Martin Allen; Shlomo Zilberstein Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:AZaaai07,
title = {Agent Influence as a Predictor of Difficulty for Decentralized Problem-Solving},
author = {Martin Allen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZaaai07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {Proceedings of the 22nd Conference on Artificial Intelligence (AAAI)},
pages = {688--693},
address = {Vancouver, British Columbia},
abstract = {We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We study the effect of problem structure on the practical performance of optimal dynamic programming for decentralized decision problems. It is shown that restricting agent influence over problem dynamics can make the problem easier to solve. Experimental results establish that agent influence correlates with problem difficulty: as the gap between the influence of different agents grows, problems tend to become much easier to solve. The measure thus provides a general-purpose, automatic characterization of decentralized problems, identifying those for which optimal methods are more or less likely to work. Such a measure is also of possible use as a heuristic in the design of algorithms that create task decompositions and control hierarchies in order to simplify multiagent problems. |
Marek Petrik; Shlomo Zilberstein Anytime Coordination Using Separable Bilinear Programs Conference Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), Vancouver, British Columbia, 2007. @conference{SZ:PZaaai07,
title = {Anytime Coordination Using Separable Bilinear Programs},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZaaai07.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {Proceedings of the 22nd Conference on Artificial Intelligence (AAAI)},
pages = {750--755},
address = {Vancouver, British Columbia},
abstract = {Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Developing scalable coordination algorithms for multi-agent systems is a hard computational challenge. One useful approach, demonstrated by the Coverage Set Algorithm (CSA), exploits structured interaction to produce significant computational gains. Empirically, CSA exhibits very good anytime performance, but an error bound on the results has not been established. We reformulate the algorithm and derive both online and offline error bounds for approximate solutions. Moreover, we propose an effective way to automatically reduce the complexity of the interaction. Our experiments show that this is a promising approach to solve a broad class of decentralized decision problems. The general formulation used by the algorithm makes it both easy to implement and widely applicable to a variety of other AI problems. |
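For reference, a separable bilinear program has the following standard form (generic textbook notation, not quoted from the paper): the constraint sets for x and y are independent, and the two variable blocks interact only through the bilinear term x^T C y, which is what alternating best-response style solvers exploit:

```latex
% Separable bilinear program, standard form (notation assumed).
\begin{align*}
\max_{x,\,y}\quad & r_1^{\top} x \;+\; x^{\top} C\, y \;+\; r_2^{\top} y \\
\text{s.t.}\quad  & A_1 x = b_1, \quad x \ge 0 \\
                  & A_2 y = b_2, \quad y \ge 0
\end{align*}
```

Fixing either block of variables reduces the problem to a linear program in the other block, so alternating maximization is well defined even though the joint problem is nonconvex.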
Siddharth Srivastava; Neil Immerman; Shlomo Zilberstein Using Abstraction for Generalized Planning Conference ICAPS Workshop on Artificial Intelligence Planning and Learning (PAL), Providence, Rhode Island, 2007. @conference{SZ:SIZicaps07ws,
title = {Using Abstraction for Generalized Planning},
author = {Siddharth Srivastava and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SIZicaps07ws.pdf},
year = {2007},
date = {2007-01-01},
booktitle = {ICAPS Workshop on Artificial Intelligence Planning and Learning (PAL)},
address = {Providence, Rhode Island},
abstract = {Given the complexity of planning, it is often beneficial to create plans that work for a wide class of problems. This facilitates reuse of existing plans for different instances of the same problem or even for other problems that are somehow similar. We present novel approaches for learning, and even finding such plans using state representation and abstraction techniques originally developed for static analysis of programs. The generalized plans that we compute include loops and work for a large class of problem scenarios having varying numbers of objects that must be manipulated to reach the goal. Our algorithm for learning generalized plans takes as its input an example plan for a certain problem instance and a finite 3-valued first-order structure representing a set of initial states from different problem instances. It learns a generalized plan along with a classification of the problem instances where it works. The algorithm for finding plans takes as input a similar 3-valued structure and a goal test. Its output is a set of generalized plans and conditions describing the problem instances for which they work.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Given the complexity of planning, it is often beneficial to create plans that work for a wide class of problems. This facilitates reuse of existing plans for different instances of the same problem or even for other problems that are somehow similar. We present novel approaches for learning, and even finding such plans using state representation and abstraction techniques originally developed for static analysis of programs. The generalized plans that we compute include loops and work for a large class of problem scenarios having varying numbers of objects that must be manipulated to reach the goal. Our algorithm for learning generalized plans takes as its input an example plan for a certain problem instance and a finite 3-valued first-order structure representing a set of initial states from different problem instances. It learns a generalized plan along with a classification of the problem instances where it works. The algorithm for finding plans takes as input a similar 3-valued structure and a goal test. Its output is a set of generalized plans and conditions describing the problem instances for which they work. |
2006
|
Shlomo Zilberstein (Ed.) Annals of Mathematics and Artificial Intelligence Special Issue: Selected Papers from the 9th International Symposium on Artificial Intelligence and Mathematics Book 2006. @book{SZ:Zamai06s,
title = {Annals of Mathematics and Artificial Intelligence Special Issue: Selected Papers from the 9th International Symposium on Artificial Intelligence and Mathematics},
editor = {Shlomo Zilberstein},
url = {https://link.springer.com/journal/10472/47/3},
doi = {10.1007/s10472-006-9040-3},
year = {2006},
date = {2006-01-01},
volume = {47},
number = {3-4},
keywords = {},
pubstate = {published},
tppubtype = {book}
}
|
Marek Petrik; Shlomo Zilberstein Learning Parallel Portfolios of Algorithms Journal Article In: Annals of Mathematics and Artificial Intelligence (AMAI), vol. 48, no. 1-2, pp. 85–106, 2006. @article{SZ:PZamai06,
title = {Learning Parallel Portfolios of Algorithms},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZamai06.pdf},
doi = {10.1007/s10472-007-9050-9},
year = {2006},
date = {2006-01-01},
journal = {Annals of Mathematics and Artificial Intelligence (AMAI)},
volume = {48},
number = {1-2},
pages = {85--106},
abstract = {A wide range of combinatorial optimization algorithms have been developed for complex reasoning tasks. Frequently, no single algorithm outperforms all the others. This has raised interest in leveraging the performance of a collection of algorithms to improve performance. We show how to accomplish this using a Parallel Portfolio of Algorithms (PPA). A PPA is a collection of diverse algorithms for solving a single problem, all running concurrently on a single processor until a solution is produced. The performance of the portfolio may be controlled by assigning different shares of processor time to each algorithm. We present an effective method for finding a PPA in which the share of processor time allocated to each algorithm is fixed. Finding the optimal static schedule is shown to be an NP-complete problem for a general class of utility functions. We present bounds on the performance of the PPA over random instances and evaluate the performance empirically on a collection of 23 state-of-the-art SAT algorithms. The results show significant performance gains over the fastest individual algorithm in the collection.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
A wide range of combinatorial optimization algorithms have been developed for complex reasoning tasks. Frequently, no single algorithm outperforms all the others. This has raised interest in leveraging the performance of a collection of algorithms to improve performance. We show how to accomplish this using a Parallel Portfolio of Algorithms (PPA). A PPA is a collection of diverse algorithms for solving a single problem, all running concurrently on a single processor until a solution is produced. The performance of the portfolio may be controlled by assigning different shares of processor time to each algorithm. We present an effective method for finding a PPA in which the share of processor time allocated to each algorithm is fixed. Finding the optimal static schedule is shown to be an NP-complete problem for a general class of utility functions. We present bounds on the performance of the PPA over random instances and evaluate the performance empirically on a collection of 23 state-of-the-art SAT algorithms. The results show significant performance gains over the fastest individual algorithm in the collection. |
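The static-portfolio model in this abstract is simple enough to sketch. A minimal model (my reconstruction, not the paper's code; function names are hypothetical): an algorithm that needs t seconds of dedicated CPU and receives a fixed processor share s finishes at time t/s under time-sharing, and the portfolio stops as soon as the first algorithm finishes.

```python
import itertools

def portfolio_time(runtimes, shares):
    """Finish time of a static parallel portfolio on one instance.

    Algorithm i, which needs runtimes[i] seconds of dedicated CPU and
    gets a fixed share shares[i] of the processor, completes at
    runtimes[i] / shares[i]; the portfolio stops when any one finishes.
    """
    return min(t / s for t, s in zip(runtimes, shares) if s > 0)

def best_static_schedule(training, grid=10):
    """Grid-search a fixed share vector minimizing total runtime on a
    training set (a list of per-instance runtime vectors)."""
    n = len(training[0])
    best_total, best_shares = float("inf"), None
    # Enumerate share vectors whose components are k/grid and sum to 1.
    for comb in itertools.product(range(grid + 1), repeat=n):
        if sum(comb) != grid:
            continue
        shares = [c / grid for c in comb]
        total = sum(portfolio_time(r, shares) for r in training)
        if total < best_total:
            best_total, best_shares = total, shares
    return best_shares
```

On a training set where two algorithms are fast on complementary instances (e.g. runtimes [10, 2] and [2, 10]), an even split beats running either algorithm alone. The paper proves that finding the optimal static schedule is NP-complete for a general class of utility functions, so this brute-force grid search is only a stand-in for very small portfolios.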
Christopher Amato; Daniel S Bernstein; Shlomo Zilberstein Solving POMDPs Using Quadratically Constrained Linear Programs Conference Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2006. @conference{SZ:ABZisaim06,
title = {Solving POMDPs Using Quadratically Constrained Linear Programs},
author = {Christopher Amato and Daniel S Bernstein and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ABZisaim06.pdf},
year = {2006},
date = {2006-01-01},
booktitle = {Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics (ISAIM)},
address = {Ft. Lauderdale, Florida},
abstract = {Developing scalable algorithms for solving partially observable Markov decision processes (POMDPs) is an important challenge. One promising approach is based on representing POMDP policies as finite-state controllers. This method has been used successfully to address the intractable memory requirements of POMDP algorithms. We illustrate some fundamental theoretical limitations of existing techniques that use controllers. We then propose a new approach that formulates the problem as a quadratically constrained linear program (QCLP), the solution of which provides an optimal controller of a desired size. We evaluate several optimization methods for solving QCLPs and compare their performance with existing POMDP optimization methods. While the optimization algorithms used in this paper only guarantee locally optimal solutions, the results show consistent improvement of solution quality over the state-of-the-art techniques. The results show that powerful nonlinear programming algorithms can be used effectively to improve the performance and scalability of POMDP algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Developing scalable algorithms for solving partially observable Markov decision processes (POMDPs) is an important challenge. One promising approach is based on representing POMDP policies as finite-state controllers. This method has been used successfully to address the intractable memory requirements of POMDP algorithms. We illustrate some fundamental theoretical limitations of existing techniques that use controllers. We then propose a new approach that formulates the problem as a quadratically constrained linear program (QCLP), the solution of which provides an optimal controller of a desired size. We evaluate several optimization methods for solving QCLPs and compare their performance with existing POMDP optimization methods. While the optimization algorithms used in this paper only guarantee locally optimal solutions, the results show consistent improvement of solution quality over the state-of-the-art techniques. The results show that powerful nonlinear programming algorithms can be used effectively to improve the performance and scalability of POMDP algorithms. |
Marek Petrik; Shlomo Zilberstein Learning Static Parallel Portfolios of Algorithms Conference Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, Florida, 2006. @conference{SZ:PZisaim06,
title = {Learning Static Parallel Portfolios of Algorithms},
author = {Marek Petrik and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/PZisaim06.pdf},
year = {2006},
date = {2006-01-01},
booktitle = {Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics (ISAIM)},
address = {Ft. Lauderdale, Florida},
abstract = {We present an approach for improving the performance of combinatorial optimization algorithms by generating an optimal Parallel Portfolio of Algorithms (PPA). A PPA is a collection of diverse algorithms for solving a single problem, all running concurrently on a single processor until a solution is produced. The performance of the portfolio may be controlled by assigning different shares of processor time to each algorithm. We present a method for finding a static PPA, in which the share of processor time allocated to each algorithm is fixed. The schedule is shown to be optimal with respect to a given training set of instances. We draw bounds on the performance of the PPA over random instances and evaluate the performance empirically on a collection of 23 state-of-the-art SAT algorithms. The results show significant performance gains (up to a factor of 2) over the fastest individual algorithm in a realistic setting.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present an approach for improving the performance of combinatorial optimization algorithms by generating an optimal Parallel Portfolio of Algorithms (PPA). A PPA is a collection of diverse algorithms for solving a single problem, all running concurrently on a single processor until a solution is produced. The performance of the portfolio may be controlled by assigning different shares of processor time to each algorithm. We present a method for finding a static PPA, in which the share of processor time allocated to each algorithm is fixed. The schedule is shown to be optimal with respect to a given training set of instances. We draw bounds on the performance of the PPA over random instances and evaluate the performance empirically on a collection of 23 state-of-the-art SAT algorithms. The results show significant performance gains (up to a factor of 2) over the fastest individual algorithm in a realistic setting. |
Christopher Amato; Daniel S Bernstein; Shlomo Zilberstein Solving POMDPs Using Quadratically Constrained Linear Programs Conference Proceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Systems, Hakodate, Japan, 2006. @conference{SZ:ABZaamas06,
title = {Solving POMDPs Using Quadratically Constrained Linear Programs},
author = {Christopher Amato and Daniel S Bernstein and Shlomo Zilberstein},
url = {https://doi.org/10.1145/1160633.1160694},
doi = {10.1145/1160633.1160694},
year = {2006},
date = {2006-01-01},
booktitle = {Proceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Systems},
pages = {341--343},
address = {Hakodate, Japan},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Christopher Amato; Daniel S Bernstein; Shlomo Zilberstein Optimal Fixed-Size Controllers for Decentralized POMDPs Conference AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Hakodate, Japan, 2006. @conference{SZ:ABZmsdm06,
title = {Optimal Fixed-Size Controllers for Decentralized POMDPs},
author = {Christopher Amato and Daniel S Bernstein and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ABZmsdm06.pdf},
year = {2006},
date = {2006-01-01},
booktitle = {AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM)},
pages = {61--71},
address = {Hakodate, Japan},
abstract = {Solving decentralized partially observable Markov decision processes (DEC-POMDPs) is a difficult task. Exact solutions are intractable in all but the smallest problems and approximate solutions provide limited optimality guarantees. As a more principled alternative, we present a novel formulation of an optimal fixed-size solution of a DEC-POMDP as a nonlinear program. We discuss the benefits of this representation and evaluate several optimization methods. While the methods used in this paper only guarantee locally optimal solutions, a wide range of powerful nonlinear optimization techniques may now be applied to this problem. We show that by using our formulation in various domains, solution quality is higher than a current state-of-the-art approach. These results show that optimization can be used to provide high quality solutions to DEC-POMDPs while maintaining moderate memory and time usage.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Solving decentralized partially observable Markov decision processes (DEC-POMDPs) is a difficult task. Exact solutions are intractable in all but the smallest problems and approximate solutions provide limited optimality guarantees. As a more principled alternative, we present a novel formulation of an optimal fixed-size solution of a DEC-POMDP as a nonlinear program. We discuss the benefits of this representation and evaluate several optimization methods. While the methods used in this paper only guarantee locally optimal solutions, a wide range of powerful nonlinear optimization techniques may now be applied to this problem. We show that by using our formulation in various domains, solution quality is higher than a current state-of-the-art approach. These results show that optimization can be used to provide high quality solutions to DEC-POMDPs while maintaining moderate memory and time usage. |
2005
|
Martin Allen; Claudia V Goldman; Shlomo Zilberstein Learning to Communicate in Decentralized Systems Conference Proceedings of the 8th Biennial Israeli Symposium on the Foundations of AI, Haifa, Israel, 2005. @conference{SZ:AGZbisfai05,
title = {Learning to Communicate in Decentralized Systems},
author = {Martin Allen and Claudia V Goldman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AGZbisfai05.pdf},
year = {2005},
date = {2005-01-01},
booktitle = {Proceedings of the 8th Biennial Israeli Symposium on the Foundations of AI},
address = {Haifa, Israel},
abstract = {Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multiagent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Learning to communicate is an emerging challenge in AI research. It is known that agents interacting in decentralized, stochastic environments can benefit from exchanging information. Multiagent planning generally assumes that agents share a common means of communication; however, in building robust distributed systems it is important to address potential miscoordination resulting from misinterpretation of messages exchanged. This paper lays foundations for studying this problem, examining its properties analytically and empirically in a decision-theoretic context. Solving the problem optimally is often intractable, but our approach enables agents using different languages to converge upon coordination over time. |
Zhengzhu Feng; Shlomo Zilberstein Efficient Maximization in Solving POMDPs Conference Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), Pittsburgh, Pennsylvania, 2005. @conference{SZ:FZaaai05,
title = {Efficient Maximization in Solving POMDPs},
author = {Zhengzhu Feng and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FZaaai05.pdf},
year = {2005},
date = {2005-01-01},
booktitle = {Proceedings of the 20th National Conference on Artificial Intelligence (AAAI)},
pages = {975--980},
address = {Pittsburgh, Pennsylvania},
abstract = {We present a simple, yet effective improvement to the dynamic programming algorithm for solving partially observable Markov decision processes. The technique targets the vector pruning operation during the maximization step, a key source of complexity in POMDP algorithms. We identify two types of structures in the belief space and exploit them to reduce significantly the number of constraints in the linear programs used for pruning. The benefits of the new technique are evaluated both analytically and experimentally, showing that it can lead to significant performance improvement. The results open up new research opportunities to enhance the performance and scalability of several POMDP algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present a simple, yet effective improvement to the dynamic programming algorithm for solving partially observable Markov decision processes. The technique targets the vector pruning operation during the maximization step, a key source of complexity in POMDP algorithms. We identify two types of structures in the belief space and exploit them to reduce significantly the number of constraints in the linear programs used for pruning. The benefits of the new technique are evaluated both analytically and experimentally, showing that it can lead to significant performance improvement. The results open up new research opportunities to enhance the performance and scalability of several POMDP algorithms. |
Natalia N Beliaeva; Shlomo Zilberstein Generating Admissible Heuristics by Abstraction for Search in Stochastic Domains Conference Proceedings of the 6th Symposium on Abstraction, Reformulation, and Approximation (SARA), Airth Castle, Scotland, 2005. @conference{SZ:BZsara05,
title = {Generating Admissible Heuristics by Abstraction for Search in Stochastic Domains},
author = {Natalia N Beliaeva and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BZsara05.pdf},
doi = {10.1007/11527862_2},
year = {2005},
date = {2005-01-01},
booktitle = {Proceedings of the 6th Symposium on Abstraction, Reformulation, and Approximation (SARA)},
pages = {14--29},
address = {Airth Castle, Scotland},
abstract = {Search in abstract spaces has been shown to produce useful admissible heuristic estimates in deterministic domains. We show in this paper how to generalize these results to search in stochastic domains. Solving stochastic optimization problems is significantly harder than solving their deterministic counterparts. Designing admissible heuristics for stochastic domains is also much harder. Therefore, deriving such heuristics automatically using abstraction is particularly beneficial. We analyze this approach both theoretically and empirically and show that it produces significant computational savings when used in conjunction with the heuristic search algorithm LAO*.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Search in abstract spaces has been shown to produce useful admissible heuristic estimates in deterministic domains. We show in this paper how to generalize these results to search in stochastic domains. Solving stochastic optimization problems is significantly harder than solving their deterministic counterparts. Designing admissible heuristics for stochastic domains is also much harder. Therefore, deriving such heuristics automatically using abstraction is particularly beneficial. We analyze this approach both theoretically and empirically and show that it produces significant computational savings when used in conjunction with the heuristic search algorithm LAO*. |
Daniel Szer; Francois Charpillet; Shlomo Zilberstein MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs Conference Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), Edinburgh, Scotland, 2005. @conference{SZ:SCZuai05,
title = {MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs},
author = {Daniel Szer and Francois Charpillet and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/SCZuai05.pdf},
year = {2005},
date = {2005-01-01},
booktitle = {Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {576--583},
address = {Edinburgh, Scotland},
abstract = {We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite-horizon problems. |
Martin Allen; Claudia V Goldman; Shlomo Zilberstein Language Learning in Multi-Agent Systems Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:AGZijcai05,
title = {Language Learning in Multi-Agent Systems},
author = {Martin Allen and Claudia V Goldman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AGZijcai05.pdf},
year = {2005},
date = {2005-01-01},
booktitle = {Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1649--1650},
address = {Edinburgh, Scotland},
abstract = {We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present the problem of learning to communicate in decentralized and stochastic environments, analyzing it formally in a decision-theoretic context and illustrating the concept experimentally. Our approach allows agents to converge upon coordinated communication and action over time. |
Daniel S Bernstein; Eric A Hansen; Shlomo Zilberstein Bounded Policy Iteration for Decentralized POMDPs Conference Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, 2005. @conference{SZ:BHZijcai05,
title = {Bounded Policy Iteration for Decentralized POMDPs},
author = {Daniel S Bernstein and Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BHZijcai05.pdf},
year = {2005},
date = {2005-01-01},
booktitle = {Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1287--1292},
address = {Edinburgh, Scotland},
abstract = {We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs. |
Raphen Becker; Victor Lesser; Shlomo Zilberstein Analyzing Myopic Approaches for Multi-Agent Communication Conference Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Compiegne, France, 2005, (Best Paper Award). @conference{SZ:BLZiat05,
title = {Analyzing Myopic Approaches for Multi-Agent Communication},
author = {Raphen Becker and Victor Lesser and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BLZiat05.pdf},
doi = {10.1109/IAT.2005.44},
year = {2005},
date = {2005-01-01},
booktitle = {Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology},
pages = {550--557},
address = {Compiegne, France},
abstract = {Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly hard when communication is constrained and each agent has different partial information about the overall situation. Although computing the exact value of communication is intractable, it has been estimated using a standard myopic assumption. However, this assumption--that communication is only possible at the present time--introduces error that can lead to poor agent behavior. We examine specific situations in which the myopic approach performs poorly and demonstrate an alternate approach that relaxes the assumption to improve the performance. The results provide an effective method for value-driven communication policies in multi-agent systems.},
note = {Best Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Choosing when to communicate is a fundamental problem in multi-agent systems. This problem becomes particularly hard when communication is constrained and each agent has different partial information about the overall situation. Although computing the exact value of communication is intractable, it has been estimated using a standard myopic assumption. However, this assumption--that communication is only possible at the present time--introduces error that can lead to poor agent behavior. We examine specific situations in which the myopic approach performs poorly and demonstrate an alternate approach that relaxes the assumption to improve the performance. The results provide an effective method for value-driven communication policies in multi-agent systems. |
Andrew Arnt; Shlomo Zilberstein Learning Policies for Sequential Time and Cost Sensitive Classification Conference KDD Workshop on Utility-Based Data Mining, Chicago, Illinois, 2005. @conference{SZ:AZubdm05,
title = {Learning Policies for Sequential Time and Cost Sensitive Classification},
author = {Andrew Arnt and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZubdm05.pdf},
year = {2005},
date = {2005-01-01},
booktitle = {KDD Workshop on Utility-Based Data Mining},
address = {Chicago, Illinois},
abstract = {In time and cost sensitive classification, the value of a labeled instance depends not only on the correctness of the labeling, but also the timeliness with which the instance is labeled. Instance attributes are initially unknown, and may take significant time to measure. This results in a difficult problem, trying to manage the tradeoff between time and accuracy. The problem is further complicated when we consider the classification of a sequence of time-sensitive classification tasks, where time spent measuring attributes in one instance can adversely affect the costs of future instances. We solve these problems using a decision-theoretic approach. The problem is modeled as an MDP with a potentially very large state space. We discuss how to intelligently discretize time and approximate the effects of measurement actions in the current task given all waiting tasks. The results offer an effective approach to attribute measurement and classification for a variety of time-sensitive applications.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In time and cost sensitive classification, the value of a labeled instance depends not only on the correctness of the labeling, but also the timeliness with which the instance is labeled. Instance attributes are initially unknown, and may take significant time to measure. This results in a difficult problem, trying to manage the tradeoff between time and accuracy. The problem is further complicated when we consider the classification of a sequence of time-sensitive classification tasks, where time spent measuring attributes in one instance can adversely affect the costs of future instances. We solve these problems using a decision-theoretic approach. The problem is modeled as an MDP with a potentially very large state space. We discuss how to intelligently discretize time and approximate the effects of measurement actions in the current task given all waiting tasks. The results offer an effective approach to attribute measurement and classification for a variety of time-sensitive applications. |
2004
|
Shlomo Zilberstein; Jana Koehler; Sven Koenig (Ed.) Proceedings of the 14th International Conference on Automated Planning and Scheduling Proceedings AAAI, Whistler, British Columbia, Canada, 2004, ISBN: 1-57735-200-9. @proceedings{SZ:ZKKicaps04,
title = {Proceedings of the 14th International Conference on Automated Planning and Scheduling},
editor = {Shlomo Zilberstein and Jana Koehler and Sven Koenig},
url = {http://www.aaai.org/Library/ICAPS/icaps04contents.php},
isbn = {1-57735-200-9},
year = {2004},
date = {2004-01-01},
publisher = {AAAI},
address = {Whistler, British Columbia, Canada},
keywords = {},
pubstate = {published},
tppubtype = {proceedings}
}
|
Raphen Becker; Shlomo Zilberstein; Victor Lesser; Claudia V Goldman Solving Transition Independent Decentralized Markov Decision Processes Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 22, pp. 423–455, 2004. @article{SZ:BZLGjair04,
title = {Solving Transition Independent Decentralized Markov Decision Processes},
author = {Raphen Becker and Shlomo Zilberstein and Victor Lesser and Claudia V Goldman},
url = {http://rbr.cs.umass.edu/shlomo/papers/BZLGjair04.pdf},
doi = {10.1613/jair.1497},
year = {2004},
date = {2004-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {22},
pages = {423--455},
abstract = {Formal treatment of collaborative multi-agent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a non-trivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Formal treatment of collaborative multi-agent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a non-trivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms. |
Claudia V Goldman; Shlomo Zilberstein Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis Journal Article In: Journal of Artificial Intelligence Research (JAIR), vol. 22, pp. 143–174, 2004. @article{SZ:GZjair04,
title = {Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis},
author = {Claudia V Goldman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GZjair04.pdf},
doi = {10.1613/jair.1427},
year = {2004},
date = {2004-01-01},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {22},
pages = {143--174},
abstract = {Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems. |
Andrew Arnt; Shlomo Zilberstein; James Allan; Abdel-Illah Mouaddib Dynamic Composition of Information Retrieval Techniques Journal Article In: Journal of Intelligent Information Systems (JIIS), vol. 23, no. 1, pp. 67–97, 2004. @article{SZ:AZAMjiis04,
title = {Dynamic Composition of Information Retrieval Techniques},
author = {Andrew Arnt and Shlomo Zilberstein and James Allan and Abdel-Illah Mouaddib},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZAMjiis04.pdf},
doi = {10.1023/B:JIIS.0000029671.27333.7d},
year = {2004},
date = {2004-01-01},
journal = {Journal of Intelligent Information Systems (JIIS)},
volume = {23},
number = {1},
pages = {67--97},
abstract = {This paper presents a new approach to information retrieval (IR) based on run-time selection of the best set of techniques to respond to a given query. A technique is selected based on its projected effectiveness with respect to the specific query, the load on the system, and a time-dependent utility function. The paper examines two fundamental questions: (1) can the selection of the best IR techniques be performed at run-time with minimal computational overhead? and (2) is it possible to construct a reliable probabilistic model of the performance of an IR technique that is conditioned on the characteristics of the query? We show that both of these questions can be answered positively. These results suggest a new system design that carries great potential to improve the quality of service of future IR systems.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
This paper presents a new approach to information retrieval (IR) based on run-time selection of the best set of techniques to respond to a given query. A technique is selected based on its projected effectiveness with respect to the specific query, the load on the system, and a time-dependent utility function. The paper examines two fundamental questions: (1) can the selection of the best IR techniques be performed at run-time with minimal computational overhead? and (2) is it possible to construct a reliable probabilistic model of the performance of an IR technique that is conditioned on the characteristics of the query? We show that both of these questions can be answered positively. These results suggest a new system design that carries great potential to improve the quality of service of future IR systems. |
Daniel S Bernstein; Eric A Hansen; Shlomo Zilberstein; Christopher Amato Dynamic Programming for Decentralized POMDPs Conference AAAI Spring Symposium on Bridging the Multi-Agent and Multi-Robot Research Gap, Stanford, California, 2004. @conference{SZ:HBZspring04,
title = {Dynamic Programming for Decentralized POMDPs},
author = {Daniel S Bernstein and Eric A Hansen and Shlomo Zilberstein and Christopher Amato},
year = {2004},
date = {2004-01-01},
booktitle = {AAAI Spring Symposium on Bridging the Multi-Agent and Multi-Robot Research Gap},
address = {Stanford, California},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Martin Allen; Shlomo Zilberstein Automated Conversion and Simplification of Plan Representations Conference ICAPS Workshop on Connecting Planning Theory with Practice, Whistler, British Columbia, 2004. @conference{SZ:AZicaps04ws,
title = {Automated Conversion and Simplification of Plan Representations},
author = {Martin Allen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZicaps04ws.pdf},
year = {2004},
date = {2004-01-01},
booktitle = {ICAPS Workshop on Connecting Planning Theory with Practice},
address = {Whistler, British Columbia},
abstract = {As planning agents grow more sophisticated, issues of plan representation arise alongside concerns with plan generation. Planning methods work over increasingly large and difficult problems and resulting plans are often complex or unwieldy. Further, where planners must interact with human beings--either for purposes of plan verification and analysis, or in mixed-initiative plan-generation settings--plans must be represented so that the intended course of action is readily visible. We propose automated techniques for the simplification of plans, and for conversion between distinct plan representations; our proposal is illustrated by examples from our recent research, concerning conversion between large-scale MDP solutions and graph-based contingency plans.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
As planning agents grow more sophisticated, issues of plan representation arise alongside concerns with plan generation. Planning methods work over increasingly large and difficult problems and resulting plans are often complex or unwieldy. Further, where planners must interact with human beings--either for purposes of plan verification and analysis, or in mixed-initiative plan-generation settings--plans must be represented so that the intended course of action is readily visible. We propose automated techniques for the simplification of plans, and for conversion between distinct plan representations; our proposal is illustrated by examples from our recent research, concerning conversion between large-scale MDP solutions and graph-based contingency plans. |
Raphen Becker; Shlomo Zilberstein; Victor Lesser Decentralized Markov Decision Processes with Event-Driven Interactions Conference Proceedings of the 3rd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS), New York, NY, 2004. @conference{SZ:BZLaamas04,
title = {Decentralized Markov Decision Processes with Event-Driven Interactions},
author = {Raphen Becker and Shlomo Zilberstein and Victor Lesser},
url = {http://rbr.cs.umass.edu/shlomo/papers/BZLaamas04.pdf},
year = {2004},
date = {2004-01-01},
booktitle = {Proceedings of the 3rd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS)},
pages = {302--309},
address = {New York, NY},
abstract = {Decentralized MDPs provide a powerful formal framework for planning in multi-agent systems, but the complexity of the model limits its usefulness. We study in this paper a class of DEC-MDPs that restricts the interactions between the agents to a structured, event-driven dependency. These dependencies can model locking a shared resource or temporal enabling constraints, both of which arise frequently in practice. The complexity of this class of problems is shown to be no harder than exponential in the number of states and doubly exponential in the number of dependencies. Since the number of dependencies is much smaller than the number of states for many problems, this is significantly better than the doubly exponential (in the state space) complexity of DEC-MDPs. We also demonstrate how an algorithm we previously developed can be used to solve problems in this class both optimally and approximately. Experimental work indicates that this solution technique is significantly faster than a naive policy search approach.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Decentralized MDPs provide a powerful formal framework for planning in multi-agent systems, but the complexity of the model limits its usefulness. We study in this paper a class of DEC-MDPs that restricts the interactions between the agents to a structured, event-driven dependency. These dependencies can model locking a shared resource or temporal enabling constraints, both of which arise frequently in practice. The complexity of this class of problems is shown to be no harder than exponential in the number of states and doubly exponential in the number of dependencies. Since the number of dependencies is much smaller than the number of states for many problems, this is significantly better than the doubly exponential (in the state space) complexity of DEC-MDPs. We also demonstrate how an algorithm we previously developed can be used to solve problems in this class both optimally and approximately. Experimental work indicates that this solution technique is significantly faster than a naive policy search approach. |
Claudia V Goldman; Martin Allen; Shlomo Zilberstein Decentralized Language Learning Through Acting Conference Proceedings of the 3rd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS), New York, NY, 2004. @conference{SZ:GAZaamas04,
title = {Decentralized Language Learning Through Acting},
author = {Claudia V Goldman and Martin Allen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GAZaamas04.pdf},
year = {2004},
date = {2004-01-01},
booktitle = {Proceedings of the 3rd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS)},
pages = {1006--1013},
address = {New York, NY},
abstract = {This paper presents an algorithm for learning the meaning of messages communicated between agents that interact while acting optimally towards a cooperative goal. Our reinforcement-learning method is based on Bayesian filtering and has been adapted for a decentralized control process. Empirical results shed light on the complexity of the learning problem, and on factors affecting the speed of convergence. Designing intelligent agents able to adapt their mutual interpretation of messages exchanged, in order to improve overall task-oriented performance, introduces an essential cognitive capability that can upgrade the current state of the art in multi-agent and human-machine systems to the next level. Learning to communicate while acting will add to the robustness and flexibility of these systems and hence to a more efficient and productive performance.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
This paper presents an algorithm for learning the meaning of messages communicated between agents that interact while acting optimally towards a cooperative goal. Our reinforcement-learning method is based on Bayesian filtering and has been adapted for a decentralized control process. Empirical results shed light on the complexity of the learning problem, and on factors affecting the speed of convergence. Designing intelligent agents able to adapt their mutual interpretation of messages exchanged, in order to improve overall task-oriented performance, introduces an essential cognitive capability that can upgrade the current state of the art in multi-agent and human-machine systems to the next level. Learning to communicate while acting will add to the robustness and flexibility of these systems and hence to a more efficient and productive performance. |
Eric A Hansen; Daniel S Bernstein; Shlomo Zilberstein Dynamic Programming for Partially Observable Stochastic Games Conference Proceedings of the 19th National Conference on Artificial Intelligence (AAAI), San Jose, California, 2004. @conference{SZ:HBZaaai04,
title = {Dynamic Programming for Partially Observable Stochastic Games},
author = {Eric A Hansen and Daniel S Bernstein and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HBZaaai04.pdf},
year = {2004},
date = {2004-01-01},
booktitle = {Proceedings of the 19th National Conference on Artificial Intelligence (AAAI)},
pages = {709--715},
address = {San Jose, California},
abstract = {We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. We prove that when applied to finite-horizon POSGs, the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game. For the special case in which agents share the same payoffs, the algorithm can be used to find an optimal solution. We present preliminary empirical results and discuss ways to further exploit POMDP theory in solving POSGs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. We prove that when applied to finite-horizon POSGs, the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game. For the special case in which agents share the same payoffs, the algorithm can be used to find an optimal solution. We present preliminary empirical results and discuss ways to further exploit POMDP theory in solving POSGs. |
Zhengzhu Feng; Shlomo Zilberstein Region-Based Incremental Pruning for POMDPs Conference Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI), Banff, Canada, 2004. @conference{SZ:FZuai04,
title = {Region-Based Incremental Pruning for POMDPs},
author = {Zhengzhu Feng and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FZuai04.pdf},
year = {2004},
date = {2004-01-01},
booktitle = {Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {146--153},
address = {Banff, Canada},
abstract = {We present a major improvement to the incremental pruning algorithm for solving partially observable Markov decision processes. Our technique targets the cross-sum step of the dynamic programming (DP) update, a key source of complexity in POMDP algorithms. Instead of reasoning about the whole belief space when pruning the cross-sums, our algorithm divides the belief space into smaller regions and performs independent pruning in each region. We evaluate the benefits of the new technique both analytically and experimentally, and show that it produces very significant performance gains. The results contribute to the scalability of POMDP algorithms to domains that cannot be handled by the best existing techniques.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We present a major improvement to the incremental pruning algorithm for solving partially observable Markov decision processes. Our technique targets the cross-sum step of the dynamic programming (DP) update, a key source of complexity in POMDP algorithms. Instead of reasoning about the whole belief space when pruning the cross-sums, our algorithm divides the belief space into smaller regions and performs independent pruning in each region. We evaluate the benefits of the new technique both analytically and experimentally, and show that it produces very significant performance gains. The results contribute to the scalability of POMDP algorithms to domains that cannot be handled by the best existing techniques. |
Laurent Jeanpierre; Shlomo Zilberstein; Francois Charpillet Optimal Decision with Continuous Actions Conference Journées Nationales sur Processus Décisionnel de Markov et Intelligence Artificielle, Paris, France, 2004. @conference{SZ:JZCjournee04,
title = {Optimal Decision with Continuous Actions},
author = {Laurent Jeanpierre and Shlomo Zilberstein and Francois Charpillet},
url = {https://hal.inria.fr/inria-00099977},
year = {2004},
date = {2004-01-01},
booktitle = {Journées Nationales sur Processus Décisionnel de Markov et Intelligence Artificielle},
address = {Paris, France},
abstract = {In this article, we show an original method for solving decision problems with continuous actions. From a deterministic modeling of the problem with non-linear differential equations, we compute the value function based on an approximation by finite elements, which is known to converge to the optimal value. The elements to add are chosen by carefully solving the formal system of equations so that the optimal value could be attained with as few elements as possible.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In this article, we show an original method for solving decision problems with continuous actions. From a deterministic modeling of the problem with non-linear differential equations, we compute the value function based on an approximation by finite elements, which is known to converge to the optimal value. The elements to add are chosen by carefully solving the formal system of equations so that the optimal value could be attained with as few elements as possible. |
Andrew Arnt; Shlomo Zilberstein Attribute Measurement Policies for Cost-Effective Classification Conference SIAM/SDM Workshop on Data Mining in Resource Constrained Environments, Lake Buena Vista, Florida, 2004. @conference{SZ:AZsdm04,
title = {Attribute Measurement Policies for Cost-Effective Classification},
author = {Andrew Arnt and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZsdm04.pdf},
year = {2004},
date = {2004-01-01},
booktitle = {SIAM/SDM Workshop on Data Mining in Resource Constrained Environments},
address = {Lake Buena Vista, Florida},
abstract = {Many systems with machine learning classifiers as components require the ability to function in real-time, online settings. Such systems must be able to quickly classify instances so as to minimize a variety of costs. We identify three components of cost that must be considered: penalties incurred due to the misclassification of an instance, costs incurred when measuring an attribute of the instance, and a utility cost related to the time elapsed while measuring attributes. We show how to model this problem as a Markov Decision Process (MDP), and then use AO* heuristic search to build a policy given a set of labeled training data. Additionally, we discuss how to modify this system to cope with a stream of instances arriving over time, where time taken to measure attributes in the current instance can influence time-sensitive costs of waiting instances.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Many systems with machine learning classifiers as components require the ability to function in real-time, online settings. Such systems must be able to quickly classify instances so as to minimize a variety of costs. We identify three components of cost that must be considered: penalties incurred due to the misclassification of an instance, costs incurred when measuring an attribute of the instance, and a utility cost related to the time elapsed while measuring attributes. We show how to model this problem as a Markov Decision Process (MDP), and then use AO* heuristic search to build a policy given a set of labeled training data. Additionally, we discuss how to modify this system to cope with a stream of instances arriving over time, where time taken to measure attributes in the current instance can influence time-sensitive costs of waiting instances. |
Jianbin Tan; George S Avrunin; Lori A Clarke; Shlomo Zilberstein; Stefan Leue Heuristic-Guided Counterexample Search in FLAVERS Conference Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Newport Beach, California, 2004. @conference{SZ:TACZLfse04,
title = {Heuristic-Guided Counterexample Search in FLAVERS},
author = {Jianbin Tan and George S Avrunin and Lori A Clarke and Shlomo Zilberstein and Stefan Leue},
url = {http://rbr.cs.umass.edu/shlomo/papers/TACZLfse04.pdf},
doi = {10.1145/1029894.1029922},
year = {2004},
date = {2004-01-01},
booktitle = {Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering},
pages = {201--210},
address = {Newport Beach, California},
abstract = {One of the benefits of finite-state verification (FSV) tools, such as model checkers, is that a counterexample is provided when the property cannot be verified. Not all counterexamples, however, are equally useful to the analysts trying to understand and localize the fault. Often counterexamples are so long that they are hard to understand. Thus, it is important for FSV tools to find short counterexamples and to do so quickly. Commonly used search strategies, such as breadth-first and depth-first search, do not usually perform well in both of these dimensions. In this paper, we investigate heuristic-guided search strategies for the FSV tool FLAVERS and propose a novel two-stage counterexample search strategy. We describe an experiment showing that this two-stage strategy, when combined with appropriate heuristics, is extremely effective at quickly finding short counterexamples for a large set of verification problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
One of the benefits of finite-state verification (FSV) tools, such as model checkers, is that a counterexample is provided when the property cannot be verified. Not all counterexamples, however, are equally useful to the analysts trying to understand and localize the fault. Often counterexamples are so long that they are hard to understand. Thus, it is important for FSV tools to find short counterexamples and to do so quickly. Commonly used search strategies, such as breadth-first and depth-first search, do not usually perform well in both of these dimensions. In this paper, we investigate heuristic-guided search strategies for the FSV tool FLAVERS and propose a novel two-stage counterexample search strategy. We describe an experiment showing that this two-stage strategy, when combined with appropriate heuristics, is extremely effective at quickly finding short counterexamples for a large set of verification problems. |
Andrew Arnt; Shlomo Zilberstein Learning Policies for Sequential Time and Cost Sensitive Classification Conference Proceedings of the 4th IEEE International Conference on Data Mining, Brighton, UK, 2004. @conference{SZ:AZicdm04,
title = {Learning Policies for Sequential Time and Cost Sensitive Classification},
author = {Andrew Arnt and Shlomo Zilberstein},
url = {https://doi.org/10.1109/ICDM.2004.10051},
doi = {10.1109/ICDM.2004.10051},
year = {2004},
date = {2004-01-01},
booktitle = {Proceedings of the 4th IEEE International Conference on Data Mining},
pages = {323--326},
address = {Brighton, UK},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Shlomo Zilberstein; Jana Koehler; Sven Koenig The Fourteenth International Conference on Automated Planning and Scheduling (ICAPS-04) Journal Article In: AI Magazine, vol. 25, no. 4, pp. 101–104, 2004. @article{SZ:ZKKaim04,
title = {The Fourteenth International Conference on Automated Planning and Scheduling (ICAPS-04)},
author = {Shlomo Zilberstein and Jana Koehler and Sven Koenig},
url = {http://www.aaai.org/ojs/index.php/aimagazine/article/view/1789},
year = {2004},
date = {2004-01-01},
journal = {AI Magazine},
volume = {25},
number = {4},
pages = {101--104},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
2003
|
Shlomo Zilberstein; Francois Charpillet; Philippe Chassaing Optimal Sequencing of Contract Algorithms Journal Article In: Annals of Mathematics and Artificial Intelligence (AMAI), vol. 39, no. 1–2, pp. 1–18, 2003. @article{SZ:ZCCamai03,
title = {Optimal Sequencing of Contract Algorithms},
author = {Shlomo Zilberstein and Francois Charpillet and Philippe Chassaing},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZCCamai03.pdf},
doi = {10.1023/A:1024412831598},
year = {2003},
date = {2003-01-01},
journal = {Annals of Mathematics and Artificial Intelligence (AMAI)},
volume = {39},
number = {1--2},
pages = {1--18},
abstract = {We address the problem of building an interruptible real-time system using non-interruptible components. Some artificial intelligence techniques offer a tradeoff between computation time and quality of results, but their run-time must be determined when they are activated. These techniques, called contract algorithms, introduce a complex scheduling problem when there is uncertainty about the amount of time available for problem-solving. We show how to optimally sequence contract algorithms to create the best possible interruptible system with or without stochastic information about the deadline. These results extend the foundation of real-time problem-solving and provide useful guidance for embedding contract algorithms in applications.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
We address the problem of building an interruptible real-time system using non-interruptible components. Some artificial intelligence techniques offer a tradeoff between computation time and quality of results, but their run-time must be determined when they are activated. These techniques, called contract algorithms, introduce a complex scheduling problem when there is uncertainty about the amount of time available for problem-solving. We show how to optimally sequence contract algorithms to create the best possible interruptible system with or without stochastic information about the deadline. These results extend the foundation of real-time problem-solving and provide useful guidance for embedding contract algorithms in applications. |
Mauricio Marengoni; Allen Hanson; Shlomo Zilberstein; Edward Riseman Decision Making and Uncertainty Management in a 3D Reconstruction System Journal Article In: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 852–858, 2003. @article{SZ:MHZRpami03,
title = {Decision Making and Uncertainty Management in a 3D Reconstruction System},
author = {Mauricio Marengoni and Allen Hanson and Shlomo Zilberstein and Edward Riseman},
url = {https://doi.org/10.1109/TPAMI.2003.1206514},
doi = {10.1109/TPAMI.2003.1206514},
year = {2003},
date = {2003-01-01},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume = {25},
number = {7},
pages = {852--858},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Daniel S Bernstein; Zhengzhu Feng; Brian Neil Levine; Shlomo Zilberstein Adaptive Peer Selection Conference Proceedings of the 2nd International Workshop on Peer-to-Peer Systems, Berkeley, California, 2003. @conference{SZ:BFLZiptps03,
title = {Adaptive Peer Selection},
author = {Daniel S Bernstein and Zhengzhu Feng and Brian Neil Levine and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BFLZiptps03.pdf},
doi = {10.1007/978-3-540-45172-3_22},
year = {2003},
date = {2003-01-01},
booktitle = {Proceedings of the 2nd International Workshop on Peer-to-Peer Systems},
pages = {237--246},
address = {Berkeley, California},
abstract = {In a peer-to-peer file-sharing system, a client desiring a particular file must choose a source from which to download. The problem of selecting a good data source is difficult because some peers may not be encountered more than once, and many peers are on low-bandwidth connections. Despite these facts, information obtained about peers just prior to the download can help guide peer selection. A client can gain additional time savings by aborting bad download attempts until an acceptable peer is discovered. We denote as peer selection the entire process of switching among peers and finally settling on one. Our main contribution is to use the methodology of machine learning for the construction of good peer selection strategies from past experience. Decision tree learning is used for rating peers based on low-cost information, and Markov decision processes are used for deriving a policy for switching among peers. Preliminary results with the Gnutella network demonstrate the promise of this approach.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In a peer-to-peer file-sharing system, a client desiring a particular file must choose a source from which to download. The problem of selecting a good data source is difficult because some peers may not be encountered more than once, and many peers are on low-bandwidth connections. Despite these facts, information obtained about peers just prior to the download can help guide peer selection. A client can gain additional time savings by aborting bad download attempts until an acceptable peer is discovered. We denote as peer selection the entire process of switching among peers and finally settling on one. Our main contribution is to use the methodology of machine learning for the construction of good peer selection strategies from past experience. Decision tree learning is used for rating peers based on low-cost information, and Markov decision processes are used for deriving a policy for switching among peers. Preliminary results with the Gnutella network demonstrate the promise of this approach. |
Raphen Becker; Shlomo Zilberstein; Victor Lesser; Claudia V Goldman Transition-Independent Decentralized Markov Decision Processes Conference Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS), Melbourne, Australia, 2003, (Best Paper Award). @conference{SZ:BZLGaamas03,
title = {Transition-Independent Decentralized Markov Decision Processes},
author = {Raphen Becker and Shlomo Zilberstein and Victor Lesser and Claudia V Goldman},
url = {http://rbr.cs.umass.edu/shlomo/papers/BZLGaamas03.pdf},
doi = {10.1145/860575.860583},
year = {2003},
date = {2003-01-01},
booktitle = {Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS)},
pages = {41--48},
address = {Melbourne, Australia},
abstract = {There has been substantial progress with formal models for sequential decision making by individual agents using the Markov decision process (MDP). However, similar treatment of multi-agent systems is lacking. A recent complexity result, showing that solving decentralized MDPs is NEXP-hard, provides a partial explanation. To overcome this complexity barrier, we identify a general class of transition-independent decentralized MDPs that is widely applicable. The class consists of independent collaborating agents that are tied together through a global reward function that depends upon both of their histories. We present a novel algorithm for solving this class of problems and examine its properties. The result is the first effective technique to solve optimally a class of decentralized MDPs. This lays the foundation for further work in this area on both exact and approximate solutions.},
note = {Best Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
There has been substantial progress with formal models for sequential decision making by individual agents using the Markov decision process (MDP). However, similar treatment of multi-agent systems is lacking. A recent complexity result, showing that solving decentralized MDPs is NEXP-hard, provides a partial explanation. To overcome this complexity barrier, we identify a general class of transition-independent decentralized MDPs that is widely applicable. The class consists of independent collaborating agents that are tied together through a global reward function that depends upon both of their histories. We present a novel algorithm for solving this class of problems and examine its properties. The result is the first effective technique to solve optimally a class of decentralized MDPs. This lays the foundation for further work in this area on both exact and approximate solutions. |
Claudia V Goldman; Shlomo Zilberstein Optimizing Information Exchange in Cooperative Multi-agent Systems Conference Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS), Melbourne, Australia, 2003. @conference{SZ:GZaamas03,
title = {Optimizing Information Exchange in Cooperative Multi-agent Systems},
author = {Claudia V Goldman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GZaamas03.pdf},
doi = {10.1145/860575.860598},
year = {2003},
date = {2003-01-01},
booktitle = {Proceedings of the 2nd International Conference on Autonomous Agents and Multi Agent Systems (AAMAS)},
pages = {137--144},
address = {Melbourne, Australia},
abstract = {Decentralized control of a cooperative multi-agent system is the problem faced by multiple decision-makers that share a common set of objectives. The decision-makers may be robots placed at separate geographical locations or computational processes distributed in an information space. It may be impossible or undesirable for these decision-makers to share all their knowledge all the time. Furthermore, exchanging information may incur a cost associated with the required bandwidth or with the risk of revealing it to competing agents. Assuming that communication may not be reliable adds another dimension of complexity to the problem. This paper develops a decision-theoretic solution to this problem, treating both standard actions and communication as explicit choices that the decision maker must consider. The goal is to derive both action policies and communication policies that together optimize a global value function. We present an analytical model to evaluate the trade-off between the cost of communication and the value of the information received. Finally, to address the complexity of this hard optimization problem, we develop a practical approximation technique based on myopic meta-level control of communication.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Decentralized control of a cooperative multi-agent system is the problem faced by multiple decision-makers that share a common set of objectives. The decision-makers may be robots placed at separate geographical locations or computational processes distributed in an information space. It may be impossible or undesirable for these decision-makers to share all their knowledge all the time. Furthermore, exchanging information may incur a cost associated with the required bandwidth or with the risk of revealing it to competing agents. Assuming that communication may not be reliable adds another dimension of complexity to the problem. This paper develops a decision-theoretic solution to this problem, treating both standard actions and communication as explicit choices that the decision maker must consider. The goal is to derive both action policies and communication policies that together optimize a global value function. We present an analytical model to evaluate the trade-off between the cost of communication and the value of the information received. Finally, to address the complexity of this hard optimization problem, we develop a practical approximation technique based on myopic meta-level control of communication. |
Claudia V Goldman; Shlomo Zilberstein Mechanism Design for Communication in Cooperative Systems Conference Proceedings of the 5th Workshop on Game Theoretic and Decision Theoretic Agents, Melbourne, Australia, 2003. @conference{SZ:GZgtdt03,
title = {Mechanism Design for Communication in Cooperative Systems},
author = {Claudia V Goldman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GZgtdt03.pdf},
year = {2003},
date = {2003-01-01},
booktitle = {Proceedings of the 5th Workshop on Game Theoretic and Decision Theoretic Agents},
address = {Melbourne, Australia},
abstract = {Distributed systems are characterized by having partial observability of the global state during execution. Nevertheless, when these systems comprise cooperative agents, they should attain global objectives. Planning for these decentralized systems is a very complex task. Exchange of local information through communication can alleviate this complexity by allowing the agents to be synchronized from time to time. Due to costs associated with real-world communication, agents may not be able to continuously obtain full observability of the system. We examine mechanisms that result in the decomposition of the global problem into local simpler problems that are applied each time the agents exchange information. The communication policies are computed with respect to a given mechanism and policy of action. This paper presents a framework to study these mechanisms and evaluation criteria to compare them. We also review related work on mechanism design and compare the approaches.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Max Horstmann; Shlomo Zilberstein Automated Generation of Understandable Contingency Plans Conference ICAPS Workshop on Planning Under Uncertainty and Incomplete Information, Trento, Italy, 2003. @conference{SZ:HZicaps03ws,
title = {Automated Generation of Understandable Contingency Plans},
author = {Max Horstmann and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZicaps03ws.pdf},
year = {2003},
date = {2003-01-01},
booktitle = {ICAPS Workshop on Planning Under Uncertainty and Incomplete Information},
address = {Trento, Italy},
abstract = {Markov decision processes (MDPs) and contingency planning (CP) are two widely used approaches to planning under uncertainty. MDPs are attractive because the model is extremely general and because many algorithms exist for deriving optimal plans. In contrast, CP is normally performed using heuristic techniques that do not guarantee optimality, but the resulting plans are more compact and more understandable. The inability to present MDP policies in a clear, intuitive way has limited their applicability in some important domains. We examine the relationship between the two paradigms and present an anytime algorithm for deriving optimal contingency plans for an MDP. The resulting algorithm combines effectively the strengths of the two approaches.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Zhengzhu Feng; Eric A Hansen; Shlomo Zilberstein Symbolic Generalization for On-line Planning Conference Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI), Acapulco, Mexico, 2003. @conference{SZ:FHZuai03,
title = {Symbolic Generalization for On-line Planning},
author = {Zhengzhu Feng and Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/FHZuai03.pdf},
year = {2003},
date = {2003-01-01},
booktitle = {Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {209--216},
address = {Acapulco, Mexico},
abstract = {Symbolic representations have been used successfully in off-line planning algorithms for Markov decision processes. We show that they can also improve the performance of on-line planners. In addition to reducing computation time, symbolic generalization can reduce the amount of costly real-world interactions required for convergence. We introduce Symbolic Real-Time Dynamic Programming (or sRTDP), an extension of RTDP. After each step of on-line interaction with an environment, sRTDP uses symbolic model-checking techniques to generalize its experience by updating a group of states rather than a single state. We examine two heuristic approaches to dynamic grouping of states and show that they accelerate the planning process significantly in terms of both CPU time and the number of steps of interaction with the environment.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Daniel S Bernstein; Lev Finkelstein; Shlomo Zilberstein Contract Algorithms and Robots on Rays: Unifying Two Scheduling Problems Conference Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 2003. @conference{SZ:BFZijcai03,
title = {Contract Algorithms and Robots on Rays: Unifying Two Scheduling Problems},
author = {Daniel S Bernstein and Lev Finkelstein and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BFZijcai03.pdf},
year = {2003},
date = {2003-01-01},
booktitle = {Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1211--1217},
address = {Acapulco, Mexico},
abstract = {We study two apparently different, but formally similar, scheduling problems. The first problem involves contract algorithms, which can trade off run time for solution quality, as long as the amount of available run time is known in advance. The problem is to schedule contract algorithms to run on parallel processors, under the condition that an interruption can occur at any time, and upon interruption a solution to any one of a number of problems can be requested. Schedules are compared in terms of acceleration ratio, which is a worst-case measure of efficiency. We provide a schedule and prove its optimality among a particular class of schedules. Our second problem involves multiple robots searching for a goal on one of multiple rays. Search strategies are compared in terms of time-competitive ratio, the ratio of the total search time to the time it would take for one robot to traverse directly to the goal. We demonstrate that search strategies and contract schedules are formally equivalent. In addition, for our class of schedules, we derive a formula relating the acceleration ratio of a schedule to the time-competitive ratio of the corresponding search strategy.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Max Horstmann; Shlomo Zilberstein Automated Generation of Understandable Contingency Plans Conference Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 2003. @conference{SZ:HZijcai03,
title = {Automated Generation of Understandable Contingency Plans},
author = {Max Horstmann and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZijcai03.pdf},
year = {2003},
date = {2003-01-01},
booktitle = {Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1518--1519},
address = {Acapulco, Mexico},
abstract = {Markov Decision Processes (MDPs) and contingency planning (CP) are two widely used approaches to planning under uncertainty. MDPs are attractive because the model is extremely general and because many algorithms exist for deriving optimal plans. In contrast, CP is normally performed using heuristic techniques that do not guarantee optimality, but the resulting plans are more compact and more understandable. The inability to present MDP policies in a clear, intuitive way has limited their applicability in some important domains. We introduce an anytime algorithm for deriving contingency plans that combines the advantages of the two approaches.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Zhengzhu Feng; Eric A Hansen; Shlomo Zilberstein Symbolic Real-Time Dynamic Programming Conference IJCAI Workshop on Model Checking and Artificial Intelligence, Acapulco, Mexico, 2003. @conference{SZ:FHZijcai03,
title = {Symbolic Real-Time Dynamic Programming},
author = {Zhengzhu Feng and Eric A Hansen and Shlomo Zilberstein},
year = {2003},
date = {2003-01-01},
booktitle = {IJCAI Workshop on Model Checking and Artificial Intelligence},
address = {Acapulco, Mexico},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Andrew Arnt; Shlomo Zilberstein Learning to Perform Moderation in Online Forums Conference Proceedings of the IEEE / WIC International Conference on Web Intelligence, Acapulco, Mexico, 2003. @conference{SZ:AZwi03,
title = {Learning to Perform Moderation in Online Forums},
author = {Andrew Arnt and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/AZwi03.pdf},
doi = {10.1109/WI.2003.1241285},
year = {2003},
date = {2003-01-01},
booktitle = {Proceedings of the IEEE / WIC International Conference on Web Intelligence},
pages = {637--641},
address = {Acapulco, Mexico},
abstract = {Online discussion forums are a valuable resource for people looking to find information, discuss ideas, and get advice on the Internet. Unfortunately, many forums have too much activity and information available, resulting in information overload. Moderation systems are implemented in some forums as a way to handle this problem, but due to sparsity issues, they are often not sufficient. In this paper we describe a novel method for learning from past moderations to develop a classifier that can perform automated moderation and thus address the sparsity problem. Additionally, we discuss the possibility of training a moderating classifier on a moderated forum and then applying it to an otherwise unmoderated forum.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2002
Daniel S Bernstein; Robert Givan; Neil Immerman; Shlomo Zilberstein The Complexity of Decentralized Control of Markov Decision Processes Journal Article In: Mathematics of Operations Research (MOR), vol. 27, no. 4, pp. 819–840, 2002, (IFAAMAS Influential Paper Award). @article{SZ:BGIZmor02,
title = {The Complexity of Decentralized Control of Markov Decision Processes},
author = {Daniel S Bernstein and Robert Givan and Neil Immerman and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BGIZmor02.pdf},
doi = {10.1287/moor.27.4.819.297},
year = {2002},
date = {2002-01-01},
journal = {Mathematics of Operations Research (MOR)},
volume = {27},
number = {4},
pages = {819--840},
abstract = {We consider decentralized control of Markov decision processes and give complexity bounds on the worst-case running time for algorithms that find optimal solutions. Generalizations of both the fully-observable case and the partially-observable case that allow for decentralized control are described. For even two agents, the finite-horizon problems corresponding to both of these models are hard for non-deterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov decision processes. In contrast to the problems involving centralized control, the problems we consider provably do not admit polynomial-time algorithms. Furthermore, assuming EXP ≠ NEXP, the problems require super-exponential time to solve in the worst case.},
note = {IFAAMAS Influential Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Daniel S Bernstein; Theodore J Perkins; Shlomo Zilberstein; Lev Finkelstein Scheduling Contract Algorithms on Multiple Processors Conference Proceedings of the 18th National Conference on Artificial Intelligence (AAAI), Edmonton, Alberta, 2002. @conference{SZ:BPZFaaai02,
title = {Scheduling Contract Algorithms on Multiple Processors},
author = {Daniel S Bernstein and Theodore J Perkins and Shlomo Zilberstein and Lev Finkelstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BPZFaaai02.pdf},
year = {2002},
date = {2002-01-01},
booktitle = {Proceedings of the 18th National Conference on Artificial Intelligence (AAAI)},
pages = {702--706},
address = {Edmonton, Alberta},
abstract = {Anytime algorithms offer a tradeoff between computation time and the quality of the result returned. They can be divided into two classes: contract algorithms, for which the total run time must be specified in advance, and interruptible algorithms, which can be queried at any time for a solution. An interruptible algorithm can be constructed from a contract algorithm by repeatedly activating the contract algorithm with increasing run times. The acceleration ratio of a run-time schedule is a worst-case measure of how inefficient the constructed interruptible algorithm is compared to the contract algorithm. The smallest acceleration ratio achievable on a single processor is known. Using multiple processors, smaller acceleration ratios are possible. In this paper, we provide a schedule for m processors and prove that it is optimal for all m. Our results provide general guidelines for the use of parallel processors in the design of real-time systems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Shlomo Zilberstein; Richard Washington; Daniel S Bernstein; Abdel-Illah Mouaddib Decision-Theoretic Control of Planetary Rovers Conference Advances in Plan-Based Control of Robotic Agents, International Seminar, Revised Papers, Dagstuhl Castle, Germany, 2002. @conference{SZ:ZWBMlnai02,
title = {Decision-Theoretic Control of Planetary Rovers},
author = {Shlomo Zilberstein and Richard Washington and Daniel S Bernstein and Abdel-Illah Mouaddib},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZWBMlnai02.pdf},
doi = {10.1007/3-540-37724-7_16},
year = {2002},
date = {2002-01-01},
booktitle = {Advances in Plan-Based Control of Robotic Agents, International Seminar, Revised Papers},
pages = {270--289},
address = {Dagstuhl Castle, Germany},
abstract = {Planetary rovers are small unmanned vehicles equipped with cameras and a variety of sensors used for scientific experiments. They must operate under tight constraints over such resources as operation time, power, storage capacity, and communication bandwidth. Moreover, the limited computational resources of the rover limit the complexity of on-line planning and scheduling. We describe two decision-theoretic approaches to maximize the productivity of planetary rovers: one based on adaptive planning and the other on hierarchical reinforcement learning. Both approaches map the problem into a Markov decision problem and attempt to solve a large part of the problem off-line, exploiting the structure of the plan and independence between plan components. We examine the advantages and limitations of these techniques and their scalability.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2001
Eric Horvitz; Shlomo Zilberstein (Ed.) Artificial Intelligence Journal Special Issue: Computational Tradeoffs under Bounded Resources Book 2001. @book{SZ:HZaij01s,
title = {Artificial Intelligence Journal Special Issue: Computational Tradeoffs under Bounded Resources},
editor = {Eric Horvitz and Shlomo Zilberstein},
url = {https://www.sciencedirect.com/journal/artificial-intelligence/vol/126/issue/1},
year = {2001},
date = {2001-01-01},
volume = {126},
number = {1-2},
keywords = {},
pubstate = {published},
tppubtype = {book}
}
Eric Horvitz; Shlomo Zilberstein Computational Tradeoffs under Bounded Resources Journal Article In: Artificial Intelligence (AIJ), vol. 126, no. 1-2, pp. 1–4, 2001. @article{SZ:HZaij01b,
title = {Computational Tradeoffs under Bounded Resources},
author = {Eric Horvitz and Shlomo Zilberstein},
url = {https://doi.org/10.1016/S0004-3702(01)00051-0},
doi = {10.1016/S0004-3702(01)00051-0},
year = {2001},
date = {2001-01-01},
journal = {Artificial Intelligence (AIJ)},
volume = {126},
number = {1-2},
pages = {1--4},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Eric A Hansen; Shlomo Zilberstein Monitoring and Control of Anytime Algorithms: A Dynamic Programming Approach Journal Article In: Artificial Intelligence (AIJ), vol. 126, no. 1-2, pp. 139–157, 2001. @article{SZ:HZaij01a,
title = {Monitoring and Control of Anytime Algorithms: A Dynamic Programming Approach},
author = {Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZaij01a.pdf},
doi = {10.1016/S0004-3702(00)00068-0},
year = {2001},
date = {2001-01-01},
journal = {Artificial Intelligence (AIJ)},
volume = {126},
number = {1-2},
pages = {139--157},
abstract = {Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in solving time-critical problems such as planning and scheduling, belief network evaluation, and information gathering. To exploit this tradeoff, a system must be able to decide when to stop deliberation and act on the currently available solution. This paper analyzes the characteristics of existing techniques for meta-level control of anytime algorithms and develops a new framework for monitoring and control. The new framework handles effectively the uncertainty associated with the algorithm's performance profile, the uncertainty associated with the domain of operation, and the cost of monitoring progress. The result is an efficient non-myopic solution to the meta-level control problem for anytime algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Eric A Hansen; Shlomo Zilberstein LAO*: A Heuristic Search Algorithm that Finds Solutions with Loops Journal Article In: Artificial Intelligence (AIJ), vol. 129, no. 1-2, pp. 35–62, 2001. @article{SZ:HZaij01bb,
title = {LAO*: A Heuristic Search Algorithm that Finds Solutions with Loops},
author = {Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZaij01b.pdf},
doi = {10.1016/S0004-3702(01)00106-0},
year = {2001},
date = {2001-01-01},
journal = {Artificial Intelligence (AIJ)},
volume = {129},
number = {1-2},
pages = {35--62},
abstract = {Classic heuristic search algorithms can find solutions that take the form of a simple path (A*), a tree, or an acyclic graph (AO*). In this paper, we describe a novel generalization of heuristic search, called LAO*, that can find solutions with loops. We show that LAO* can be used to solve Markov decision problems and that it shares the advantage heuristic search has over dynamic programming for other classes of problems: given a start state, it can find an optimal solution without evaluating the entire state space.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Shlomo Zilberstein Reasoning about Rational Agents: A Review Journal Article In: AI Magazine, vol. 22, no. 4, pp. 146–148, 2001. @article{SZ:Zaimag01,
title = {Reasoning about Rational Agents: A Review},
author = {Shlomo Zilberstein},
url = {http://www.aaai.org/ojs/index.php/aimagazine/article/view/1600},
year = {2001},
date = {2001-01-01},
journal = {AI Magazine},
volume = {22},
number = {4},
pages = {146--148},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Stephane Cardon; Abdel-Illah Mouaddib; Shlomo Zilberstein; Richard Washington Adaptive Control of Acyclic Progressive Processing Task Structures Conference Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), Seattle, Washington, 2001. @conference{SZ:CMZWijcai01,
title = {Adaptive Control of Acyclic Progressive Processing Task Structures},
author = {Stephane Cardon and Abdel-Illah Mouaddib and Shlomo Zilberstein and Richard Washington},
url = {http://rbr.cs.umass.edu/shlomo/papers/CMZWijcai01.pdf},
year = {2001},
date = {2001-01-01},
booktitle = {Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {701--706},
address = {Seattle, Washington},
abstract = {The progressive processing model allows a system to trade off resource consumption against the quality of the outcome by mapping each activity to a graph of potential solution methods. In the past, only semi-linear graphs have been used. We examine the application of the model to control the operation of an autonomous rover which operates under tight resource constraints. The task structure is generalized to directed acyclic graphs for which the optimal schedule can be computed by solving a corresponding Markov decision problem. We evaluate the complexity of the solution analytically and experimentally and show that it provides a practical approach to building an adaptive controller for this application.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Daniel S Bernstein; Shlomo Zilberstein Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control Conference Proceedings of the 6th European Conference on Planning (ECP), Toledo, Spain, 2001. @conference{SZ:BZecp01,
title = {Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control},
author = {Daniel S Bernstein and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/BZecp01.pdf},
year = {2001},
date = {2001-01-01},
booktitle = {Proceedings of the 6th European Conference on Planning (ECP)},
pages = {373--378},
address = {Toledo, Spain},
abstract = {The progressive processing model allows a system to trade off resource consumption against the quality of the outcome by mapping each activity to a graph of potential solution methods. In the past, only semi-linear graphs have been used. We examine the application of the model to control the operation of an autonomous rover which operates under tight resource constraints. The task structure is generalized to directed acyclic graphs for which the optimal schedule can be computed by solving a corresponding Markov decision problem. We evaluate the complexity of the solution analytically and experimentally and show that it provides a practical approach to building an adaptive controller for this application.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Daniel S Bernstein; Shlomo Zilberstein; Richard Washington; John L Bresina Planetary Rover Control as a Markov Decision Process Conference Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, Montreal, Canada, 2001. @conference{SZ:BZWBisairas01,
title = {Planetary Rover Control as a Markov Decision Process},
author = {Daniel S Bernstein and Shlomo Zilberstein and Richard Washington and John L Bresina},
url = {http://rbr.cs.umass.edu/shlomo/papers/BZWBisairas01.pdf},
year = {2001},
date = {2001-01-01},
booktitle = {Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space},
address = {Montreal, Canada},
abstract = {Planetary rovers must be effective in gathering scientific data despite uncertainty and limited resources. One step toward achieving this goal is to construct a high-level mathematical model of the problem faced by the rover and to use the model to develop a rover controller. We use the Markov decision process framework to develop a model of the rover control problem. We use Monte Carlo reinforcement learning techniques to obtain a policy from the model. The learned policy is compared to a class of heuristic policies and is found to perform better in simulation than any of the policies within that class. These preliminary results demonstrate the potential for using the Markov decision process framework along with reinforcement learning techniques to develop rover controllers.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Planetary rovers must be effective in gathering scientific data despite uncertainty and limited resources. One step toward achieving this goal is to construct a high-level mathematical model of the problem faced by the rover and to use the model to develop a rover controller. We use the Markov decision process framework to develop a model of the rover control problem. We use Monte Carlo reinforcement learning techniques to obtain a policy from the model. The learned policy is compared to a class of heuristic policies and is found to perform better in simulation than any of the policies within that class. These preliminary results demonstrate the potential for using the Markov decision process framework along with reinforcement learning techniques to develop rover controllers. |
Ping Xuan; Victor Lesser; Shlomo Zilberstein Communication Decisions in Multiagent Cooperation: Model and Experiments Conference Proceedings of the 5th International Conference on Autonomous Agents (AGENTS), Montreal, Canada, 2001. @conference{SZ:XLZagents01,
title = {Communication Decisions in Multiagent Cooperation: Model and Experiments},
author = {Ping Xuan and Victor Lesser and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/XLZagents01.pdf},
doi = {10.1145/375735.376469},
year = {2001},
date = {2001-01-01},
booktitle = {Proceedings of the 5th International Conference on Autonomous Agents (AGENTS)},
pages = {616--623},
address = {Montreal, Canada},
abstract = {In multi-agent cooperation, agents share a common goal, which is evaluated through a global utility function. However, an agent typically cannot observe the global state of an uncertain environment, and therefore they must communicate with each other in order to share the information needed for deciding which actions to take. We argue that, when communication incurs a cost (due to resource consumption, for example), whether to communicate or not also becomes a decision to make. Hence, the communication decision becomes part of the overall agent decision problem. In order to explicitly address this problem, we present a multi-agent extension to Markov decision processes in which communication can be modeled as an explicit action that incurs a cost. This framework provides a foundation for a quantified study of agent coordination policies and provides both motivation and insight to the design of heuristic approaches. An example problem is studied under this framework. From this example we can see the impact communication policies have on the overall agent policies, and what implications we can find toward the design of agent coordination policies.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In multi-agent cooperation, agents share a common goal, which is evaluated through a global utility function. However, an agent typically cannot observe the global state of an uncertain environment, and therefore they must communicate with each other in order to share the information needed for deciding which actions to take. We argue that, when communication incurs a cost (due to resource consumption, for example), whether to communicate or not also becomes a decision to make. Hence, the communication decision becomes part of the overall agent decision problem. In order to explicitly address this problem, we present a multi-agent extension to Markov decision processes in which communication can be modeled as an explicit action that incurs a cost. This framework provides a foundation for a quantified study of agent coordination policies and provides both motivation and insight to the design of heuristic approaches. An example problem is studied under this framework. From this example we can see the impact communication policies have on the overall agent policies, and what implications we can find toward the design of agent coordination policies. |
2000
|
Shlomo Zilberstein; Abdel-Illah Mouaddib Optimal Scheduling of Progressive Processing Tasks Journal Article In: International Journal of Approximate Reasoning (IJAR), vol. 25, no. 3, pp. 169–186, 2000. @article{SZ:ZMijar00,
title = {Optimal Scheduling of Progressive Processing Tasks},
author = {Shlomo Zilberstein and Abdel-Illah Mouaddib},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZMijar00.pdf},
doi = {10.1016/S0888-613X(00)00049-9},
year = {2000},
date = {2000-01-01},
journal = {International Journal of Approximate Reasoning (IJAR)},
volume = {25},
number = {3},
pages = {169--186},
abstract = {Progressive processing is an approximate reasoning model that allows a system to satisfy a set of requests under time pressure by limiting the amount of processing allocated to each task based on a predefined hierarchical task structure. It is a useful model for a variety of real-time tasks such as information retrieval, automated diagnosis, or real-time image tracking and speech recognition. In performing these tasks it is often necessary to trade off computational resources for quality of results. This paper addresses progressive processing of information retrieval requests that are characterized by high duration uncertainty associated with each computational unit and dynamic operation allowing new requests to be added at run-time. We introduce a new approach to scheduling the processing units by constructing and solving a particular Markov decision problem. The resulting policy is an optimal schedule for the progressive processing problem. Evaluation of the technique shows that it offers a significant improvement over existing heuristic scheduling techniques. Moreover, the framework presented in this paper can be applied to real-time scheduling of a wide variety of task structures other than progressive processing.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Progressive processing is an approximate reasoning model that allows a system to satisfy a set of requests under time pressure by limiting the amount of processing allocated to each task based on a predefined hierarchical task structure. It is a useful model for a variety of real-time tasks such as information retrieval, automated diagnosis, or real-time image tracking and speech recognition. In performing these tasks it is often necessary to trade off computational resources for quality of results. This paper addresses progressive processing of information retrieval requests that are characterized by high duration uncertainty associated with each computational unit and dynamic operation allowing new requests to be added at run-time. We introduce a new approach to scheduling the processing units by constructing and solving a particular Markov decision problem. The resulting policy is an optimal schedule for the progressive processing problem. Evaluation of the technique shows that it offers a significant improvement over existing heuristic scheduling techniques. Moreover, the framework presented in this paper can be applied to real-time scheduling of a wide variety of task structures other than progressive processing. |
Joshua Grass; Shlomo Zilberstein A Value-Driven System for Autonomous Information Gathering Journal Article In: Journal of Intelligent Information Systems (JIIS), vol. 14, no. 1, pp. 5–27, 2000. @article{SZ:GZjiis00,
title = {A Value-Driven System for Autonomous Information Gathering},
author = {Joshua Grass and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GZjiis00.pdf},
doi = {10.1023/A:1008718418982},
year = {2000},
date = {2000-01-01},
journal = {Journal of Intelligent Information Systems (JIIS)},
volume = {14},
number = {1},
pages = {5--27},
abstract = {This paper presents a system for autonomous information gathering in an information-rich domain under time and monetary resource restrictions. The system gathers information using an explicit representation of the user's decision model and a database of information sources. Information gathering actions (queries) are scheduled myopically by selecting the query with the highest marginal value. This value is determined by the value of the information with respect to the decision being made, the responsiveness of the information source, and a given resource cost function. Finally, we compare the value-driven approach to several base-line techniques and show that the overhead of the meta-level control is made up for by the increased decision quality.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
This paper presents a system for autonomous information gathering in an information-rich domain under time and monetary resource restrictions. The system gathers information using an explicit representation of the user's decision model and a database of information sources. Information gathering actions (queries) are scheduled myopically by selecting the query with the highest marginal value. This value is determined by the value of the information with respect to the decision being made, the responsiveness of the information source, and a given resource cost function. Finally, we compare the value-driven approach to several base-line techniques and show that the overhead of the meta-level control is made up for by the increased decision quality. |
Shlomo Zilberstein; Abdel-Illah Mouaddib; Andrew Arnt Dynamic Scheduling of Progressive Processing Plans Conference ECAI Workshop on New Results in Planning, Scheduling and Design, Berlin, Germany, 2000. @conference{SZ:ZMAecai00ws,
title = {Dynamic Scheduling of Progressive Processing Plans},
author = {Shlomo Zilberstein and Abdel-Illah Mouaddib and Andrew Arnt},
year = {2000},
date = {2000-01-01},
booktitle = {ECAI Workshop on New Results in Planning, Scheduling and Design},
address = {Berlin, Germany},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Daniel S Bernstein; Shlomo Zilberstein; Neil Immerman The Complexity of Decentralized Control of Markov Decision Processes Conference Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), Stanford, California, 2000, (IFAAMAS Influential Paper Award). @conference{SZ:BZIuai00,
title = {The Complexity of Decentralized Control of Markov Decision Processes},
author = {Daniel S Bernstein and Shlomo Zilberstein and Neil Immerman},
url = {http://rbr.cs.umass.edu/shlomo/papers/BZIuai00.pdf},
year = {2000},
date = {2000-01-01},
booktitle = {Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI)},
pages = {32--37},
address = {Stanford, California},
abstract = {Planning for distributed agents with partial state information is considered from a decision-theoretic perspective. We describe generalizations of both the MDP and POMDP models that allow for decentralized control. For even a small number of agents, the finite-horizon problems corresponding to both of our models are complete for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov processes. In contrast to the MDP and POMDP problems, the problems we consider provably do not admit polynomial-time algorithms and most likely require doubly exponential time to solve in the worst case. We have thus provided mathematical evidence corresponding to the intuition that decentralized planning problems cannot easily be reduced to centralized problems and solved exactly using established techniques.},
note = {IFAAMAS Influential Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Planning for distributed agents with partial state information is considered from a decision-theoretic perspective. We describe generalizations of both the MDP and POMDP models that allow for decentralized control. For even a small number of agents, the finite-horizon problems corresponding to both of our models are complete for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov processes. In contrast to the MDP and POMDP problems, the problems we consider provably do not admit polynomial-time algorithms and most likely require doubly exponential time to solve in the worst case. We have thus provided mathematical evidence corresponding to the intuition that decentralized planning problems cannot easily be reduced to centralized problems and solved exactly using established techniques. |
Ping Xuan; Victor R Lesser; Shlomo Zilberstein Communication in Multi-Agent Markov Decision Processes Conference Proceedings of the 4th International Conference on Multi-Agent Systems, Boston, Massachusetts, 2000. @conference{SZ:XLZicmas00,
title = {Communication in Multi-Agent Markov Decision Processes},
author = {Ping Xuan and Victor R Lesser and Shlomo Zilberstein},
url = {https://doi.org/10.1109/ICMAS.2000.858528},
doi = {10.1109/ICMAS.2000.858528},
year = {2000},
date = {2000-01-01},
booktitle = {Proceedings of the 4th International Conference on Multi-Agent Systems},
pages = {467--468},
address = {Boston, Massachusetts},
abstract = {In this paper, we formulate an agent's decision process under the framework of Markov decision processes, and in particular, the multi-agent extension to Markov decision processes that includes agent communication decisions. We model communication as the way for each agent to obtain local state information in other agents, by paying a certain communication cost. Thus, agents have to decide not only which local action to perform, but also whether it is worthwhile to perform a communication action before deciding the local action. We believe that this would provide a foundation for formal study of coordination activities and may lead to some insights to the design of agent coordination policies, and heuristic approaches in particular. An example problem is studied under this framework and its implications to coordination are discussed.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In this paper, we formulate an agent's decision process under the framework of Markov decision processes, and in particular, the multi-agent extension to Markov decision processes that includes agent communication decisions. We model communication as the way for each agent to obtain local state information in other agents, by paying a certain communication cost. Thus, agents have to decide not only which local action to perform, but also whether it is worthwhile to perform a communication action before deciding the local action. We believe that this would provide a foundation for formal study of coordination activities and may lead to some insights to the design of agent coordination policies, and heuristic approaches in particular. An example problem is studied under this framework and its implications to coordination are discussed. |
1999
|
Shlomo Zilberstein; Francois Charpillet; Philippe Chassaing Optimal Sequencing of Contract Algorithms Conference Proceedings of the Bar-Ilan Symposium on the Foundation of Artificial Intelligence, Ramat Gan, Israel, 1999. @conference{SZ:ZCCbisfai99,
title = {Optimal Sequencing of Contract Algorithms},
author = {Shlomo Zilberstein and Francois Charpillet and Philippe Chassaing},
year = {1999},
date = {1999-01-01},
booktitle = {Proceedings of the Bar-Ilan Symposium on the Foundation of Artificial Intelligence},
address = {Ramat Gan, Israel},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Eric A Hansen; Shlomo Zilberstein A Heuristic Search Algorithm for Markov Decision Problems Conference Proceedings of the Bar-Ilan Symposium on the Foundation of Artificial Intelligence, Ramat Gan, Israel, 1999. @conference{SZ:HZbisfai99,
title = {A Heuristic Search Algorithm for Markov Decision Problems},
author = {Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZbisfai99.pdf},
year = {1999},
date = {1999-01-01},
booktitle = {Proceedings of the Bar-Ilan Symposium on the Foundation of Artificial Intelligence},
address = {Ramat Gan, Israel},
abstract = {LAO* is a heuristic search algorithm for Markov decision problems that is derived from the classic heuristic search algorithm AO* (Hansen and Zilberstein, 1998). It shares the advantage heuristic search has over dynamic programming for simpler classes of problems: it can find optimal solutions without evaluating all problem states. In this paper, we show that the derivation of LAO* from AO* makes it possible to generalize refinements of simpler heuristic search algorithms for use in solving Markov decision problems more efficiently. We also generalize some theoretical analyses of simpler search problems to Markov decision problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
LAO* is a heuristic search algorithm for Markov decision problems that is derived from the classic heuristic search algorithm AO* (Hansen and Zilberstein, 1998). It shares the advantage heuristic search has over dynamic programming for simpler classes of problems: it can find optimal solutions without evaluating all problem states. In this paper, we show that the derivation of LAO* from AO* makes it possible to generalize refinements of simpler heuristic search algorithms for use in solving Markov decision problems more efficiently. We also generalize some theoretical analyses of simpler search problems to Markov decision problems. |
Shlomo Zilberstein; Francois Charpillet; Philippe Chassaing Real-Time Problem-Solving with Contract Algorithms Conference Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 1999. @conference{SZ:ZCCijcai99,
title = {Real-Time Problem-Solving with Contract Algorithms},
author = {Shlomo Zilberstein and Francois Charpillet and Philippe Chassaing},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZCCijcai99.pdf},
year = {1999},
date = {1999-01-01},
booktitle = {Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1008--1015},
address = {Stockholm, Sweden},
abstract = {This paper addresses the problem of building an interruptible real-time system using contract algorithms. Contract algorithms offer a tradeoff between computation time and quality of results, but their run-time must be determined when they are activated. Many AI techniques provide useful contract algorithms that are not interruptible. We show how to optimally sequence contract algorithms to create the best interruptible system with or without stochastic information about the deadline. These results extend the foundation of real-time problem-solving and provide useful guidance for embedding contract algorithms in applications.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
This paper addresses the problem of building an interruptible real-time system using contract algorithms. Contract algorithms offer a tradeoff between computation time and quality of results, but their run-time must be determined when they are activated. Many AI techniques provide useful contract algorithms that are not interruptible. We show how to optimally sequence contract algorithms to create the best interruptible system with or without stochastic information about the deadline. These results extend the foundation of real-time problem-solving and provide useful guidance for embedding contract algorithms in applications. |
Shlomo Zilberstein; Abdel-Illah Mouaddib Reactive Control of Dynamic Progressive Processing Conference Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 1999. @conference{SZ:ZMijcai99,
title = {Reactive Control of Dynamic Progressive Processing},
author = {Shlomo Zilberstein and Abdel-Illah Mouaddib},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZMijcai99.pdf},
year = {1999},
date = {1999-01-01},
booktitle = {Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1268--1273},
address = {Stockholm, Sweden},
abstract = {Progressive processing is a model of computation that allows a system to trade off computational resources against the quality of results. This paper generalizes the existing model to make it suitable for dynamic composition of information retrieval techniques. The new framework effectively addresses the uncertainty associated with the duration and output quality of each component. We show how to construct an optimal meta-level controller for a single task based on solving a corresponding Markov decision problem, and how to extend the solution to the case of multiple and dynamic tasks using the notion of an opportunity cost.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Progressive processing is a model of computation that allows a system to trade off computational resources against the quality of results. This paper generalizes the existing model to make it suitable for dynamic composition of information retrieval techniques. The new framework effectively addresses the uncertainty associated with the duration and output quality of each component. We show how to construct an optimal meta-level controller for a single task based on solving a corresponding Markov decision problem, and how to extend the solution to the case of multiple and dynamic tasks using the notion of an opportunity cost. |
Mauricio Marengoni; Allen Hanson; Shlomo Zilberstein; Edward Riseman Control in a 3D Reconstruction System Using Selective Perception Conference Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV), Kerkyra, Greece, 1999. @conference{SZ:MHZRiccv99,
title = {Control in a 3D Reconstruction System Using Selective Perception},
author = {Mauricio Marengoni and Allen Hanson and Shlomo Zilberstein and Edward Riseman},
url = {http://rbr.cs.umass.edu/shlomo/papers/MHZRiccv99.pdf},
doi = {10.1109/ICCV.1999.790421},
year = {1999},
date = {1999-01-01},
booktitle = {Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV)},
pages = {1229--1236},
address = {Kerkyra, Greece},
abstract = {This paper presents a control structure for general purpose image understanding that addresses both the high level of uncertainty in local hypotheses and the computational complexity of image interpretation. The control of vision algorithms is performed by an independent subsystem that uses Bayesian networks and utility theory to compute the marginal value of information provided by alternative operators and selects the ones with the highest value. We have implemented and tested this control structure with several aerial image datasets. The results show that the knowledge base used by the system can be acquired using standard learning techniques and that the value-driven approach to the selection of vision algorithms leads to performance gains. Moreover, the modular system architecture simplifies the addition of both control knowledge and new vision algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
This paper presents a control structure for general purpose image understanding that addresses both the high level of uncertainty in local hypotheses and the computational complexity of image interpretation. The control of vision algorithms is performed by an independent subsystem that uses Bayesian networks and utility theory to compute the marginal value of information provided by alternative operators and selects the ones with the highest value. We have implemented and tested this control structure with several aerial image datasets. The results show that the knowledge base used by the system can be acquired using standard learning techniques and that the value-driven approach to the selection of vision algorithms leads to performance gains. Moreover, the modular system architecture simplifies the addition of both control knowledge and new vision algorithms. |
1998
|
Shlomo Zilberstein Satisficing and Bounded Optimality Conference AAAI Spring Symposium on Satisficing Models, Stanford, California, 1998. @conference{SZ:Zspring98,
title = {Satisficing and Bounded Optimality},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zspring98.pdf},
year = {1998},
date = {1998-01-01},
booktitle = {AAAI Spring Symposium on Satisficing Models},
address = {Stanford, California},
abstract = {Since the early days of artificial intelligence there has been a constant search for useful techniques to tackle the computational complexity of decision making. By now, it is widely accepted that optimal decision making is in most cases beyond our reach. Herbert Simon's approach based on satisficing offers a more realistic alternative, but it says little on how to construct satisficing algorithms or systems. In practice, satisficing comes in many different flavors, one of which, bounded optimality, restores a weak form of optimality. This paper demonstrates this form of satisficing in the area of anytime problem-solving and argues that it is a viable approach to formalize the notion of satisficing.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Since the early days of artificial intelligence there has been a constant search for useful techniques to tackle the computational complexity of decision making. By now, it is widely accepted that optimal decision making is in most cases beyond our reach. Herbert Simon's approach based on satisficing offers a more realistic alternative, but it says little on how to construct satisficing algorithms or systems. In practice, satisficing comes in many different flavors, one of which, bounded optimality, restores a weak form of optimality. This paper demonstrates this form of satisficing in the area of anytime problem-solving and argues that it is a viable approach to formalize the notion of satisficing. |
Eric A Hansen; Shlomo Zilberstein Heuristic Search in Cyclic AND/OR Graphs Conference Proceedings of the 15th National Conference on Artificial Intelligence (AAAI), Madison, Wisconsin, 1998. @conference{SZ:HZaaai98,
title = {Heuristic Search in Cyclic AND/OR Graphs},
author = {Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZaaai98.pdf},
year = {1998},
date = {1998-01-01},
booktitle = {Proceedings of the 15th National Conference on Artificial Intelligence (AAAI)},
pages = {412--418},
address = {Madison, Wisconsin},
abstract = {Heuristic search algorithms can find solutions that take the form of a simple path (A*), a tree or an acyclic graph (AO*). We present a novel generalization of heuristic search (called LAO*) that can find solutions with loops, that is, solutions that take the form of a cyclic graph. We show that it can be used to solve Markov decision problems without evaluating the entire state space, giving it an advantage over dynamic-programming algorithms such as policy iteration and value iteration as an approach to stochastic planning.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Heuristic search algorithms can find solutions that take the form of a simple path (A*), a tree or an acyclic graph (AO*). We present a novel generalization of heuristic search (called LAO*) that can find solutions with loops, that is, solutions that take the form of a cyclic graph. We show that it can be used to solve Markov decision problems without evaluating the entire state space, giving it an advantage over dynamic-programming algorithms such as policy iteration and value iteration as an approach to stochastic planning. |
Joshua Grass; Shlomo Zilberstein From HTML to Usable Data: Problems in Meaning and Credibility in the WWW Conference AAAI Workshop on AI and Information Integration, Madison, Wisconsin, 1998. @conference{SZ:GZaaai98ws,
title = {From HTML to Usable Data: Problems in Meaning and Credibility in the WWW},
author = {Joshua Grass and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GZaaai98ws.pdf},
year = {1998},
date = {1998-01-01},
booktitle = {AAAI Workshop on AI and Information Integration},
address = {Madison, Wisconsin},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Abdel-Illah Mouaddib; Shlomo Zilberstein Optimal Scheduling of Dynamic Progressive Processing Conference Proceedings of the 13th European Conference on Artificial Intelligence (ECAI), Brighton, UK, 1998, (Best Paper Award). @conference{SZ:MZecai98,
title = {Optimal Scheduling of Dynamic Progressive Processing},
author = {Abdel-Illah Mouaddib and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MZecai98.pdf},
year = {1998},
date = {1998-01-01},
booktitle = {Proceedings of the 13th European Conference on Artificial Intelligence (ECAI)},
pages = {499--503},
address = {Brighton, UK},
abstract = {Progressive processing allows a system to satisfy a set of requests under time pressure by limiting the amount of processing allocated to each task based on a predefined hierarchical task structure. It is a useful model for a variety of real-time AI tasks such as diagnosis and planning in which it is necessary to trade off computational resources for quality of results. This paper addresses progressive processing of information retrieval requests that are characterized by high duration uncertainty associated with each computational unit and dynamic operation allowing new requests to be added at run-time. We introduce a new approach to scheduling the processing units by constructing and solving a particular Markov decision problem. The resulting policy is an optimal schedule for the progressive processing problem. Finally, we evaluate the technique and show that it offers a significant improvement over existing heuristic scheduling techniques.},
note = {Best Paper Award},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Progressive processing allows a system to satisfy a set of requests under time pressure by limiting the amount of processing allocated to each task based on a predefined hierarchical task structure. It is a useful model for a variety of real-time AI tasks such as diagnosis and planning in which it is necessary to trade off computational resources for quality of results. This paper addresses progressive processing of information retrieval requests that are characterized by high duration uncertainty associated with each computational unit and dynamic operation allowing new requests to be added at run-time. We introduce a new approach to scheduling the processing units by constructing and solving a particular Markov decision problem. The resulting policy is an optimal schedule for the progressive processing problem. Finally, we evaluate the technique and show that it offers a significant improvement over existing heuristic scheduling techniques. |
Abdel-Illah Mouaddib; Shlomo Zilberstein; Victor A Danilchenko New Directions in Modeling and Control of Progressive Processing Conference ECAI Workshop on Monitoring and Control of Real-Time Intelligent Systems, Brighton, UK, 1998. @conference{SZ:MZDecai98ws,
title = {New Directions in Modeling and Control of Progressive Processing},
author = {Abdel-Illah Mouaddib and Shlomo Zilberstein and Victor A Danilchenko},
url = {http://rbr.cs.umass.edu/shlomo/papers/MZDecai98ws.pdf},
year = {1998},
date = {1998-01-01},
booktitle = {ECAI Workshop on Monitoring and Control of Real-Time Intelligent Systems},
address = {Brighton, UK},
abstract = {Progressive processing is an approach to resource-bounded execution of a set of tasks under time pressure. It allows a system to limit the computation time allocated to each task by executing a subset of its components and by producing a sub-optimal result. Progressive processing is a useful model for a variety of real-time tasks such as diagnosis, planning, and intelligent information gathering. This paper describes recent results and new directions aimed at generalizing the applicability of progressive processing by addressing the issues of high duration uncertainty and quality uncertainty associated with each computational unit. We also examine new ways to model inter-task quality dependency and a richer topology of task structures.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Progressive processing is an approach to resource-bounded execution of a set of tasks under time pressure. It allows a system to limit the computation time allocated to each task by executing a subset of its components and by producing a sub-optimal result. Progressive processing is a useful model for a variety of real-time tasks such as diagnosis, planning, and intelligent information gathering. This paper describes recent results and new directions aimed at generalizing the applicability of progressive processing by addressing the issues of high duration uncertainty and quality uncertainty associated with each computational unit. We also examine new ways to model inter-task quality dependency and a richer topology of task structures. |
1997
|
Shlomo Zilberstein Formalizing the Notion of "Satisficing" Conference AAAI Spring Symposium on Qualitative Preferences in Deliberation and Practical Reasoning, Stanford, California, 1997. @conference{SZ:Zspring97,
title = {Formalizing the Notion of "Satisficing"},
author = {Shlomo Zilberstein},
year = {1997},
date = {1997-01-01},
booktitle = {AAAI Spring Symposium on Qualitative Preferences in Deliberation and Practical Reasoning},
address = {Stanford, California},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Dan Rubenstein; Leon Osterweil; Shlomo Zilberstein An Anytime Approach to Analyzing Software Systems Conference Proceedings of the 10th International FLAIRS Conference, Daytona Beach, Florida, 1997. @conference{SZ:ROZflairs97,
title = {An Anytime Approach to Analyzing Software Systems},
author = {Dan Rubenstein and Leon Osterweil and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/ROZflairs97.pdf},
year = {1997},
date = {1997-01-01},
booktitle = {Proceedings of the 10th International FLAIRS Conference},
pages = {386--391},
address = {Daytona Beach, Florida},
abstract = {Proving that a software system satisfies its requirements is a costly process. This paper discusses the benefits and challenges of structuring the analysis of software as an anytime algorithm. We demonstrate that certain incremental approaches to event sequence analysis that produce partial results are anytime algorithms, and we show how these partial results can be used to optimize the time to complete the full analysis.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Proving that a software system satisfies its requirements is a costly process. This paper discusses the benefits and challenges of structuring the analysis of software as an anytime algorithm. We demonstrate that certain incremental approaches to event sequence analysis that produce partial results are anytime algorithms, and we show how these partial results can be used to optimize the time to complete the full analysis. |
Joshua Grass; Shlomo Zilberstein Value-Driven Information Gathering Conference AAAI Workshop on Building Resource-Bounded Reasoning Systems, Providence, Rhode Island, 1997. @conference{SZ:GZaaai97ws,
title = {Value-Driven Information Gathering},
author = {Joshua Grass and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GZaaai97ws.pdf},
year = {1997},
date = {1997-01-01},
booktitle = {AAAI Workshop on Building Resource-Bounded Reasoning Systems},
address = {Providence, Rhode Island},
abstract = {We describe a decision-theoretic approach to information gathering from a distributed network of information sources. Our approach uses an explicit representation of the user's decision model in order to plan and execute information gathering actions. The information gathering planner issues requests based on the value of information, taking into account the computational resources and monetary costs of information gathering. At any given time, the system assesses the marginal value of dispatching new queries and selects the one with maximal value. When no further improvement of the comprehensive utility function is possible, the system stops gathering information and reports the results. We show that this approach has significant advantages including high performance, interruptibility, and adaptability to dynamic changes in the environment.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
We describe a decision-theoretic approach to information gathering from a distributed network of information sources. Our approach uses an explicit representation of the user's decision model in order to plan and execute information gathering actions. The information gathering planner issues requests based on the value of information, taking into account the computational resources and monetary costs of information gathering. At any given time, the system assesses the marginal value of dispatching new queries and selects the one with maximal value. When no further improvement of the comprehensive utility function is possible, the system stops gathering information and reports the results. We show that this approach has significant advantages including high performance, interruptibility, and adaptability to dynamic changes in the environment. |
Abdel-Illah Mouaddib; Shlomo Zilberstein Handling Duration Uncertainty in Meta-Level Control of Progressive Processing Conference Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI), Nagoya, Japan, 1997. @conference{SZ:MZijcai97,
title = {Handling Duration Uncertainty in Meta-Level Control of Progressive Processing},
author = {Abdel-Illah Mouaddib and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MZijcai97.pdf},
year = {1997},
date = {1997-01-01},
booktitle = {Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1201--1207},
address = {Nagoya, Japan},
abstract = {Progressive processing is a resource-bounded reasoning technique that allows a system to incrementally construct a solution to a problem using a hierarchy of processing levels. This paper focuses on the problem of meta-level control of progressive processing in domains characterized by rapid change and a high level of duration uncertainty. We show that progressive processing facilitates efficient run-time monitoring and meta-level control. Our solution is based on an incremental scheduler that can handle duration uncertainty by dynamically revising the schedule during execution time based on run-time information. We also show that a probabilistic representation of duration uncertainty reduces the frequency of schedule revisions and thus improves the performance of the system. Finally, an experimental evaluation shows the contributions of this approach and its suitability for a data transmission application.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Progressive processing is a resource-bounded reasoning technique that allows a system to incrementally construct a solution to a problem using a hierarchy of processing levels. This paper focuses on the problem of meta-level control of progressive processing in domains characterized by rapid change and a high level of duration uncertainty. We show that progressive processing facilitates efficient run-time monitoring and meta-level control. Our solution is based on an incremental scheduler that can handle duration uncertainty by dynamically revising the schedule during execution time based on run-time information. We also show that a probabilistic representation of duration uncertainty reduces the frequency of schedule revisions and thus improves the performance of the system. Finally, an experimental evaluation shows the contributions of this approach and its suitability for a data transmission application. |
Eric A Hansen; Shlomo Zilberstein; Victor A Danilchenko Anytime Heuristic Search: First Results Technical Report Computer Science Department, University of Massachusetts Amherst no. 97-50, 1997. @techreport{SZ:HZDtr9750,
title = {Anytime Heuristic Search: First Results},
author = {Eric A Hansen and Shlomo Zilberstein and Victor A Danilchenko},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZDtr9750.pdf},
year = {1997},
date = {1997-01-01},
number = {97-50},
institution = {Computer Science Department, University of Massachusetts Amherst},
abstract = {We describe a simple technique for converting heuristic search algorithms into anytime algorithms that offer a tradeoff between search time and solution quality. The technique is related to work on the use of non-admissible evaluation functions that make it possible to find good, but possibly sub-optimal, solutions more quickly than it takes to find an optimal solution. Instead of stopping the search after the first solution is found, however, we continue the search in order to find a sequence of improved solutions that eventually converges to an optimal solution. The performance of anytime heuristic search depends on the non-admissible evaluation function that guides the search. We discuss how to design a search heuristic that "optimizes" the rate at which the currently available solution improves.},
keywords = {},
pubstate = {published},
tppubtype = {techreport}
}
We describe a simple technique for converting heuristic search algorithms into anytime algorithms that offer a tradeoff between search time and solution quality. The technique is related to work on the use of non-admissible evaluation functions that make it possible to find good, but possibly sub-optimal, solutions more quickly than it takes to find an optimal solution. Instead of stopping the search after the first solution is found, however, we continue the search in order to find a sequence of improved solutions that eventually converges to an optimal solution. The performance of anytime heuristic search depends on the non-admissible evaluation function that guides the search. We discuss how to design a search heuristic that "optimizes" the rate at which the currently available solution improves. |
1996
|
Shlomo Zilberstein; Stuart J Russell Optimal Composition of Real-Time Systems Journal Article In: Artificial Intelligence (AIJ), vol. 82, no. 1-2, pp. 181–213, 1996. @article{SZ:ZRaij96,
title = {Optimal Composition of Real-Time Systems},
author = {Shlomo Zilberstein and Stuart J Russell},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZRaij96.pdf},
year = {1996},
date = {1996-01-01},
journal = {Artificial Intelligence (AIJ)},
volume = {82},
number = {1-2},
pages = {181--213},
abstract = {Real-time systems are designed for environments in which the utility of actions is strongly time-dependent. Recent work by Dean, Horvitz and others has shown that anytime algorithms are a useful tool for real-time system design, since they allow computation time to be traded for decision quality. In order to construct complex systems, however, we need to be able to compose larger systems from smaller, reusable anytime modules. This paper addresses two basic problems associated with composition: how to ensure the interruptibility of the composed system; and how to allocate computation time optimally among the components. The first problem is solved by a simple and general construction that incurs only a small, constant penalty. The second is solved by an off-line compilation process. We show that the general compilation problem is NP-complete. However, efficient local compilation techniques, working on a single program structure at a time, yield globally optimal allocations for a large class of programs. We illustrate these results with two simple applications.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Real-time systems are designed for environments in which the utility of actions is strongly time-dependent. Recent work by Dean, Horvitz and others has shown that anytime algorithms are a useful tool for real-time system design, since they allow computation time to be traded for decision quality. In order to construct complex systems, however, we need to be able to compose larger systems from smaller, reusable anytime modules. This paper addresses two basic problems associated with composition: how to ensure the interruptibility of the composed system; and how to allocate computation time optimally among the components. The first problem is solved by a simple and general construction that incurs only a small, constant penalty. The second is solved by an off-line compilation process. We show that the general compilation problem is NP-complete. However, efficient local compilation techniques, working on a single program structure at a time, yield globally optimal allocations for a large class of programs. We illustrate these results with two simple applications. |
Shlomo Zilberstein Resource-Bounded Sensing and Planning in Autonomous Systems Journal Article In: Autonomous Robots, vol. 3, pp. 31–48, 1996. @article{SZ:Zar96,
title = {Resource-Bounded Sensing and Planning in Autonomous Systems},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zar96.pdf},
year = {1996},
date = {1996-01-01},
journal = {Autonomous Robots},
volume = {3},
pages = {31--48},
abstract = {This paper is concerned with the implications of limited computational resources and uncertainty on the design of autonomous systems. To address this problem, we redefine the principal role of sensor interpretation and planning processes. Following Agre and Chapman's plan-as-communication approach, sensing and planning are treated as computational processes that provide information to an execution architecture and thus improve the overall performance of the system. We argue that autonomous systems must be able to trade off the quality of this information with the computational resources required to produce it. Anytime algorithms, whose quality of results improves gradually as computation time increases, provide useful performance components for time-critical sensing and planning in robotic systems. In our earlier work, we introduced a compilation scheme for optimal composition of anytime algorithms. This paper demonstrates the applicability of the compilation technique to the construction of autonomous systems. The result is a flexible approach to construct systems that can operate robustly in real-time by exploiting the tradeoff between time and quality in planning, sensing and plan execution.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
This paper is concerned with the implications of limited computational resources and uncertainty on the design of autonomous systems. To address this problem, we redefine the principal role of sensor interpretation and planning processes. Following Agre and Chapman's plan-as-communication approach, sensing and planning are treated as computational processes that provide information to an execution architecture and thus improve the overall performance of the system. We argue that autonomous systems must be able to trade off the quality of this information with the computational resources required to produce it. Anytime algorithms, whose quality of results improves gradually as computation time increases, provide useful performance components for time-critical sensing and planning in robotic systems. In our earlier work, we introduced a compilation scheme for optimal composition of anytime algorithms. This paper demonstrates the applicability of the compilation technique to the construction of autonomous systems. The result is a flexible approach to construct systems that can operate robustly in real-time by exploiting the tradeoff between time and quality in planning, sensing and plan execution. |
Shlomo Zilberstein Using Anytime Algorithms in Intelligent Systems Journal Article In: AI Magazine, vol. 17, no. 3, pp. 73–83, 1996. @article{SZ:Zaimag96,
title = {Using Anytime Algorithms in Intelligent Systems},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zaimag96.pdf},
year = {1996},
date = {1996-01-01},
journal = {AI Magazine},
volume = {17},
number = {3},
pages = {73--83},
abstract = {Anytime algorithms give intelligent systems the capability to trade off deliberation time for quality of results. This capability is essential for successful operation in domains such as signal interpretation, real-time diagnosis and repair, and mobile robot control. What characterizes these domains is that it is not feasible (computationally) or desirable (economically) to compute the optimal answer. This paper surveys the main control problems that arise when a system is composed of several anytime algorithms. These problems relate to optimal management of uncertainty and precision. After a brief introduction to anytime computation, the paper outlines a wide range of existing solutions to the meta-level control problem and describes current work that is aimed at increasing the applicability of anytime computation.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Anytime algorithms give intelligent systems the capability to trade off deliberation time for quality of results. This capability is essential for successful operation in domains such as signal interpretation, real-time diagnosis and repair, and mobile robot control. What characterizes these domains is that it is not feasible (computationally) or desirable (economically) to compute the optimal answer. This paper surveys the main control problems that arise when a system is composed of several anytime algorithms. These problems relate to optimal management of uncertainty and precision. After a brief introduction to anytime computation, the paper outlines a wide range of existing solutions to the meta-level control problem and describes current work that is aimed at increasing the applicability of anytime computation. |
Joshua Grass; Shlomo Zilberstein Anytime Algorithm Development Tools Journal Article In: SIGART Bulletin Special Issue on Anytime Algorithms and Deliberation Scheduling, vol. 7, no. 2, pp. 20–27, 1996. @article{SZ:GZsigart96,
title = {Anytime Algorithm Development Tools},
author = {Joshua Grass and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/GZsigart96.pdf},
year = {1996},
date = {1996-01-01},
journal = {SIGART Bulletin Special Issue on Anytime Algorithms and Deliberation Scheduling},
volume = {7},
number = {2},
pages = {20--27},
abstract = {Anytime algorithms are playing an increasingly important role in the construction of effective reasoning and planning systems. Early work on anytime algorithms concentrated on the construction of applications in such areas as medical diagnosis and mobile robot navigation. In this paper we describe a programming environment to support the development of such applications as well as larger applications in which several anytime algorithms are used. The widespread use of anytime algorithms depends largely on the availability of such programming tools for algorithm construction, performance measurement, composition of anytime algorithms, and monitoring of their execution. We present a prototype system that meets these needs. Created in Lisp, this library of functions, graphical tools and monitoring modules will accelerate and simplify the process of programming with anytime algorithms.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Anytime algorithms are playing an increasingly important role in the construction of effective reasoning and planning systems. Early work on anytime algorithms concentrated on the construction of applications in such areas as medical diagnosis and mobile robot navigation. In this paper we describe a programming environment to support the development of such applications as well as larger applications in which several anytime algorithms are used. The widespread use of anytime algorithms depends largely on the availability of such programming tools for algorithm construction, performance measurement, composition of anytime algorithms, and monitoring of their execution. We present a prototype system that meets these needs. Created in Lisp, this library of functions, graphical tools and monitoring modules will accelerate and simplify the process of programming with anytime algorithms. |
Eric A Hansen; Shlomo Zilberstein Monitoring Anytime Algorithms Journal Article In: SIGART Bulletin Special Issue on Anytime Algorithms and Deliberation Scheduling, vol. 7, no. 2, pp. 28–33, 1996. @article{SZ:HZsigart96,
title = {Monitoring Anytime Algorithms},
author = {Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZsigart96.pdf},
year = {1996},
date = {1996-01-01},
journal = {SIGART Bulletin Special Issue on Anytime Algorithms and Deliberation Scheduling},
volume = {7},
number = {2},
pages = {28--33},
abstract = {Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in applying artificial intelligence techniques to time-critical problems. To exploit this tradeoff, a system must be able to determine the best time to stop deliberation and act on the currently available solution. If there is uncertainty about how much solution quality will improve with computation time, or about how the problem state may change after the start of the algorithm, monitoring the algorithm's progress and/or the problem state can make possible a better stopping decision and so improve the utility of the system. This paper analyzes the issues involved in run-time monitoring of anytime algorithms. It reviews previous work and casts the problem in a new framework from which some improved monitoring strategies emerge.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in applying artificial intelligence techniques to time-critical problems. To exploit this tradeoff, a system must be able to determine the best time to stop deliberation and act on the currently available solution. If there is uncertainty about how much solution quality will improve with computation time, or about how the problem state may change after the start of the algorithm, monitoring the algorithm's progress and/or the problem state can make possible a better stopping decision and so improve the utility of the system. This paper analyzes the issues involved in run-time monitoring of anytime algorithms. It reviews previous work and casts the problem in a new framework from which some improved monitoring strategies emerge. |
Shlomo Zilberstein Resource-Bounded Reasoning in Intelligent Systems Journal Article In: ACM Computing Surveys, vol. 28, no. 15, 1996. @article{SZ:Zacmcs96,
title = {Resource-Bounded Reasoning in Intelligent Systems},
author = {Shlomo Zilberstein},
url = {https://doi.org/10.1145/242224.242243},
doi = {10.1145/242224.242243},
year = {1996},
date = {1996-01-01},
journal = {ACM Computing Surveys},
volume = {28},
number = {15},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Eric A Hansen; Shlomo Zilberstein Monitoring the Progress of Anytime Problem-Solving Conference Proceedings of the 13th National Conference on Artificial Intelligence (AAAI), Portland, Oregon, 1996. @conference{SZ:HZaaai96,
title = {Monitoring the Progress of Anytime Problem-Solving},
author = {Eric A Hansen and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HZaaai96.pdf},
year = {1996},
date = {1996-01-01},
booktitle = {Proceedings of the 13th National Conference on Artificial Intelligence (AAAI)},
pages = {1229--1234},
address = {Portland, Oregon},
abstract = {Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in applying artificial intelligence techniques to time-critical problems. To exploit this tradeoff, a system must be able to determine the best time to stop deliberation and act on the currently available solution. When the rate of improvement of solution quality is uncertain, monitoring the progress of the algorithm can improve the utility of the system. This paper introduces a technique for run-time monitoring of anytime algorithms that is sensitive to the variance of the algorithm's performance, the time-dependent utility of a solution, the ability of the run-time monitor to estimate the quality of the currently available solution, and the cost of monitoring. The paper examines the conditions under which the technique is optimal and demonstrates its applicability.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Anytime algorithms offer a tradeoff between solution quality and computation time that has proved useful in applying artificial intelligence techniques to time-critical problems. To exploit this tradeoff, a system must be able to determine the best time to stop deliberation and act on the currently available solution. When the rate of improvement of solution quality is uncertain, monitoring the progress of the algorithm can improve the utility of the system. This paper introduces a technique for run-time monitoring of anytime algorithms that is sensitive to the variance of the algorithm's performance, the time-dependent utility of a solution, the ability of the run-time monitor to estimate the quality of the currently available solution, and the cost of monitoring. The paper examines the conditions under which the technique is optimal and demonstrates its applicability. |
Eric A Hansen; Andrew G Barto; Shlomo Zilberstein Reinforcement Learning for Mixed Open-loop and Closed-loop Control Conference Proceedings of the 9th Neural Information Processing Systems Conference (NIPS), Denver, Colorado, 1996. @conference{SZ:HBZnips96,
title = {Reinforcement Learning for Mixed Open-loop and Closed-loop Control},
author = {Eric A Hansen and Andrew G Barto and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/HBZnips96.pdf},
year = {1996},
date = {1996-01-01},
booktitle = {Proceedings of the 9th Neural Information Processing Systems Conference (NIPS)},
pages = {1026--1032},
address = {Denver, Colorado},
abstract = {Closed-loop control relies on sensory feedback that is usually assumed to be free. But if sensing incurs a cost, it may be cost effective to take sequences of actions in open-loop mode. We describe a reinforcement learning algorithm that learns to combine open-loop and closed-loop control when sensing incurs a cost. Although we assume reliable sensors, use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. This is a special case of the hidden-state problem in reinforcement learning, and to cope, our algorithm relies on short-term memory. The main result of the paper is a rule that significantly limits exploration of possible memory states by pruning memory states for which the estimated value of information is greater than its cost. We prove that this rule allows convergence to an optimal policy.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Closed-loop control relies on sensory feedback that is usually assumed to be free. But if sensing incurs a cost, it may be cost effective to take sequences of actions in open-loop mode. We describe a reinforcement learning algorithm that learns to combine open-loop and closed-loop control when sensing incurs a cost. Although we assume reliable sensors, use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. This is a special case of the hidden-state problem in reinforcement learning, and to cope, our algorithm relies on short-term memory. The main result of the paper is a rule that significantly limits exploration of possible memory states by pruning memory states for which the estimated value of information is greater than its cost. We prove that this rule allows convergence to an optimal policy. |
Shlomo Zilberstein; Victor Lesser Intelligent Information Gathering Using Decision Models Technical Report Computer Science Department, University of Massachusetts Amherst no. 96-35, 1996. @techreport{SZ:ZLtr9635,
title = {Intelligent Information Gathering Using Decision Models},
author = {Shlomo Zilberstein and Victor Lesser},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZLtr9635.pdf},
year = {1996},
date = {1996-01-01},
number = {96-35},
institution = {Computer Science Department, University of Massachusetts Amherst},
abstract = {This paper describes an architecture for the next generation of information gathering systems. The paper is based on a research proposal whose goal is to exploit the vast amount of information sources available today on the NII including a growing number of digital libraries, independent news agencies, government agencies, as well as human experts providing a variety of services. The large number of information sources and their different levels of accessibility, reliability and associated costs present a complex information gathering coordination problem. We outline the structure and components of an information gathering system that uses an explicit representation of the user's decision model in order to organize its activity. Within this framework, information gathering planning is performed based on its marginal contribution to the user's decision quality.},
keywords = {},
pubstate = {published},
tppubtype = {techreport}
}
This paper describes an architecture for the next generation of information gathering systems. The paper is based on a research proposal whose goal is to exploit the vast amount of information sources available today on the NII including a growing number of digital libraries, independent news agencies, government agencies, as well as human experts providing a variety of services. The large number of information sources and their different levels of accessibility, reliability and associated costs present a complex information gathering coordination problem. We outline the structure and components of an information gathering system that uses an explicit representation of the user's decision model in order to organize its activity. Within this framework, information gathering planning is performed based on its marginal contribution to the user's decision quality. |
1995
|
Shlomo Zilberstein On the Utility of Planning Journal Article In: SIGART Bulletin Special Issue on Evaluating Plans, Planners, and Planning Systems, vol. 6, no. 1, pp. 42–47, 1995. @article{SZ:Zsigart95,
title = {On the Utility of Planning},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zsigart95.pdf},
year = {1995},
date = {1995-01-01},
journal = {SIGART Bulletin Special Issue on Evaluating Plans, Planners, and Planning Systems},
volume = {6},
number = {1},
pages = {42--47},
abstract = {Evaluation and comparison of existing planning systems is hard because they disagree on the fundamental role of planning, on evaluation metrics, and on the notion of success and failure. This paper suggests a decision-theoretic approach to evaluate planning systems that generalizes the role of planning in intelligent systems. The planner is viewed as a source of information that is used by an execution architecture in order to select actions. A planner is only as good as the effect it has on the performance of an operational system. Our approach calls for a clear separation between the planning component and the execution architecture and for evaluation of planning systems within the context of a well-defined command, planning and execution environment. The evaluation is based on the expected utility of the domain histories that are induced by the planning component.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Evaluation and comparison of existing planning systems is hard because they disagree on the fundamental role of planning, on evaluation metrics, and on the notion of success and failure. This paper suggests a decision-theoretic approach to evaluate planning systems that generalizes the role of planning in intelligent systems. The planner is viewed as a source of information that is used by an execution architecture in order to select actions. A planner is only as good as the effect it has on the performance of an operational system. Our approach calls for a clear separation between the planning component and the execution architecture and for evaluation of planning systems within the context of a well-defined command, planning and execution environment. The evaluation is based on the expected utility of the domain histories that are induced by the planning component. |
Shlomo Zilberstein Operational Rationality through Compilation of Anytime Algorithms Journal Article In: AI Magazine, vol. 16, no. 2, pp. 79–80, 1995. @article{SZ:aimag95,
title = {Operational Rationality through Compilation of Anytime Algorithms},
author = {Shlomo Zilberstein},
url = {http://www.aaai.org/ojs/index.php/aimagazine/article/view/1136},
year = {1995},
date = {1995-01-01},
journal = {AI Magazine},
volume = {16},
number = {2},
pages = {79--80},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
|
Shlomo Zilberstein Models of Bounded Rationality Conference AAAI Fall Symposium on Rational Agency, Cambridge, Massachusetts, 1995. @conference{SZ:Zfall95,
title = {Models of Bounded Rationality},
author = {Shlomo Zilberstein},
year = {1995},
date = {1995-01-01},
booktitle = {AAAI Fall Symposium on Rational Agency},
address = {Cambridge, Massachusetts},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Shlomo Zilberstein Optimizing Decision Quality with Contract Algorithms Conference Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, 1995. @conference{SZ:Zijcai95,
title = {Optimizing Decision Quality with Contract Algorithms},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zijcai95.pdf},
year = {1995},
date = {1995-01-01},
booktitle = {Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1576--1582},
address = {Montreal, Canada},
abstract = {Contract algorithms offer a tradeoff between output quality and computation time, provided that the amount of computation time is determined prior to their activation. Originally, they were introduced as an intermediate step in the composition of interruptible anytime algorithms. However, for many real-time tasks such as information gathering, game playing, and a large class of planning problems, contract algorithms offer an ideal mechanism to optimize decision quality. This paper extends previous results regarding the meta-level control of contract algorithms by handling a more general type of performance description. The output quality of each contract algorithm is described by a probabilistic (rather than deterministic) conditional performance profile. Such profiles map input quality and computation time to a probability distribution of output quality. The composition problem is solved by an efficient off-line compilation technique that simplifies the run-time monitoring task.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Abdel-illah Mouaddib; Shlomo Zilberstein Knowledge-Based Anytime Computation Conference Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, 1995. @conference{SZ:MZijcai95,
title = {Knowledge-Based Anytime Computation},
author = {Abdel-illah Mouaddib and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/MZijcai95.pdf},
year = {1995},
date = {1995-01-01},
booktitle = {Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {775--781},
address = {Montreal, Canada},
abstract = {This paper describes a real-time decision-making model that combines the expressiveness and flexibility of knowledge-based systems with the real-time advantages of anytime algorithms. Anytime algorithms offer a simple means by which an intelligent system can trade off computation time for quality of results. Previous attempts to develop knowledge-based anytime algorithms failed to produce consistent, predictable improvement of quality over time. Without performance profiles that describe the output quality as a function of time, it is hard to exploit the flexibility of anytime algorithms. The model of progressive reasoning that is presented here is based on a hierarchy of reasoning units that allow for gradual improvement of decision quality in a predictable manner. The result is an important step towards the application of knowledge-based systems in time-critical domains.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Shlomo Zilberstein; Stuart J Russell Approximate Reasoning Using Anytime Algorithms Book Section In: Natarajan, S (Ed.): Imprecise and Approximate Computation, pp. 43–62, Kluwer Academic Publishers, 1995. @incollection{SZ:ZRchapter95,
title = {Approximate Reasoning Using Anytime Algorithms},
author = {Shlomo Zilberstein and Stuart J Russell},
editor = {S Natarajan},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZRchapter95.pdf},
year = {1995},
date = {1995-01-01},
booktitle = {Imprecise and Approximate Computation},
pages = {43--62},
publisher = {Kluwer Academic Publishers},
abstract = {The complexity of reasoning in intelligent systems makes it undesirable, and sometimes infeasible, to find the optimal action in every situation since the deliberation process itself degrades the performance of the system. The problem is then to construct intelligent systems that react to a situation after performing the "right" amount of thinking. It is by now widely accepted that a successful system must trade off decision quality against the computational requirements of decision-making. Anytime algorithms, introduced by Dean, Horvitz and others in the late 1980s, were designed to offer such a trade-off. We have extended their work to the construction of complex systems that are composed of anytime algorithms. This paper describes the compilation and monitoring mechanisms that are required to build intelligent systems that can efficiently control their deliberation time. We present theoretical results showing that the compilation and monitoring problems are tractable in a wide range of cases, and provide two applications to illustrate the ideas.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
|
1994
|
Shlomo Zilberstein Teaching Graduate-level Artificial Intelligence Conference AAAI Fall Symposium on Instruction of Introductory AI, New Orleans, Louisiana, 1994. @conference{SZ:Zfall94,
title = {Teaching Graduate-level Artificial Intelligence},
author = {Shlomo Zilberstein},
year = {1994},
date = {1994-01-01},
booktitle = {AAAI Fall Symposium on Instruction of Introductory AI},
address = {New Orleans, Louisiana},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
Shlomo Zilberstein Meta-Level Control of Approximate Reasoning: A Decision Theoretic Approach Conference Proceedings of the 8th International Symposium on Methodologies for Intelligent Systems (ISMIS), Charlotte, North Carolina, 1994. @conference{SZ:Zismis94,
title = {Meta-Level Control of Approximate Reasoning: A Decision Theoretic Approach},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zismis94.pdf},
doi = {10.1007/3-540-58495-1},
year = {1994},
date = {1994-01-01},
booktitle = {Proceedings of the 8th International Symposium on Methodologies for Intelligent Systems (ISMIS)},
pages = {114--123},
address = {Charlotte, North Carolina},
abstract = {This paper describes a novel methodology for meta-level control of approximate reasoning. We show that approximate reasoning supported by anytime algorithms offers a simple means by which an intelligent system can trade-off decision quality for deliberation cost. Such a tradeoff is an essential capability of almost every intelligent system. The model exploits probabilistic knowledge about the environment and about the performance of each component in order to optimally manage computational resources. An off-line knowledge compilation technique and a run-time monitoring process guarantee that the system is performing the "right" amount of thinking in a well-defined sense. The paper concludes with a brief description of two successful applications.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
1993
|
Shlomo Zilberstein Operational Rationality through Compilation of Anytime Algorithms PhD Thesis Computer Science Division, University of California, Berkeley, 1993. @phdthesis{SZ:Zshort93,
title = {Operational Rationality through Compilation of Anytime Algorithms},
author = {Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/Zshort93.pdf},
year = {1993},
date = {1993-01-01},
school = {Computer Science Division, University of California, Berkeley},
abstract = {An important and largely ignored aspect of real-time decision making is the capability of agents to factor the cost of deliberation into the decision making process. I have developed an efficient model that creates this capability. The model uses as basic components anytime algorithms whose quality of results improves gradually as computation time increases. The main contribution of this work is a compilation process that extends the property of gradual improvement from the level of single algorithms to the level of complex systems.
In standard algorithms, the fixed quality of the output allows for composition to be implemented by a simple call-return mechanism. However, when algorithms have resource allocation as a degree of freedom, there arises the question of how to construct, for example, the optimal composition of two anytime algorithms, one of which feeds its output to the other. This scheduling problem is solved by an off-line compilation process and a run-time monitoring component that together generate a utility maximizing behavior. The crucial meta-level knowledge is kept in the anytime library in the form of conditional performance profiles. These profiles characterize the performance of each elementary anytime algorithm as a function of run-time and input quality. The compilation process therefore extends the principles of procedural abstraction and modularity to anytime computation. Its efficiency is significantly improved by using local compilation that works on a single program structure at a time. Local compilation is proved to yield global optimality for a large set of program structures.
Compilation produces contract algorithms which require the determination of the total run-time when activated. Some real-time domains require interruptible algorithms whose total run-time is unknown in advance. An important result of this work is a general method by which an interruptible algorithm can be constructed once a contract algorithm is compiled. Finally, the notion of gradual improvement of quality is extended to sensing and plan execution and the application of the model is demonstrated through a simulated robot navigation system. The result is a modular approach for developing real-time agents that act by performing anytime actions and make decisions using anytime computation.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
|
Shlomo Zilberstein; Stuart J Russell Anytime Sensing, Planning and Action: A Practical Model for Robot Control Conference Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), Chambéry, France, 1993. @conference{SZ:ZRijcai93,
title = {Anytime Sensing, Planning and Action: A Practical Model for Robot Control},
author = {Shlomo Zilberstein and Stuart J Russell},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZRijcai93.pdf},
year = {1993},
date = {1993-01-01},
booktitle = {Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {1402--1407},
address = {Chambéry, France},
abstract = {Anytime algorithms, whose quality of results improves gradually as computation time increases, provide useful performance components for time-critical planning and control of robotic systems. In earlier work, we introduced a compilation scheme for optimal composition of anytime algorithms. In this paper we present an implementation of a navigation system in which an off-line compilation process and a run-time monitoring component guarantee the optimal allocation of time to the anytime modules. The crucial meta-level knowledge is kept in the anytime library in the form of conditional performance profiles. We also extend the notion of gradual improvement to sensing and plan execution. The result is an efficient, flexible control for robotic systems that exploits the tradeoff between time and quality in planning, sensing and plan execution.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
1992
|
Shlomo Zilberstein; Stuart J Russell Efficient Resource-Bounded Reasoning in AT-RALPH Conference Proceedings of the 1st International Conference on AI Planning Systems (AIPS), College Park, Maryland, 1992. @conference{SZ:ZRaips92,
title = {Efficient Resource-Bounded Reasoning in AT-RALPH},
author = {Shlomo Zilberstein and Stuart J Russell},
url = {http://rbr.cs.umass.edu/shlomo/papers/ZRaips92.pdf},
year = {1992},
date = {1992-01-01},
booktitle = {Proceedings of the 1st International Conference on AI Planning Systems (AIPS)},
pages = {260--266},
address = {College Park, Maryland},
abstract = {Anytime algorithms have attracted growing attention in recent years as a key mechanism for implementing models of bounded rationality. The main problem, however, as with planning systems in general, is the integration of the modules and their interface with the other components of the system. We have implemented a prototype of AT-RALPH (Anytime Rational Agent with Limited Performance Hardware) in which an off-line compilation process together with a run-time monitoring component guarantee the optimal allocation of time to the anytime algorithms. The crucial meta-level knowledge is kept in the anytime library in the form of conditional performance profiles. These are extensions of an earlier notion of performance description -- they characterize the performance of each elementary anytime algorithm as a function of run-time and input quality. This information, used by the compiler to produce the performance profile of the complete system, is also used by the run-time system to measure the value of computation and monitor the execution of the top-level procedure in the context of a particular domain. The result is an efficient and cheap meta-level control for real-time decision making that separates the performance components from the schedule optimization mechanism and automates the second task.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|
1991
|
Stuart J Russell; Shlomo Zilberstein Composing Real-Time Systems Conference Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI), Sydney, Australia, 1991. @conference{SZ:RZijcai91,
title = {Composing Real-Time Systems},
author = {Stuart J Russell and Shlomo Zilberstein},
url = {http://rbr.cs.umass.edu/shlomo/papers/RZijcai91.pdf},
year = {1991},
date = {1991-01-01},
booktitle = {Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI)},
pages = {212--217},
address = {Sydney, Australia},
abstract = {We present a method to construct real-time systems using as components anytime algorithms whose quality of results degrades gracefully as computation time decreases. Introducing computation time as a degree of freedom defines a scheduling problem involving the activation and interruption of the anytime components. This scheduling problem is especially complicated when trying to construct interruptible algorithms, whose total run-time is unknown in advance. We introduce a framework to measure the performance of anytime algorithms and solve the problem of constructing interruptible algorithms by a mathematical reduction to the problem of constructing contract algorithms, which require the determination of the total run-time when activated. We show how the composition of anytime algorithms can be mechanized as part of a compiler for a LISP-like programming language for real-time systems. The result is a new approach to the construction of complex real-time systems that separates the arrangement of the performance components from the optimization of their scheduling, and automates the latter task.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
|