{"id":236,"date":"2022-04-05T18:57:00","date_gmt":"2022-04-05T18:57:00","guid":{"rendered":"https:\/\/groups.cs.umass.edu\/equate-ml\/?p=236"},"modified":"2022-04-07T18:15:21","modified_gmt":"2022-04-07T18:15:21","slug":"paper-universal-off-policy-evaluation","status":"publish","type":"post","link":"https:\/\/groups.cs.umass.edu\/equate-ml\/2022\/04\/05\/paper-universal-off-policy-evaluation\/","title":{"rendered":"Paper: Universal Off-Policy Evaluation"},"content":{"rendered":"\n<p>When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO) \u2014 one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles\/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO\u2019s applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link\" href=\"https:\/\/arxiv.org\/abs\/2104.12820\">Paper<\/a><\/div>\n<\/div>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. 
Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of &hellip; <a href=\"https:\/\/groups.cs.umass.edu\/equate-ml\/2022\/04\/05\/paper-universal-off-policy-evaluation\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Paper: Universal Off-Policy Evaluation&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":175,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[32,34,26],"class_list":["post-236","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-32","tag-neurips","tag-paper","group-blog","hfeed"],"_links":{"self":[{"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/posts\/236","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/comments?post=236"}],"version-history":[{"count":1,"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/posts\/236\/revisions"}],"predecessor-version":[{"id":237,"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/posts\/236\/revisions\/237"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/media\/175"}],"wp:attachment":[{"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/media?parent=236"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/cat
egories?post=236"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/groups.cs.umass.edu\/equate-ml\/wp-json\/wp\/v2\/tags?post=236"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}