Publications
2021
Terrance E. Boult, Przemyslaw A. Grabowicz, D. S. Prijatelj, R. Stern, L. Holder, J. Alspector, M. Jafarzadeh, T. Ahmad, A. R. Dhamija, C. Li, S. Cruz, A. Shrivastava, C. Vondrick, W. J. Scheirer
A Unifying Framework for Formal Theories of Novelty: Framework, Examples and Discussion Proceedings Article
In: AAAI'21 SMPT, 2021, ISSN: 2331-8422.
@inproceedings{Boult2020a,
title = {A Unifying Framework for Formal Theories of Novelty: Framework, Examples and Discussion},
author = {Terrance E. Boult and Przemyslaw A. Grabowicz and D. S. Prijatelj and R. Stern and L. Holder and J. Alspector and M. Jafarzadeh and T. Ahmad and A. R. Dhamija and C. Li and S. Cruz and A. Shrivastava and C. Vondrick and W. J. Scheirer},
url = {http://arxiv.org/abs/2012.04226},
issn = {2331-8422},
year = {2021},
date = {2021-12-01},
booktitle = {AAAI'21 SMPT},
abstract = {Managing inputs that are novel, unknown, or out-of-distribution is critical as an agent moves from the lab to the open world. Novelty-related problems include being tolerant to novel perturbations of the normal input, detecting when the input includes novel items, and adapting to novel inputs. While significant research has been undertaken in these areas, a noticeable gap exists in the lack of a formalized definition of novelty that transcends problem domains. As a team of researchers spanning multiple research groups and different domains, we have seen, first hand, the difficulties that arise from ill-specified novelty problems, as well as inconsistent definitions and terminology. Therefore, we present the first unified framework for formal theories of novelty and use the framework to formally define a family of novelty types. Our framework can be applied across a wide range of domains, from symbolic AI to reinforcement learning, and beyond to open world image recognition. Thus, it can be used to help kick-start new research efforts and accelerate ongoing work on these important novelty-related problems. This extended version of our AAAI 2021 paper included more details and examples in multiple domains.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Aarshee Mishra, Przemyslaw A. Grabowicz, Nicholas Perello
Towards Fair and Explainable Supervised Learning Proceedings Article
In: ICML Workshop on Socially Responsible Machine Learning, 2021.
@inproceedings{Mishra2021,
title = {Towards Fair and Explainable Supervised Learning},
author = {Aarshee Mishra and Przemyslaw A. Grabowicz and Nicholas Perello},
url = {https://drive.google.com/file/d/1z24hITF0Xrlc_IX_rOZVZ2aigOj1hxhD/view?usp=sharing},
year = {2021},
date = {2021-01-01},
booktitle = {ICML Workshop on Socially Responsible Machine Learning},
abstract = {Algorithms that aid human decision-making may inadvertently discriminate against certain protected groups. We formalize direct discrimination as a direct causal effect of the protected attributes on the decisions, while induced indirect discrimination as a change in the influence of non-protected features associated with the protected attributes. The measurements of average treatment effect (ATE) and SHapley Additive exPlanations (SHAP) reveal that state-of-the-art fair learning methods can inadvertently induce indirect discrimination in synthetic and real-world datasets. To inhibit discrimination in algorithmic systems, we propose to nullify the influence of the protected attribute on the output of the system, while preserving the influence of remaining features. To achieve this objective, we introduce a risk minimization method which optimizes for the proposed fairness objective. We show that the method leverages model accuracy and disparity measures.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Amanda M Gentzel, Purva Pruthi, David Jensen
How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference Proceedings Article
In: International Conference on Machine Learning, pp. 3660–3671, PMLR 2021.
@inproceedings{gentzel2021and,
title = {How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference},
author = {Amanda M Gentzel and Purva Pruthi and David Jensen},
url = {http://proceedings.mlr.press/v139/gentzel21a/gentzel21a.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {International Conference on Machine Learning},
pages = {3660--3671},
organization = {PMLR},
abstract = {Methods that infer causal dependence from observational data are central to many areas of science, including medicine, economics, and the social sciences. A variety of theoretical properties of these methods have been proven, but empirical evaluation remains a challenge, largely due to the lack of observational data sets for which treatment effect is known. We describe and analyze observational sampling from randomized controlled trials (OSRCT), a method for evaluating causal inference methods using data from randomized controlled trials (RCTs). This method can be used to create constructed observational data sets with corresponding unbiased estimates of treatment effect, substantially increasing the number of data sets available for evaluating causal inference methods. We show that, in expectation, OSRCT creates data sets that are equivalent to those produced by randomly sampling from empirical data sets in which all potential outcomes are available. We then perform a large-scale evaluation of seven causal inference methods over 37 data sets, drawn from RCTs, as well as simulators, real-world computational systems, and observational data sets augmented with a synthetic response variable. We find notable performance differences when comparing across data from different sources, demonstrating the importance of using data from a variety of sources when evaluating any causal inference method.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
David Jensen
Improving Causal Inference by Increasing Model Expressiveness Proceedings Article
In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 15053–15057, 2021.
@inproceedings{jensen2021improving,
title = {Improving Causal Inference by Increasing Model Expressiveness},
author = {David Jensen},
url = {https://www.aaai.org/AAAI21Papers/SMT-427.JensenD.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = {35},
number = {17},
pages = {15053--15057},
abstract = {The ability to learn and reason with causal knowledge is a key aspect of intelligent behavior. In contrast to mere statistical association, knowledge of causation enables reasoning about the effects of actions. Causal reasoning is vital for autonomous agents and for a range of applications in science, medicine, business, and government. However, current methods for causal inference are hobbled because they use relatively inexpressive models. Surprisingly, current causal models eschew nearly every major representational innovation common in a range of other fields both inside and outside of computer science, including representation of objects, relationships, time, space, and hierarchy. Even more surprisingly, a range of recent research provides strong evidence that more expressive representations make possible causal inferences that are otherwise impossible and remove key biases that would otherwise afflict more naive inferences. New research on causal inference should target increases in expressiveness to improve accuracy and effectiveness.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Akanksha Atrey, Prashant J. Shenoy, David Jensen
Preserving Privacy in Personalized Models for Distributed Mobile Services Miscellaneous
2021.
@misc{DBLP:journals/corr/abs-2101-05855,
title = {Preserving Privacy in Personalized Models for Distributed Mobile Services},
author = {Akanksha Atrey and Prashant J. Shenoy and David Jensen},
url = {https://arxiv.org/abs/2101.05855},
year = {2021},
date = {2021-01-01},
journal = {CoRR},
volume = {abs/2101.05855},
abstract = {The ubiquity of mobile devices has led to the proliferation of mobile services that provide personalized and context-aware content to their users. Modern mobile services are distributed between end-devices, such as smartphones, and remote servers that reside in the cloud. Such services thrive on their ability to predict future contexts to pre-fetch content or make context-specific recommendations. An increasingly common method to predict future contexts, such as location, is via machine learning (ML) models. Recent work in context prediction has focused on ML model personalization where a personalized model is learned for each individual user in order to tailor predictions or recommendations to a user's mobile behavior. While the use of personalized models increases efficacy of the mobile service, we argue that it increases privacy risk since a personalized model encodes contextual behavior unique to each user. To demonstrate these privacy risks, we present several attribute inference-based privacy attacks and show that such attacks can leak privacy with up to 78% efficacy for top-3 predictions. We present Pelican, a privacy-preserving personalization system for context-aware mobile services that leverages both device and cloud resources to personalize ML models while minimizing the risk of privacy leakage for users. We evaluate Pelican using real world traces for location-aware mobile services and show that Pelican can substantially reduce privacy leakage by up to 75%.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Sam Witty, David Jensen, Vikash Mansinghka
A Simulation-Based Test of Identifiability for Bayesian Causal Inference Miscellaneous
2021.
@misc{DBLP:journals/corr/abs-2102-11761,
title = {A Simulation-Based Test of Identifiability for Bayesian Causal Inference},
author = {Sam Witty and David Jensen and Vikash Mansinghka},
url = {https://arxiv.org/abs/2102.11761},
year = {2021},
date = {2021-01-01},
journal = {CoRR},
volume = {abs/2102.11761},
abstract = {This paper introduces a procedure for testing the identifiability of Bayesian models for causal inference. Although the do-calculus is sound and complete given a causal graph, many practical assumptions cannot be expressed in terms of graph structure alone, such as the assumptions required by instrumental variable designs, regression discontinuity designs, and within-subjects designs. We present simulation-based identifiability (SBI), a fully automated identification test based on a particle optimization scheme with simulated observations. This approach expresses causal assumptions as priors over functions in a structural causal model, including flexible priors using Gaussian processes. We prove that SBI is asymptotically sound and complete, and produces practical finite-sample bounds. We also show empirically that SBI agrees with known results in graph-based identification as well as with widely-held intuitions for designs in which graph-based methods are inconclusive.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2020
David Ifeoluwa Adelani, Ryota Kobayashi, Ingmar Weber, Przemyslaw A. Grabowicz
Estimating community feedback effect on topic choice in social media with predictive modeling Journal Article
In: EPJ Data Science, vol. 9, no. 1, pp. 25, 2020, ISSN: 2193-1127.
@article{Adelani2020,
title = {Estimating community feedback effect on topic choice in social media with predictive modeling},
author = {David Ifeoluwa Adelani and Ryota Kobayashi and Ingmar Weber and Przemyslaw A. Grabowicz},
url = {http://dx.doi.org/10.1140/epjds/s13688-020-00243-w https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-020-00243-w},
doi = {10.1140/epjds/s13688-020-00243-w},
issn = {2193-1127},
year = {2020},
date = {2020-12-01},
journal = {EPJ Data Science},
volume = {9},
number = {1},
pages = {25},
publisher = {The Author(s)},
abstract = {Social media users post content on various topics. A defining feature of social media is that other users can provide feedback—called community feedback—to their content in the form of comments, replies, and retweets. We hypothesize that the amount of received feedback influences the choice of topics on which a social media user posts. However, it is challenging to test this hypothesis as user heterogeneity and external confounders complicate measuring the feedback effect. Here, we investigate this hypothesis with a predictive approach based on an interpretable model of an author's decision to continue the topic of their previous post. We explore the confounding factors, including author's topic preferences and unobserved external factors such as news and social events, by optimizing the predictive accuracy. This approach enables us to identify which users are susceptible to community feedback. Overall, we find that 33% and 14% of active users in Reddit and Twitter, respectively, are influenced by community feedback. The model suggests that this feedback alters the probability of topic continuation up to 14%, depending on the user and the amount of feedback.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
David Jensen, Javier Burroni, Matthew Rattigan
Object conditioning for causal inference Proceedings Article
In: Uncertainty in Artificial Intelligence, pp. 1072–1082, PMLR 2020.
@inproceedings{jensen2020object,
title = {Object conditioning for causal inference},
author = {David Jensen and Javier Burroni and Matthew Rattigan},
url = {http://proceedings.mlr.press/v115/jensen20a/jensen20a.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Uncertainty in Artificial Intelligence},
pages = {1072--1082},
organization = {PMLR},
abstract = {We describe and analyze a form of conditioning that is widely applied within social science and applied statistics but that is virtually unknown within causal graphical models. This approach, which we term object conditioning, can adjust for the effects of latent confounders and yet avoid the pitfall of conditioning on colliders. We describe object conditioning using plate models and show how its probabilistic implications can be explained using the property of exchangeability. We show that several seemingly obvious interpretations of object conditioning are insufficient to describe its probabilistic implications. Finally, we use object conditioning to describe and unify key aspects of a diverse set of techniques for causal inference, including within-subjects designs, difference-in-differences designs, and interrupted time-series designs.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Akanksha Atrey, Kaleigh Clary, David Jensen
Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning Proceedings Article
In: International Conference on Learning Representations, 2020.
@inproceedings{atrey2020exploratory,
title = {Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning},
author = {Akanksha Atrey and Kaleigh Clary and David Jensen},
url = {https://openreview.net/pdf?id=rkl3m1BFDB},
year = {2020},
date = {2020-01-01},
booktitle = {International Conference on Learning Representations},
abstract = {Saliency maps are frequently used to support explanations of the behavior of deep reinforcement learning (RL) agents. However, a review of how saliency maps are used in practice indicates that the derived explanations are often unfalsifiable and can be highly subjective. We introduce an empirical approach grounded in counterfactual reasoning to test the hypotheses generated from saliency maps and assess the degree to which they correspond to the semantics of RL environments. We use Atari games, a common benchmark for deep RL, to evaluate three types of saliency maps. Our results show the extent to which existing claims about Atari games can be evaluated and suggest that saliency maps are best viewed as an exploratory tool rather than an explanatory tool.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Katherine A. Keith, David Jensen, Brendan O'Connor
Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates Proceedings Article
In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, pp. 5332–5344, Association for Computational Linguistics, 2020.
@inproceedings{DBLP:conf/acl/KeithJO20,
title = {Text and Causal Inference: A Review of Using Text to Remove Confounding
from Causal Estimates},
author = {Katherine A. Keith and David Jensen and Brendan O'Connor},
url = {https://doi.org/10.18653/v1/2020.acl-main.474},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics, ACL 2020, Online, July 5-10, 2020},
pages = {5332--5344},
publisher = {Association for Computational Linguistics},
abstract = {Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects. Unmeasured or latent confounders can bias causal estimates, and this has motivated interest in measuring potential confounders from observed text. For example, an individual's entire history of social media posts or the content of a news article could provide a rich measurement of multiple confounders. Yet, methods and applications for this problem are scattered across different communities and evaluation practices are inconsistent. This review is the first to gather and categorize these examples and provide a guide to data-processing and evaluation decisions. Despite increased attention on adjusting for confounding using text, there are still many open problems, which we highlight in this paper.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Sam Witty, Kenta Takatsu, David Jensen, Vikash Mansinghka
Causal Inference using Gaussian Processes with Structured Latent Confounders Proceedings Article
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, pp. 10313–10323, PMLR, 2020.
@inproceedings{DBLP:conf/icml/WittyTJM20,
title = {Causal Inference using Gaussian Processes with Structured Latent Confounders},
author = {Sam Witty and Kenta Takatsu and David Jensen and Vikash Mansinghka},
url = {http://proceedings.mlr.press/v119/witty20a.html},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the 37th International Conference on Machine Learning,
ICML 2020, 13-18 July 2020, Virtual Event},
volume = {119},
pages = {10313--10323},
publisher = {PMLR},
series = {Proceedings of Machine Learning Research},
abstract = {Latent confounders---unobserved variables that influence both treatment and outcome---can bias estimates of causal effects. In some cases, these confounders are shared across observations, e.g. all students taking a course are influenced by the course's difficulty in addition to any educational interventions they receive individually. This paper shows how to semiparametrically model latent confounders that have this structure and thereby improve estimates of causal effects. The key innovations are a hierarchical Bayesian model, Gaussian processes with structured latent confounders (GP-SLC), and a Monte Carlo inference algorithm for this model based on elliptical slice sampling. GP-SLC provides principled Bayesian uncertainty estimates of individual treatment effect with minimal assumptions about the functional forms relating confounders, covariates, treatment, and outcome. Finally, this paper shows GP-SLC is competitive with or more accurate than widely used causal inference techniques on three benchmark datasets, including the Infant Health and Development Program and a dataset showing the effect of changing temperatures on state-wide energy consumption across New England.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Amanda Gentzel, Justin Clarke, David Jensen
Using Experimental Data to Evaluate Methods for Observational Causal Inference Miscellaneous
2020.
@misc{DBLP:journals/corr/abs-2010-03051,
title = {Using Experimental Data to Evaluate Methods for Observational Causal
Inference},
author = {Amanda Gentzel and Justin Clarke and David Jensen},
url = {https://arxiv.org/abs/2010.03051},
year = {2020},
date = {2020-01-01},
journal = {CoRR},
volume = {abs/2010.03051},
abstract = {Methods that infer causal dependence from observational data are central to many areas of science, including medicine, economics, and the social sciences. A variety of theoretical properties of these methods have been proven, but empirical evaluation remains a challenge, largely due to the lack of observational data sets for which treatment effect is known. We propose and analyze observational sampling from randomized controlled trials (OSRCT), a method for evaluating causal inference methods using data from randomized controlled trials (RCTs). This method can be used to create constructed observational data sets with corresponding unbiased estimates of treatment effect, substantially increasing the number of data sets available for evaluating causal inference methods. We show that, in expectation, OSRCT creates data sets that are equivalent to those produced by randomly sampling from empirical data sets in which all potential outcomes are available. We analyze several properties of OSRCT theoretically and empirically, and we demonstrate its use by comparing the performance of four causal inference methods using data from eleven RCTs.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2019
Przemyslaw A. Grabowicz, Nicholas Perello, Kenta Takatsu
Resilience of Supervised Learning Algorithms to Discriminatory Data Perturbations Journal Article
In: 2019.
@article{Grabowicz2019c,
title = {Resilience of Supervised Learning Algorithms to Discriminatory Data Perturbations},
author = {Przemyslaw A. Grabowicz and Nicholas Perello and Kenta Takatsu},
url = {http://arxiv.org/abs/1912.08189},
year = {2019},
date = {2019-12-01},
abstract = {Discrimination is a focal concern in supervised learning algorithms augmenting human decision-making. These systems are trained using historical data, which may have been tainted by discrimination, and may learn biases against the protected groups. An important question is how to train models without propagating discrimination. In this study, we i) define and model discrimination as perturbations of a data-generating process and show how discrimination can be induced via attributes correlated with the protected attributes; ii) introduce a measure of resilience of a supervised learning algorithm to potentially discriminatory data perturbations; iii) propose a novel supervised learning algorithm that inhibits discrimination; and iv) show that it is more resilient to discriminatory perturbations in synthetic and real-world datasets than state-of-the-art learning algorithms. The proposed method can be used with general supervised learning algorithms and avoids inducement of discrimination, while maximizing model accuracy.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
David Jensen et al.
Comment: Strengthening empirical evaluation of causal inference methods Journal Article
In: Statistical Science, vol. 34, no. 1, pp. 77–81, 2019.
@article{jensen2019comment,
title = {Comment: Strengthening empirical evaluation of causal inference methods},
author = {David Jensen and others},
url = {https://projecteuclid.org/journals/statistical-science/volume-34/issue-1/Comment-Strengthening-Empirical-Evaluation-of-Causal-Inference-Methods/10.1214/18-STS690.short},
year = {2019},
date = {2019-01-01},
journal = {Statistical Science},
volume = {34},
number = {1},
pages = {77--81},
publisher = {Institute of Mathematical Statistics},
abstract = {This is a contribution to the discussion of the paper by Dorie et al. (Statist. Sci. 34 (2019) 43–68), which reports the lessons learned from the 2016 Atlantic Causal Inference Conference Competition. My comments strongly support the authors’ focus on empirical evaluation, using examples and experience from machine learning research, particularly focusing on the problem of algorithmic complexity. I argue that even broader and deeper empirical evaluation should be undertaken by the researchers who study causal inference. Finally, I highlight a few key conclusions that suggest where future research should focus.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Emma Tosch, Eytan Bakshy, Emery D Berger, David Jensen, J Eliot B Moss
PlanAlyzer: assessing threats to the validity of online experiments Journal Article
In: Proceedings of the ACM on Programming Languages, vol. 3, no. OOPSLA, pp. 1–30, 2019.
@article{tosch2019planalyzer,
title = {PlanAlyzer: assessing threats to the validity of online experiments},
author = {Emma Tosch and Eytan Bakshy and Emery D Berger and David Jensen and J Eliot B Moss},
url = {https://dl.acm.org/doi/pdf/10.1145/3360608},
year = {2019},
date = {2019-01-01},
journal = {Proceedings of the ACM on Programming Languages},
volume = {3},
number = {OOPSLA},
pages = {1--30},
publisher = {ACM New York, NY, USA},
abstract = {Online experiments are ubiquitous. As the scale of experiments has grown, so has the complexity of their design and implementation. In response, firms have developed software frameworks for designing and deploying online experiments. Ensuring that experiments in these frameworks are correctly designed and that their results are trustworthy---referred to as *internal validity*---can be difficult. Currently, verifying internal validity requires manual inspection by someone with substantial expertise in experimental design. We present the first approach for statically checking the internal validity of online experiments. Our checks are based on well-known problems that arise in experimental design and causal inference. Our analyses target PlanOut, a widely deployed, open-source experimentation framework that uses a domain-specific language to specify and run complex experiments. We have built a tool, PlanAlyzer, that checks PlanOut programs for a variety of threats to internal validity, including failures of randomization, treatment assignment, and causal sufficiency. PlanAlyzer uses its analyses to automatically generate *contrasts*, a key type of information required to perform valid statistical analyses over experimental results. We demonstrate PlanAlyzer's utility on a corpus of PlanOut scripts deployed in production at Facebook, and we evaluate its ability to identify threats to validity on a mutated subset of this corpus. PlanAlyzer has both precision and recall of 92% on the mutated corpus, and 82% of the contrasts it automatically generates match hand-specified data.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Huseyin Oktay, Akanksha Atrey, David Jensen
Identifying when effect restoration will improve estimates of causal effect Proceedings Article
In: Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 190–198, Society for Industrial and Applied Mathematics 2019.
@inproceedings{oktay2019identifying,
title = {Identifying when effect restoration will improve estimates of causal effect},
author = {Huseyin Oktay and Akanksha Atrey and David Jensen},
url = {https://doi.org/10.1137/1.9781611975673.22},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the 2019 SIAM International Conference on Data Mining},
pages = {190--198},
organization = {Society for Industrial and Applied Mathematics},
abstract = {Several methods have been developed that combine multiple models learned on different data sets and then use that combination to reach conclusions that would not have been possible with any one of the models alone. We examine one such method—effect restoration—which was originally developed to mitigate the effects of poorly measured confounding variables in a causal model. We show how effect restoration can be used to combine results from different machine learning models and how the combined model can be used to estimate causal effects that are not identifiable from either of the original studies alone. We characterize the performance of effect restoration by using both theoretical analysis and simulation studies. Specifically, we show how conditional independence tests and common assumptions can help distinguish when effect restoration should and should not be applied, and we use empirical analysis to show the limited range of conditions under which effect restoration should be applied in practical situations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Amanda Gentzel, Dan Garant, David Jensen
The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data Proceedings Article
In: Advances in Neural Information Processing Systems, Curran Associates, Inc., 2019.
@inproceedings{gentzel2019case,
title = {The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data},
author = {Amanda Gentzel and Dan Garant and David Jensen},
url = {https://proceedings.neurips.cc/paper/2019/file/a87c11b9100c608b7f8e98cfa316ff7b-Paper.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Advances in Neural Information Processing Systems},
volume = {32},
publisher = {Curran Associates, Inc.},
abstract = {Causal inference is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. An active community of researchers develops and enhances algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice. We argue for more frequent use of evaluation techniques that examine interventional measures rather than structural or observational measures, and that evaluate those measures on empirical data rather than synthetic data. We survey the current practice in evaluation and show that the techniques we recommend are rarely used in practice. We show that such techniques are feasible and that data sets are available to conduct such evaluations. We also show that these techniques produce substantially different results than using structural measures and synthetic data.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Emma Tosch, Kaleigh Clary, John Foley, David Jensen
Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning Miscellaneous
2019.
@misc{DBLP:journals/corr/abs-1905-02825,
title = {Toybox: A Suite of Environments for Experimental Evaluation of Deep
Reinforcement Learning},
author = {Emma Tosch and Kaleigh Clary and John Foley and David Jensen},
url = {http://arxiv.org/abs/1905.02825},
year = {2019},
date = {2019-01-01},
journal = {CoRR},
volume = {abs/1905.02825},
abstract = {Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, learned policies are largely opaque, and hypotheses about the behavior of deep RL agents are difficult to test in black-box environments. Considerable effort has gone into addressing opacity, but almost no effort has been devoted to producing high quality environments for experimental evaluation of agent behavior. We present TOYBOX, a new high-performance, open-source subset of Atari environments re-designed for the experimental evaluation of deep RL. We show that TOYBOX enables a wide range of experiments and analyses that are impossible in other environments.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Sam Witty, Alexander Lew, David Jensen, Vikash Mansinghka
Bayesian causal inference via probabilistic program synthesis Miscellaneous
2019.
@misc{witty2019bayesian,
title = {Bayesian causal inference via probabilistic program synthesis},
author = {Sam Witty and Alexander Lew and David Jensen and Vikash Mansinghka},
url = {https://arxiv.org/pdf/1910.14124.pdf},
year = {2019},
date = {2019-01-01},
journal = {arXiv preprint arXiv:1910.14124},
abstract = {Causal inference can be formalized as Bayesian inference that combines a prior distribution over causal models and likelihoods that account for both observations and interventions. We show that it is possible to implement this approach using a sufficiently expressive probabilistic programming language. Priors are represented using probabilistic programs that generate source code in a domain specific language. Interventions are represented using probabilistic programs that edit this source code to modify the original generative process. This approach makes it straightforward to incorporate data from atomic interventions, as well as shift interventions, variance-scaling interventions, and other interventions that modify causal structure. This approach also enables the use of general-purpose inference machinery for probabilistic programs to infer probable causal structures and parameters from data. This abstract describes a prototype of this approach in the Gen probabilistic programming language.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2018
John Foley, Emma Tosch, Kaleigh Clary, David Jensen
Toybox: Better Atari Environments for Testing Reinforcement Learning Agents Proceedings Article
In: NeurIPS 2018 Workshop on Systems for ML, 2018.
@inproceedings{foley2018toybox,
title = {Toybox: Better Atari Environments for Testing Reinforcement Learning Agents},
author = {John Foley and Emma Tosch and Kaleigh Clary and David Jensen},
url = {http://learningsys.org/nips18/assets/papers/83CameraReadySubmissionNIPS_Systems_for_ML_Workshop_2019___ToyBox%20(11).pdf},
year = {2018},
date = {2018-01-01},
booktitle = {NeurIPS 2018 Workshop on Systems for ML},
abstract = {It is a widely accepted principle that software without tests has bugs. Testing reinforcement learning agents is especially difficult because of the stochastic nature of both agents and environments, the complexity of state-of-the-art models, and the sequential nature of their predictions. Recently, the Arcade Learning Environment (ALE) has become one of the most widely used benchmark suites for deep learning research, and state-of-the-art Reinforcement Learning (RL) agents have been shown to routinely equal or exceed human performance on many ALE tasks. Since ALE is based on emulation of original Atari games, the environment does not provide semantically meaningful representations of internal game state. This means that ALE has limited utility as an environment for supporting testing or model introspection. We propose TOYBOX, a collection of reimplementations of these games that solves this critical problem and enables robust testing of RL agents.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Sam Witty, David Jensen
Causal Graphs vs. Causal Programs: The Case of Conditional Branching Proceedings Article
In: First Conference on Probabilistic Programming (ProbProg), 2018.
@inproceedings{witty2018causal,
title = {Causal Graphs vs. Causal Programs: The Case of Conditional Branching},
author = {Sam Witty and David Jensen},
url = {https://arxiv.org/pdf/2007.07127.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {First Conference on Probabilistic Programming (ProbProg)},
abstract = {We evaluate the performance of graph-based causal discovery algorithms when the generative process is a probabilistic program with conditional branching. Using synthetic experiments, we demonstrate empirically that graph-based causal discovery algorithms are able to learn accurate associational distributions for probabilistic programs with context-sensitive structure, but that those graphs fail to accurately model the effects of interventions on the programs.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, Michael Littman, David Jensen
Measuring and characterizing generalization in deep reinforcement learning Miscellaneous
2018.
@misc{witty2018measuring,
title = {Measuring and characterizing generalization in deep reinforcement learning},
author = {Sam Witty and Jun Ki Lee and Emma Tosch and Akanksha Atrey and Michael Littman and David Jensen},
url = {https://arxiv.org/pdf/1812.02868.pdf},
year = {2018},
date = {2018-01-01},
journal = {arXiv preprint arXiv:1812.02868},
abstract = {Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Kaleigh Clary, Emma Tosch, John Foley, David Jensen
Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments Miscellaneous
2018.
@misc{clary2018variability,
title = {Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments},
author = {Kaleigh Clary and Emma Tosch and John Foley and David Jensen},
url = {https://arxiv.org/pdf/1904.06312},
year = {2018},
date = {2018-01-01},
booktitle = {Critiquing and Correcting Trends in Machine Learning Workshop at Neural Information Processing Systems},
abstract = {Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself have led researchers to report the performance of learned agents using aggregate metrics of performance over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make reporting common summary statistics an unsound metric for performance. Our experiments demonstrate the variability of common agents used in the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution, rather than a point estimate.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2017
Kaleigh Clary, David Jensen
A/B Testing in Networks with Adversarial Members Journal Article
In: 2017.
@article{clary2017b,
title = {A/B Testing in Networks with Adversarial Members},
author = {Kaleigh Clary and David Jensen},
url = {http://www.mlgworkshop.org/2017/paper/MLG2017_paper_27.pdf},
year = {2017},
date = {2017-01-01},
abstract = {Many researchers attempt to study the effects of interventions in network systems. To simplify experimental design and analysis in these environments, simple assumptions are made about the behavior of their members. However, nodes may not respond to treatment, or may respond maliciously. These adversarial nodes influence treatment topology by preventing or altering the expected network effect, but may not be known or detectable. We characterize the influence of adversarial nodes and the bias these nodes introduce in average treatment effect estimates. In particular, we derive expressions for the bias induced in average treatment effect using the linear estimator from Gui et al. (2015). In addition to theoretical bounds, we empirically demonstrate estimation bias through experiments on synthetically generated networks. We consider both the case in which adversarial nodes are dispersed randomly through the network and the case where adversarial node placement is targeted to the highest degree nodes. Our work demonstrates that peer influence makes causal estimates on networks susceptible to the actions of adversaries, and specific network structures are particularly vulnerable to adversarial responses.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Javier Burroni, Arjun Guha, David Jensen
Interactive Writing and Debugging of Bayesian Probabilistic Programs Journal Article
In: 2017.
@article{burroni2017interactive,
title = {Interactive Writing and Debugging of Bayesian Probabilistic Programs},
author = {Javier Burroni and Arjun Guha and David Jensen},
url = {https://pps2018.luddy.indiana.edu/files/2017/12/interactive_debugger.pdf},
year = {2017},
date = {2017-01-01},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Katerina Marazopoulou, David Arbour, David Jensen
On causal analysis for heterogeneous networks Proceedings Article
In: The 2017 ACM SIGKDD Workshop on Causal Discovery, 2017.
@inproceedings{marazopoulou2017causal,
title = {On causal analysis for heterogeneous networks},
author = {Katerina Marazopoulou and David Arbour and David Jensen},
url = {http://nugget.unisa.edu.au/CD2017/slides/KaterinaMarazopoulou.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {The 2017 ACM SIGKDD Workshop on Causal Discovery},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Kaleigh Clary, Andrew McGregor, David Jensen
A/B Testing in Networks with Adversarial Nodes Proceedings Article
In: KDD Workshop on Mining and Learning with Graphs, 2017.
@inproceedings{clary2017adversaries,
title = {A/B Testing in Networks with Adversarial Nodes},
author = {Kaleigh Clary and Andrew McGregor and David Jensen},
year = {2017},
date = {2017-01-01},
booktitle = {KDD Workshop on Mining and Learning with Graphs},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2016
David Arbour, Dan Garant, David Jensen
Inferring Network Effects from Observational Data Proceedings Article
In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 715–724, ACM, 2016.
@inproceedings{DBLP:conf/kdd/ArbourGJ16,
title = {Inferring Network Effects from Observational Data},
author = {David Arbour and Dan Garant and David Jensen},
url = {https://doi.org/10.1145/2939672.2939791},
doi = {10.1145/2939672.2939791},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, San Francisco, CA, USA, August
13-17, 2016},
pages = {715--724},
publisher = {ACM},
abstract = {We present Relational Covariate Adjustment (RCA), a general method for estimating causal effects in relational data. Relational Covariate Adjustment is implemented through two high-level operations: identification of an adjustment set and relational regression adjustment. The former is achieved through an extension of Pearl’s back-door criterion to relational domains. We demonstrate how this extended definition can be used to estimate causal effects in the presence of network interference and confounding. RCA is agnostic to functional form, and it can easily model both discrete and continuous treatments as well as estimate the effects of a wider array of network interventions than existing experimental approaches. We show that RCA can yield robust estimates of causal effects using common regression models without extensive parameter tuning. Through a series of simulation experiments on a variety of synthetic and real-world network structures, we show that causal effects estimated on observational data with RCA are nearly as accurate as those estimated from well-designed network experiments.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
David Arbour, Katerina Marazopoulou, David Jensen
Inferring Causal Direction from Relational Data Proceedings Article
In: Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI 2016, June 25-29, 2016, New York City, NY, USA, AUAI Press, 2016.
@inproceedings{DBLP:conf/uai/ArbourMJ16,
title = {Inferring Causal Direction from Relational Data},
author = {David Arbour and Katerina Marazopoulou and David Jensen},
url = {http://auai.org/uai2016/proceedings/papers/217.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the Thirty-Second Conference on Uncertainty in Artificial
Intelligence, UAI 2016, June 25-29, 2016, New York City, NY, USA},
publisher = {AUAI Press},
abstract = {Inferring the direction of causal dependence from observational data is a fundamental problem in many scientific fields. Significant progress has been made in inferring causal direction from data that are independent and identically distributed (i.i.d.), but little is understood about this problem in the more general relational setting with multiple types of interacting entities. This work examines the task of inferring the causal direction of peer dependence in relational data. We show that, in contrast to the i.i.d. setting, the direction of peer dependence can be inferred using simple procedures, regardless of the form of the underlying distribution, and we provide a theoretical characterization on the identifiability of direction. We then examine the conditions under which the presence of confounding can be detected. Finally, we demonstrate the efficacy of the proposed methods with synthetic experiments, and we provide an application on real-world data.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Shiri Dori-Hacohen, David Jensen, James Allan
Controversy Detection in Wikipedia Using Collective Classification Proceedings Article
In: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016, pp. 797–800, ACM, 2016.
@inproceedings{DBLP:conf/sigir/Dori-HacohenJA16,
title = {Controversy Detection in Wikipedia Using Collective Classification},
author = {Shiri Dori-Hacohen and David Jensen and James Allan},
url = {https://doi.org/10.1145/2911451.2914745},
doi = {10.1145/2911451.2914745},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 39th International ACM SIGIR conference on
Research and Development in Information Retrieval, SIGIR 2016, Pisa,
Italy, July 17-21, 2016},
pages = {797--800},
publisher = {ACM},
abstract = {Concerns over personalization in IR have sparked an interest in detection and analysis of controversial topics. Accurate detection would enable many beneficial applications, such as alerting search users to controversy. Wikipedia's broad coverage and rich metadata offer a valuable resource for this problem. We hypothesize that intensities of controversy among related pages are not independent; thus, we propose a stacked model which exploits the dependencies among related pages. Our approach improves classification of controversial web pages when compared to a model that examines each page in isolation, demonstrating that controversial topics exhibit homophily. Using notions of similarity to construct a subnetwork for collective classification, rather than using the default network present in the relational data, leads to improved classification with wider applications for semi-structured datasets, with the effects most pronounced when a small set of neighbors is used.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Katerina Marazopoulou, Rumi Ghosh, Prasanth Lade, David Jensen
Causal Discovery for Manufacturing Domains Miscellaneous
2016.
@misc{DBLP:journals/corr/MarazopoulouGLJ16,
title = {Causal Discovery for Manufacturing Domains},
author = {Katerina Marazopoulou and Rumi Ghosh and Prasanth Lade and David Jensen},
url = {http://arxiv.org/abs/1605.04056},
year = {2016},
date = {2016-01-01},
journal = {CoRR},
volume = {abs/1605.04056},
abstract = {Yield and quality improvement is of paramount importance to any manufacturing company. One of the ways of improving yield is through discovery of the root causal factors affecting yield. We propose the use of data-driven interpretable causal models to identify key factors affecting yield. We focus on factors that are measured in different stages of production and testing in the manufacturing cycle of a product. We apply causal structure learning techniques on real data collected from this line. Specifically, the goal of this work is to learn interpretable causal models from observational data produced by manufacturing lines. Emphasis has been given to the interpretability of the models to make them actionable in the field of manufacturing. We highlight the challenges presented by assembly line data and propose ways to alleviate them. We also identify unique characteristics of data originating from assembly lines and how to leverage them in order to improve causal discovery. Standard evaluation techniques for causal structure learning show that the learned causal models seem to closely represent the underlying latent causal relationship between different factors in the production process. These results were also validated by manufacturing domain experts who found them promising. This work demonstrates how data mining and knowledge discovery can be used for root cause analysis in the domain of manufacturing and connected industry.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Dan Garant, David Jensen
Evaluating causal models by comparing interventional distributions Miscellaneous
2016.
@misc{garant2016evaluating,
title = {Evaluating causal models by comparing interventional distributions},
author = {Dan Garant and David Jensen},
url = {https://arxiv.org/abs/1608.04698},
year = {2016},
date = {2016-01-01},
journal = {arXiv preprint arXiv:1608.04698},
abstract = {The predominant method for evaluating the quality of causal models is to measure the graphical accuracy of the learned model structure. We present an alternative method for evaluating causal models that directly measures the accuracy of estimated interventional distributions. We contrast such distributional measures with structural measures, such as structural Hamming distance and structural intervention distance, showing that structural measures often correspond poorly to the accuracy of estimated interventional distributions. We use a number of real and synthetic datasets to illustrate various scenarios in which structural measures provide misleading results with respect to algorithm selection and parameter tuning, and we recommend that distributional measures become the new standard for evaluating causal models.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2015
Phillip B. Kirlin, David Jensen
Learning to Uncover Deep Musical Structure Proceedings Article
In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pp. 1770–1776, AAAI Press, 2015.
@inproceedings{DBLP:conf/aaai/KirlinJ15,
title = {Learning to Uncover Deep Musical Structure},
author = {Phillip B. Kirlin and David Jensen},
url = {http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9757},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence,
January 25-30, 2015, Austin, Texas, USA},
pages = {1770--1776},
publisher = {AAAI Press},
abstract = {The overarching goal of music theory is to explain the inner workings of a musical composition by examining the structure of the composition. Schenkerian music theory supposes that Western tonal compositions can be viewed as hierarchies of musical objects. The process of Schenkerian analysis reveals this hierarchy by identifying connections between notes or chords of a composition that illustrate both the small- and large-scale construction of the music. We present a new probabilistic model of this variety of music analysis, details of how the parameters of the model can be learned from a corpus, an algorithm for deriving the most probable analysis for a given piece of music, and both quantitative and human-based evaluations of the algorithm's performance. This represents the first large-scale data-driven computational approach to hierarchical music analysis.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Jerod J. Weinman, David Jensen, David Lopatto
Teaching Computing as Science in a Research Experience Proceedings Article
In: Proceedings of the 46th ACM Technical Symposium on Computer Science Education, SIGCSE 2015, Kansas City, MO, USA, March 4-7, 2015, pp. 24–29, ACM, 2015.
@inproceedings{DBLP:conf/sigcse/WeinmanJL15,
title = {Teaching Computing as Science in a Research Experience},
author = {Jerod J. Weinman and David Jensen and David Lopatto},
url = {https://doi.org/10.1145/2676723.2677231},
doi = {10.1145/2676723.2677231},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 46th ACM Technical Symposium on Computer Science
Education, SIGCSE 2015, Kansas City, MO, USA, March 4-7, 2015},
pages = {24--29},
publisher = {ACM},
abstract = {Many instructors and institutions offer research experiences and training in computing research methods. However, in a national survey, we find that undergraduate students rate their computing research experiences lower than students in other STEM fields. To address this learning gap, we have offered summer undergraduate research experiences in computing that include not only instruction in the important mechanics of research but also grounding in a philosophy of computing science that emphasizes generalized explanation of behavior as a means for control and prediction. After five years, survey results indicate the experience helps close the gap between CS and other STEM fields in benefits gained.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Katerina Marazopoulou, Marc Maier, David Jensen
Learning the Structure of Causal Models with Relational and Temporal Dependence Proceedings Article
In: Proceedings of the UAI 2015 Workshop on Advances in Causal Inference co-located with the 31st Conference on Uncertainty in Artificial Intelligence (UAI 2015), Amsterdam, The Netherlands, July 16, 2015, pp. 66–75, CEUR-WS.org, 2015.
@inproceedings{DBLP:conf/uai/MarazopoulouMJ15,
title = {Learning the Structure of Causal Models with Relational and Temporal
Dependence},
author = {Katerina Marazopoulou and Marc Maier and David Jensen},
url = {http://ceur-ws.org/Vol-1504/uai2015aci_paper6.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the UAI 2015 Workshop on Advances in Causal Inference
co-located with the 31st Conference on Uncertainty in Artificial Intelligence
(UAI 2015), Amsterdam, The Netherlands, July 16, 2015},
volume = {1504},
pages = {66--75},
publisher = {CEUR-WS.org},
series = {CEUR Workshop Proceedings},
abstract = {Many real-world domains are inherently relational and temporal—they consist of heterogeneous entities that interact with each other over time. Effective reasoning about causality in such domains requires representations that explicitly model relational and temporal dependence. In this work, we provide a formalization of temporal relational models. We define temporal extensions to abstract ground graphs—a lifted representation that abstracts paths of dependence over all possible ground graphs. Temporal abstract ground graphs enable a sound and complete method for answering d-separation queries on temporal relational models. These methods provide the foundation for a constraint-based algorithm, TRCD, that learns causal models from temporal relational data. We provide experimental evidence that demonstrates the need to explicitly represent time when inferring causal dependence. We also demonstrate the expressive gain of TRCD compared to earlier algorithms that do not explicitly represent time.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2014
Lisa Friedland, Amanda Gentzel, David Jensen
Classifier-adjusted density estimation for anomaly detection and one-class classification Proceedings Article
In: Proceedings of the 2014 SIAM International Conference on Data Mining, pp. 578–586, Society for Industrial and Applied Mathematics 2014.
@inproceedings{friedland2014classifier,
title = {Classifier-adjusted density estimation for anomaly detection and one-class classification},
author = {Lisa Friedland and Amanda Gentzel and David Jensen},
url = {https://epubs.siam.org/doi/pdf/10.1137/1.9781611973440.67},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 2014 SIAM International Conference on Data Mining},
pages = {578--586},
organization = {Society for Industrial and Applied Mathematics},
abstract = {Density estimation methods are often regarded as unsuitable for anomaly detection in high-dimensional data due to the difficulty of estimating multivariate probability distributions. Instead, the scores from popular distance- and local-density-based methods, such as local outlier factor (LOF), are used as surrogates for probability densities. We question this infeasibility assumption and explore a family of simple statistically-based density estimates constructed by combining a probabilistic classifier with a naive density estimate. Across a number of semi-supervised and unsupervised problems formed from real-world data sets, we show that these methods are competitive with LOF and that even simple density estimates that assume attribute independence can perform strongly. We show that these density estimation methods scale well to data with high dimensionality and that they are robust to the problem of irrelevant attributes that plagues methods based on local estimates.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
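The classifier-plus-naive-density construction described in the abstract above is simple enough to sketch. Below is a minimal illustration, assuming a uniform reference sample and a random-forest classifier (both are assumptions made for brevity, not the authors' exact configuration); because the reference points are drawn uniformly over the data's bounding box, the classifier's odds ratio is proportional to the estimated density, and low scores flag likely anomalies.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def cade_scores(X_real, seed=0):
    """Classifier-adjusted density scores: lower means more anomalous."""
    rng = np.random.default_rng(seed)
    n, d = X_real.shape
    # Reference sample: uniform over the bounding box of the real data.
    X_fake = rng.uniform(X_real.min(axis=0), X_real.max(axis=0), size=(n, d))
    X = np.vstack([X_real, X_fake])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    clf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    p = clf.predict_proba(X_real)[:, 1]          # P(real | x)
    # With a uniform reference density, this odds ratio is proportional to p(x).
    return p / np.clip(1.0 - p, 1e-6, None)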
David Arbour, Katerina Marazopoulou, Dan Garant, David Jensen
Propensity Score Matching for Causal Inference with Relational Data Proceedings Article
In: Proceedings of the UAI 2014 Workshop Causal Inference: Learning and Prediction co-located with 30th Conference on Uncertainty in Artificial Intelligence (UAI 2014), Quebec City, Canada, July 27, 2014, pp. 25–34, CEUR-WS.org, 2014.
@inproceedings{DBLP:conf/uai/ArbourMGJ14,
title = {Propensity Score Matching for Causal Inference with Relational Data},
author = {David Arbour and Katerina Marazopoulou and Dan Garant and David Jensen},
url = {http://ceur-ws.org/Vol-1274/uai2014ci_paper5.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the UAI 2014 Workshop Causal Inference: Learning and Prediction co-located with 30th Conference on Uncertainty in Artificial Intelligence (UAI 2014), Quebec City, Canada, July 27, 2014},
volume = {1274},
pages = {25--34},
publisher = {CEUR-WS.org},
series = {CEUR Workshop Proceedings},
abstract = {Propensity score matching (PSM) is a widely used method for performing causal inference with observational data. PSM requires fully specifying the set of confounding variables of treatment and outcome. In the case of relational data, this set may include non-intuitive relational variables, i.e., variables derived from the relational structure of the data. In this work, we provide an automated method to derive these relational variables based on the relational structure and a set of naive confounders. This automatic construction includes two unusual classes of variables: relational degree and entity identifiers. We provide experimental evidence that demonstrates the utility of these variables in accounting for certain latent confounders. Finally, through a set of synthetic experiments, we show that our method improves the performance of PSM for causal inference with relational data.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
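As a reminder of the baseline procedure this paper builds on, the sketch below performs plain propensity score matching (logistic propensity model, 1-nearest-neighbour matching, ATT estimate). The relational variables the paper derives automatically, such as degree counts and identifiers of related entities, would enter simply as additional columns of X; this is a generic illustration, not the authors' algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_att(X, treated, outcome):
    """1-NN propensity score matching; returns the effect on the treated."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    nn = NearestNeighbors(n_neighbors=1).fit(ps[c_idx].reshape(-1, 1))
    _, match = nn.kneighbors(ps[t_idx].reshape(-1, 1))
    matched = c_idx[match.ravel()]
    return float(np.mean(outcome[t_idx] - outcome[matched]))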
Katerina Marazopoulou, David Arbour, David Jensen
Refining the Semantics of Social Influence Miscellaneous
2014.
@misc{DBLP:journals/corr/MarazopoulouAJ14,
title = {Refining the Semantics of Social Influence},
author = {Katerina Marazopoulou and David Arbour and David Jensen},
url = {http://arxiv.org/abs/1412.5238},
year = {2014},
date = {2014-01-01},
journal = {CoRR},
volume = {abs/1412.5238},
abstract = {With the proliferation of network data, researchers are increasingly focusing on questions investigating phenomena occurring on networks. This often includes analysis of peer-effects, i.e., how the connections of an individual affect that individual's behavior. This type of influence is not limited to direct connections of an individual (such as friends), but also to individuals that are connected through longer paths (for example, friends of friends, or friends of friends of friends). In this work, we identify an ambiguity in the definition of what constitutes the extended neighborhood of an individual. This ambiguity gives rise to different semantics and supports different types of underlying phenomena. We present experimental results, both on synthetic and real networks, that quantify differences among the sets of extended neighbors under different semantics. Finally, we provide experimental evidence that demonstrates how the use of different semantics affects model selection.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
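The ambiguity referred to in the abstract is easy to see in code: "friends of friends" can mean the endpoints of length-2 paths or the nodes at shortest-path distance exactly 2, and the two sets generally differ. A small sketch (using networkx purely for illustration):
import networkx as nx

def two_hop_by_paths(G, v):
    """All endpoints of length-2 walks from v (may include direct neighbours)."""
    return {w for u in G.neighbors(v) for w in G.neighbors(u)} - {v}

def two_hop_by_distance(G, v):
    """Nodes whose shortest-path distance from v is exactly 2."""
    lengths = nx.single_source_shortest_path_length(G, v, cutoff=2)
    return {w for w, d in lengths.items() if d == 2}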
2013
David Arbour, James Atwood, Ahmed El-Kishky, David Jensen
Agglomerative Clustering of Bagged Data Using Joint Distributions Journal Article
In: 2013.
@article{arbour2013agglomerative,
title = {Agglomerative Clustering of Bagged Data Using Joint Distributions},
author = {David Arbour and James Atwood and Ahmed El-Kishky and David Jensen},
url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.703.7198&rep=rep1&type=pdf},
year = {2013},
date = {2013-01-01},
publisher = {Citeseer},
abstract = {Current methods for hierarchical clustering of data either operate on features of the data or make limiting model assumptions. We present the hierarchy discovery algorithm (HDA), a model-based hierarchical clustering method based on explicit comparison of joint distributions via Bayesian network learning for predefined groups of data. HDA works on both continuous and discrete data and offers a model-based approach to agglomerative clustering that does not require prespecification of the model dependency structure.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
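To convey the flavour of model-based agglomeration over predefined groups, here is a heavily simplified stand-in: each group is summarized by a multivariate Gaussian and, at each step, the pair of clusters closest under symmetric KL divergence is merged. HDA itself compares learned Bayesian networks; the Gaussian summary and the ridge term are assumptions made only to keep the sketch short.
import numpy as np
from itertools import combinations

def gaussian_kl(mu0, S0, mu1, S1):
    """KL divergence between two multivariate Gaussians."""
    d = len(mu0)
    inv1 = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ S0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def agglomerate(groups):
    """groups: dict of name -> (n, d) array with several rows per group.
    Returns the order in which groups are merged."""
    clusters = {k: np.asarray(v, float) for k, v in groups.items()}
    merges = []
    while len(clusters) > 1:
        stats = {k: (v.mean(0), np.cov(v, rowvar=False) + 1e-6 * np.eye(v.shape[1]))
                 for k, v in clusters.items()}
        a, b = min(combinations(clusters, 2),
                   key=lambda p: gaussian_kl(*stats[p[0]], *stats[p[1]])
                                 + gaussian_kl(*stats[p[1]], *stats[p[0]]))
        clusters[f"({a}+{b})"] = np.vstack([clusters.pop(a), clusters.pop(b)])
        merges.append((a, b))
    return merges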
Lisa Friedland, David Jensen, Michael Lavine
Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events Proceedings Article
In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pp. 1175–1183, JMLR.org, 2013.
@inproceedings{DBLP:conf/icml/FriedlandJL13,
title = {Copy or Coincidence? A Model for Detecting Social Influence and Duplication Events},
author = {Lisa Friedland and David Jensen and Michael Lavine},
url = {http://proceedings.mlr.press/v28/friedland13.html},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013},
volume = {28},
pages = {1175--1183},
publisher = {JMLR.org},
series = {JMLR Workshop and Conference Proceedings},
abstract = {In this paper, we analyze the task of inferring rare links between pairs of entities that seem too similar to have occurred by chance. Variations of this task appear in such diverse areas as social network analysis, security, fraud detection, and entity resolution. To address the task in a general form, we propose a simple, flexible mixture model in which most entities are generated independently from a distribution but a small number of pairs are constrained to be similar. We predict the true pairs using a likelihood ratio that trades off the entities’ similarity with their rarity. This method always outperforms using only similarity; however, with certain parameter settings, similarity turns out to be surprisingly competitive. Using real data, we apply the model to detect twins given their birth weights and to re-identify cell phone users based on distinctive usage patterns.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
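The similarity-versus-rarity trade-off has a compact one-dimensional illustration. Assuming (purely for illustration) a Gaussian population model for the trait and Gaussian copy noise, the likelihood ratio below is large when y is both close to x and rare under the population model, which is exactly the trade-off the abstract describes; the model choices are assumptions, not the paper's exact formulation.
from scipy.stats import norm

def copy_likelihood_ratio(x, y, mu, sigma, tau):
    """P(y | y is a noisy copy of x) / P(y | independent draw from N(mu, sigma^2))."""
    return norm.pdf(y, loc=x, scale=tau) / norm.pdf(y, loc=mu, scale=sigma)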
Marc Maier, Katerina Marazopoulou, David Arbour, David Jensen
A Sound and Complete Algorithm for Learning Causal Models from Relational Data Proceedings Article
In: Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI 2013, Bellevue, WA, USA, August 11-15, 2013, AUAI Press, 2013.
@inproceedings{DBLP:conf/uai/MaierMAJ13,
title = {A Sound and Complete Algorithm for Learning Causal Models from Relational Data},
author = {Marc Maier and Katerina Marazopoulou and David Arbour and David Jensen},
url = {https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=2398&proceeding_id=29},
year = {2013},
date = {2013-01-01},
booktitle = {Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI 2013, Bellevue, WA, USA, August 11-15, 2013},
publisher = {AUAI Press},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Marc Maier, Katerina Marazopoulou, David Arbour, David Jensen
Flattening network data for causal discovery: What could go wrong? Proceedings Article
In: Workshop on Information in Networks, 2013.
@inproceedings{maier2013flattening,
title = {Flattening network data for causal discovery: What could go wrong?},
author = {Marc Maier and Katerina Marazopoulou and David Arbour and David Jensen},
url = {https://www.semanticscholar.org/paper/Flattening-network-data-for-causal-discovery-%3A-What-Maier-Marazopoulou/c327100636c022c259f5e1bf2d7fcbbd0b048935},
year = {2013},
date = {2013-01-01},
booktitle = {Workshop on Information in Networks},
volume = {64},
abstract = {Methods for learning causal dependencies from observational data have been the focus of decades of work in social science, statistics, machine learning, and philosophy [9, 10, 11]. Much of the theoretical and practical work on causal discovery has focused on propositional representations. Propositional models effectively represent individual directed causal dependencies (e.g., path analysis, Bayesian networks) or conditional distributions of some outcome variable (e.g., linear regression, decision trees). However, propositional representations are limited to modeling independent and identically distributed (IID) data of a single entity type. Many real-world systems involve heterogeneous, interacting entities with probabilistic dependencies that cross the boundaries of those entities (i.e., non-IID data with multiple entity types and relationships). These systems produce network, or relational, data, and they are of paramount interest to researchers and practitioners across a wide range of disciplines. To model such data, researchers in statistics and computer science have devised more expressive classes of directed graphical models, such as probabilistic relational models (PRMs) [2] and directed acyclic probabilistic entity-relationship (DAPER) models [4]. Despite the assumptions embedded in propositional models, a common practice is to flatten, or propositionalize, relational data and use existing algorithms [5] (see Figure 1, focusing on algorithms that learn causal graphical models). While there are statistical concerns, this process is generally innocuous if the task is to model statistical associations for predictive inference. In contrast, to learn causal structure, estimate causal effects, or support inference over interventions, the effects of flattening inherently relational data can be particularly deleterious. In this paper, we identify four classes of potential issues that can occur with a propositionalization strategy as opposed to embracing a more expressive representation that would not succumb to these problems. We also present empirical results comparing the effectiveness of two theoretically sound and complete algorithms that learn causal structure: PC—a widely used constraint-based, propositional algorithm for causal discovery [11], and RCD—a recently developed constraint-based algorithm that reasons over a relational representation [6].},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Marc Maier, Katerina Marazopoulou, David Jensen
Reasoning about Independence in Probabilistic Models of Relational Data Miscellaneous
2013.
@misc{DBLP:journals/corr/abs-1302-4381,
title = {Reasoning about Independence in Probabilistic Models of Relational Data},
author = {Marc Maier and Katerina Marazopoulou and David Jensen},
url = {http://arxiv.org/abs/1302.4381},
year = {2013},
date = {2013-01-01},
journal = {CoRR},
volume = {abs/1302.4381},
abstract = {We extend the theory of d-separation to cases in which data instances are not independent and identically distributed. We show that applying the rules of d-separation directly to the structure of probabilistic models of relational data inaccurately infers conditional independence. We introduce relational d-separation, a theory for deriving conditional independence facts from relational models. We provide a new representation, the abstract ground graph, that enables a sound, complete, and computationally efficient method for answering d-separation queries about relational models, and we present empirical results that demonstrate effectiveness.},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
2012
Matthew Rattigan
Leveraging Relational Representations for Causal Discovery PhD Thesis
2012, ISBN: 9781267786821, (AAI3545976).
@phdthesis{10.5555/2520420,
title = {Leveraging Relational Representations for Causal Discovery},
author = {Matthew Rattigan},
isbn = {9781267786821},
year = {2012},
date = {2012-01-01},
publisher = {University of Massachusetts Amherst},
abstract = {This thesis represents a synthesis of relational learning and causal discovery, two subjects at the frontier of machine learning research. Relational learning investigates algorithms for constructing statistical models of data drawn from multiple types of interrelated entities, and causal discovery investigates algorithms for constructing causal models from observational data. My work demonstrates that there exists a natural, methodological synergy between these two areas of study, and that despite the sometimes onerous nature of each, their combination (perhaps counterintuitively) can provide advances in the state of the art for both. Traditionally, propositional (or "flat") data representations have dominated the statistical sciences. These representations assume that data consist of independent and identically distributed (iid) entities which can be represented by a single data table. More recently, data scientists have increasingly focused on "relational" data sets that consist of interrelated, heterogeneous entities. However, relational learning and causal discovery are rarely combined. Relational representations are wholly absent from the literature where causality is discussed explicitly. Instead, the literature on causality that uses the framework of graphical models assumes that data are independent and identically distributed. This unexplored topical intersection represents an opportunity for advancement — by combining relational learning with causal reasoning, we can provide insight into the challenges found in each subject area. By adopting a causal viewpoint, we can clarify the mechanisms that produce previously identified pathologies in relational learning. Analogously, we can utilize relational data to establish and strengthen causal claims in ways that are impossible using only propositional representations.},
note = {AAI3545976},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
2011
Marc Maier, Matthew Rattigan, David Jensen
Indexing Network Structure with Shortest-Path Trees Journal Article
In: ACM Trans. Knowl. Discov. Data, vol. 5, no. 3, 2011, ISSN: 1556-4681.
@article{10.1145/1993077.1993079,
title = {Indexing Network Structure with Shortest-Path Trees},
author = {Marc Maier and Matthew Rattigan and David Jensen},
url = {https://doi.org/10.1145/1993077.1993079},
doi = {10.1145/1993077.1993079},
issn = {1556-4681},
year = {2011},
date = {2011-08-01},
journal = {ACM Trans. Knowl. Discov. Data},
volume = {5},
number = {3},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
abstract = {The ability to discover low-cost paths in networks has practical consequences for knowledge discovery and social network analysis tasks. Many analytic techniques for networks require finding low-cost paths, but exact methods for search become prohibitive for large networks, and data sets are steadily increasing in size. Short paths can be found efficiently by utilizing an index of network structure, which estimates network distances and enables rapid discovery of short paths. Through experiments on synthetic networks, we demonstrate that one such novel network structure index based on the shortest-path tree outperforms other previously proposed indices. We also show that it generalizes across arbitrarily weighted networks of various structures and densities, provides accurate estimates of distance, and has efficient time and space complexity. We present results on real data sets for several applications, including navigation, diameter estimation, centrality computation, and clustering---all made efficient by virtue of the network structure index.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
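The general idea of trading one-off preprocessing for fast approximate distance queries can be sketched with a landmark-style index; the shortest-path-tree index studied in the article is more refined than this, so treat the snippet only as an illustration of the query pattern, with the landmark count and networkx usage as assumptions.
import random
import networkx as nx

def build_index(G, n_landmarks=8, seed=0):
    """Precompute single-source shortest-path lengths from a few random landmarks."""
    landmarks = random.Random(seed).sample(list(G.nodes), n_landmarks)
    return [nx.single_source_dijkstra_path_length(G, l) for l in landmarks]

def estimate_distance(index, u, v):
    """Triangle-inequality upper bound on d(u, v) through the best landmark."""
    return min(d[u] + d[v] for d in index if u in d and v in d)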
Matthew Rattigan, Marc Maier, David Jensen
Relational blocking for causal discovery Proceedings Article
In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
@inproceedings{rattigan2011relational,
title = {Relational blocking for causal discovery},
author = {Matthew Rattigan and Marc Maier and David Jensen},
url = {http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3760},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence},
volume = {25},
number = {1},
abstract = {Blocking is a technique commonly used in manual statistical analysis to account for confounding variables. However, blocking is not currently used in automated learning algorithms. These algorithms rely solely on statistical conditioning as an operator to identify conditional independence. In this work, we present relational blocking as a new operator that can be used for learning the structure of causal models. We describe how blocking is enabled by relational data sets, where blocks are determined by the links in the network. By blocking on entities rather than conditioning on variables, relational blocking can account for both measured and unobserved variables. We explain the mechanism of these methods using graphical models and the semantics of d-separation. Finally, we demonstrate the effectiveness of relational blocking for use in causal discovery by showing how blocking can be used in the causal analysis of two real-world social media systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
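Mechanically, blocking amounts to contrasting treated and untreated instances within each block and averaging the contrasts, rather than conditioning on covariates; in the relational setting a block would be, for example, the set of items attached to the same parent entity. A minimal sketch, with the block assignments assumed to be given:
import numpy as np
from collections import defaultdict

def blocked_effect(block_id, treated, outcome):
    """Average within-block difference in outcome between treated and untreated."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for b, t, y in zip(block_id, treated, outcome):
        groups[b][int(t)].append(y)
    diffs = [np.mean(g[1]) - np.mean(g[0])
             for g in groups.values() if g[0] and g[1]]
    return float(np.mean(diffs))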
Phillip B Kirlin, David Jensen
Probabilistic Modeling of Hierarchical Music Analysis. Proceedings Article
In: Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR, pp. 393–398, 2011.
@inproceedings{kirlin2011probabilistic,
title = {Probabilistic Modeling of Hierarchical Music Analysis.},
author = {Phillip B Kirlin and David Jensen},
url = {http://ismir2011.ismir.net/papers/PS3-7.pdf},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR},
pages = {393--398},
abstract = {Hierarchical music analysis, as exemplified by Schenkerian analysis, describes the structure of a musical composition by a hierarchy among its notes. Each analysis defines a set of prolongations, where musical objects persist in time even though others are present. We present a formal model for representing hierarchical music analysis, probabilistic interpretations of that model, and an efficient algorithm for computing the most probable analysis under these interpretations. We represent Schenkerian analyses as maximal outerplanar graphs (MOPs). We use this representation to encode the largest known data set of computer-processable Schenkerian analyses, and we use these data to identify statistical regularities in the human-generated analyses. We show that a dynamic programming algorithm can be applied to these regularities to identify the maximum likelihood analysis for a given piece of music.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
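Because a maximal outerplanar graph over a note sequence corresponds to a triangulation, the maximum-likelihood analysis can be found with an interval dynamic program, much like chart parsing. In the sketch below, log_prob(i, k, j) is a hypothetical stand-in for a learned log-probability of elaborating the span (i, j) through note k; it is not the paper's trained model.
import functools

def best_analysis(n, log_prob):
    """Return (log-probability, triangles) of the best triangulation of notes 0..n-1."""
    @functools.lru_cache(maxsize=None)
    def best(i, j):
        if j - i < 2:                    # adjacent notes: nothing to elaborate
            return 0.0, ()
        return max(
            (best(i, k)[0] + best(k, j)[0] + log_prob(i, k, j),
             best(i, k)[1] + best(k, j)[1] + ((i, k, j),))
            for k in range(i + 1, j)
        )
    return best(0, n - 1)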
Huseyin Oktay, A Soner Balkir, Ian Foster, David Jensen
Distance estimation for very large networks using mapreduce and network structure indices Proceedings Article
In: Workshop on Information Networks, 2011.
@inproceedings{oktay2011distance,
title = {Distance estimation for very large networks using mapreduce and network structure indices},
author = {Huseyin Oktay and A Soner Balkir and Ian Foster and David Jensen},
year = {2011},
date = {2011-01-01},
booktitle = {Workshop on Information Networks},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2010
Michael Hay, Gerome Miklau, David Jensen
Analyzing private network data Journal Article
In: Privacy-aware knowledge discovery: Novel applications and new techniques, pp. 459–498, 2010.
@article{hay2010analyzing,
title = {Analyzing private network data},
author = {Michael Hay and Gerome Miklau and David Jensen},
year = {2010},
date = {2010-01-01},
journal = {Privacy-aware knowledge discovery: Novel applications and new techniques},
pages = {459--498},
publisher = {Chapman & Hall/CRC},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Michael Hay, Gerome Miklau, David Jensen, Don Towsley, Chao Li
Resisting structural re-identification in anonymized social networks Journal Article
In: The VLDB Journal, vol. 19, no. 6, pp. 797–823, 2010.
@article{hay2010resisting,
title = {Resisting structural re-identification in anonymized social networks},
author = {Michael Hay and Gerome Miklau and David Jensen and Don Towsley and Chao Li},
url = {https://doi.org/10.1007/s00778-010-0210-x},
year = {2010},
date = {2010-01-01},
journal = {The VLDB Journal},
volume = {19},
number = {6},
pages = {797--823},
publisher = {Springer-Verlag},
abstract = {We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual's network context can be used to identify them even if other identifying information is removed. In this paper, we quantify the privacy risks associated with three classes of attacks on the privacy of individuals in networks, based on the knowledge used by the adversary. We show that the risks of these attacks vary greatly based on network structure and size. We propose a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model. The approach guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with relatively little bias.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
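A toy version of the structural re-identification risk analysed in this line of work: for an adversary who knows only a target's degree, the candidate set is every node with that degree, and the risk of re-identification grows as that set shrinks. This is only an illustrative measure, not the anonymization algorithm proposed in the paper.
from collections import Counter
import networkx as nx

def degree_candidate_set_sizes(G):
    """For each node, the number of nodes sharing its degree (its candidate set size)."""
    counts = Counter(d for _, d in G.degree())
    return {v: counts[d] for v, d in G.degree()}

# Nodes with a candidate set of size 1 are uniquely re-identifiable
# by an adversary who knows their degree.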