Publications
2013 |
Marlin, Benjamin M; Adams, Roy J; Sadasivam, Rajani; Houston, Thomas K: Towards Collaborative Filtering Recommender Systems for Tailored Health Communications. 2013. (Type: Proceedings | Abstract | Links | BibTeX)@proceedings{101, <p>The goal of computer tailored health communications (CTHC) is to promote healthy behaviors by sending messages tailored to individual patients. Current CTHC systems collect baseline patient textquotedblleftprofilestextquotedblright and then use expert-written, rule-based systems to target messages to subsets of patients. Our main interest in this work is the study of collaborative filtering-based CTHC systems that can learn to tailor future message selections to individual patients based explicit feedback about past message selections. This paper reports the results of a study designed to collect explicit feedback (ratings) regarding four aspects of messages from 100 subjects in the smoking cessation support domain. Our results show that most users have positive opinions of most messages and that the ratings for all four aspects of the messages are highly correlated with each other. Finally, we conduct a range of rating prediction experiments comparing several different model variations. Our results show that predicting future ratings based on each usertextquoterights past ratings contributes the most to predictive accuracy.</p> |
2012 |
Khan, Mohammad Emtiyaz; Mohamed, Shakir; Marlin, Benjamin M; Murphy, Kevin P: A Stick-Breaking Likelihood for Categorical Data Analysis with Latent Gaussian Models. AISTATS, 2012. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:journals/jmlr/KhanMMM12, <p>The development of accurate models and efficient algorithms for the analysis of multivariate categorical data are important and long-standing problems in machine learning and computational statistics. In this paper, we focus on modeling categorical data using Latent Gaussian Models (LGMs). We propose a novel stick-breaking likelihood function for categorical LGMs that exploits accurate linear and quadratic bounds on the logistic log-partition function, leading to an effective variational inference and learning framework. We thoroughly compare our approach to existing algorithms for multinomial logit/probit likelihoods on several problems, including inference in multinomial Gaussian process classification and learning in latent factor models. Our extensive comparisons demonstrate that our stick-breaking model effectively captures correlation in discrete data and is well suited for the analysis of categorical data.</p> |
Marlin, Benjamin M; Kale, David C; Khemani, Robinder G; Wetzel, Randall C: Unsupervised pattern discovery in electronic health care data using probabilistic clustering models. IHI, 2012. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/ihi/MarlinKKW12, <p>Bedside clinicians routinely identify temporal patterns in physiologic data in the process of choosing and administering treatments intended to alter the course of critical illness for individual patients. Our primary interest is the study of unsupervised learning techniques for automatically uncovering such patterns from the physiologic time series data contained in electronic health care records. This data is sparse, high-dimensional and often both uncertain and incomplete. In this paper, we develop and study a probabilistic clustering model designed to mitigate the effects of temporal sparsity inherent in electronic health care records data. We evaluate the model qualitatively by visualizing the learned cluster parameters and quantitatively in terms of its ability to predict mortality outcomes associated with patient episodes. Our results indicate that the model can discover distinct, recognizable physiologic patterns with prognostic significance.</p> |
2011 |
Marlin, Benjamin M; de Freitas, Nando: Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood. UAI, 2011. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/uai/MarlinF11, <p>Standard maximum likelihood estimation cannot be applied to discrete energy-based models in the general case because the computation of exact model probabilities is intractable. Recent research has seen the proposal of several new estimators designed specifically to overcome this intractability, but virtually nothing is known about their theoretical properties. In this paper, we present a generalized estimator that unifies many of the classical and recently proposed estimators. We use results from the standard asymptotic theory for M-estimators to derive a generic expression for the asymptotic covariance matrix of our generalized estimator. We apply these results to study the relative statistical efficiency of classical pseudolikelihood and the recently-proposed ratio matching estimator. </p> |
Duvenaud, David K; Marlin, Benjamin M; Murphy, Kevin P: Multiscale Conditional Random Fields for Semi-supervised Labeling and Classification. CRV, 2011. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/crv/DuvenaudMM11, <p>Motivated by the abundance of images labeled only by their captions, we construct tree-structured multiscale conditional random fields capable of performing semi supervised learning. We show that such caption-only data can in fact increase pixel-level accuracy at test time. In addition, we compare two kinds of tree: the standard one with pair wise potentials, and one based on noisy-or potentials, which better matches the semantics of the recursive partitioning used to create the tree.</p> |
Swersky, Kevin; textquoteright, Marc; Buchman, David; Marlin, Benjamin M; de Freitas, Nando: On Autoencoders and Score Matching for Energy Based Models. ICML, 2011. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/icml/SwerskyRBMF11, <p>We consider estimation methods for the class of continuous-data energy based models (EBMs). Our main result shows that estimating the parameters of an EBM using score matching when the conditional distribution over the visible units is Gaussian corresponds to training a particular form of regularized autoencoder. We show how different Gaussian EBMs lead to different autoencoder architectures, providing deep links between these two families of models. We compare the score matching estimator for the mPoT model, a particular Gaussian EBM, to several other training methods on a variety of tasks including image denoising and unsupervised feature extraction. We show that the regularization function induced by score matching leads to superior classification performance relative to a standard autoencoder. We also show that score matching yields classification results that are indistinguishable from better-known stochastic approximation maximum likelihood estimators.</p> |
Marlin, Benjamin M; Khan, Mohammad Emtiyaz; Murphy, Kevin P: Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models. ICML, 2011. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/icml/MarlinKM11, <p>Bernoulli-logistic latent Gaussian models (bLGMs) are a useful model class, but accurate parameter estimation is complicated by the fact that the marginal likelihood contains an intractable logistic-Gaussian integral. In this work, we propose the use of fixed piecewise linear and quadratic upper bounds to the logistic-log-partition (LLP) function as a way of circumventing this intractable integral. We describe a framework for approximately computing minimax optimal piecewise quadratic bounds, as well a generalized expectation maximization algorithm based on using piecewise bounds to estimate bLGMs. We prove a theoretical result relating the maximum error in the LLP bound to the maximum error in the marginal likelihood estimate. Finally, we present empirical results showing that piecewise bounds can be significantly more accurate than previously proposed variational bounds. </p> |
Marlin, Benjamin M; Zemel, Richard S; Roweis, Sam T; Slaney, Malcolm: Recommender Systems, Missing Data and Statistical Model Estimation. IJCAI, 2011. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/ijcai/MarlinZRS11, <p>The goal of rating-based recommender systems is to make personalized predictions and recommendations for individual users by leveraging the preferences of a community of users with respect to a collection of items like songs or movies. Recommender systems are often based on intricate statistical models that are estimated from data sets containing a very high proportion of missing ratings. This work describes evidence of a basic incompatibility between the properties of recommender system data sets and the assumptions required for valid estimation and evaluation of statistical models in the presence of missing data. We discuss the implications of this problem and describe extended modelling and evaluation frameworks that attempt to circumvent it. We present prediction and ranking results showing that models developed and tested under these extended frameworks can significantly outperform standard models.</p> |
2010 |
Marlin, Benjamin M; Swersky, Kevin; Chen, Bo; de Freitas, Nando: Inductive Principles for Restricted Boltzmann Machine Learning. AISTATS, 2010. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:journals/jmlr/MarlinSCF10, <p>Recent research has seen the proposal of several new inductive principles designed specifically to avoid the problems associated with maximum likelihood learning in models with intractable partition functions. In this paper, we study learning methods for binary restricted Boltzmann machines (RBMs) based on ratio matching and generalized score matching. We compare these new RBM learning methods to a range of existing learning methods including stochastic maximum likelihood, contrastive divergence, and pseudo-likelihood. We perform an extensive empirical evaluation across multiple tasks and data sets. </p> |
Swersky, Kevin; Chen, Bo; Marlin, Benjamin M; de Freitas, Nando: A tutorial on stochastic approximation algorithms for training Restricted Boltzmann Machines and Deep Belief Nets. ITA, 2010. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/ita/SwerskyCMF10, <p>In this study, we provide a direct comparison of the Stochastic Maximum Likelihood algorithm and Contrastive Divergence for training Restricted Boltzmann Machines using the MNIST data set. We demonstrate that Stochastic Maximum Likelihood is superior when using the Restricted Boltzmann Machine as a classifier, and that the algorithm can be greatly improved using the technique of iterate averaging from the field of stochastic approximation. We further show that training with optimal parameters for classification does not necessarily lead to optimal results when Restricted Boltzmann Machines are stacked to form a Deep Belief Network. In our experiments we observe that fine tuning a Deep Belief Network significantly changes the distribution of the latent data, even though the parameter changes are negligible.</p> |
Khan, Mohammad Emtiyaz; Marlin, Benjamin M; Bouchard, Guillaume; Murphy, Kevin P: Variational bounds for mixed-data factor analysis. NIPS, 2010. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/nips/KhanMBM10, <p>We propose a new variational EM algorithm for fitting factor analysis models with mixed continuous and categorical observations. The algorithm is based on a simple quadratic bound to the log-sum-exp function. In the special case of fully observed binary data, the bound we propose is significantly faster than previous variational methods. We show that EM is significantly more robust in the presence of missing data compared to treating the latent factors as parameters, which is the approach used by exponential family PCA and other related matrix-factorization methods. A further benefit of the variational approach is that it can easily be extended to the case of mixtures of factor analyzers, as we show. We present results on synthetic and real data sets demonstrating several desirable properties of our proposed method.</p> |
2009 |
Moghaddam, Baback; Marlin, Benjamin M; Khan, Mohammad Emtiyaz; Murphy, Kevin P: Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models. NIPS, 2009. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/nips/MoghaddamMKM09, <p>In this paper we make several contributions towards accelerating approximate Bayesian structural inference for non-decomposable GGMs. Our first contribution is to show how to efficiently compute a BIC or Laplace approximation to the marginal likelihood of non-decomposable graphs using convex methods for precision matrix estimation. This optimization technique can be used as a fast scoring function inside standard Stochastic Local Search (SLS) for generating posterior samples. Our second contribution is a novel framework for efficiently generating large sets of high-quality graph topologies without performing local search. This graph proposal method, which we call "Neighborhood Fusion" (NF), samples candidate Markov blankets at each node using sparse regression techniques. Our final contribution is a hybrid method combining the complementary strengths of NF and SLS. Experimental results in structural recovery and prediction tasks demonstrate that NF and hybrid NF/SLS out-perform state-of-the-art local search methods, on both synthetic and real-world data sets, when realistic computational limits are imposed.</p> |
Marlin, Benjamin M; Zemel, Richard S: Collaborative prediction and ranking with non-random missing data. RecSys, 2009. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/recsys/MarlinZ09, <p>A fundamental aspect of rating-based recommender systems is the observation process, the process by which users choose the items they rate. Nearly all research on collaborative filtering and recommender systems is founded on the assumption that missing ratings are missing at random. The statistical theory of missing data shows that incorrect assumptions about missing data can lead to biased parameter estimation and prediction. In a recent study, we demonstrated strong evidence for violations of the missing at random condition in a real recommender system. In this paper we present the first study of the effect of non-random missing data on collaborative ranking, and extend our previous results regarding the impact of non-random missing data on collaborative prediction.</p> |
Marlin, Benjamin M; Schmidt, Mark W; Murphy, Kevin P: Group Sparse Priors for Covariance Estimation. UAI, 2009. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/uai/MarlinSM09, <p>Recently it has become popular to learn sparse Gaussian graphical models (GGMs) by imposing l1 or group l1,2 penalties on the elements of the precision matrix. This penalized likelihood approach results in a tractable convex optimization problem. In this paper, we reinterpret these results as performing MAP estimation under a novel prior which we call the group l1 and l1,2 positive definite matrix distributions. This enables us to build a hierarchical model in which the l1 regularization terms vary depending on which group the entries are assigned to, which in turn allows us to learn block structured sparse GGMs with unknown group assignments. Exact inference in this hierarchical model is intractable, due to the need to compute the normalization constant of these matrix distributions. However, we derive upper bounds on the partition functions, which lets us use fast variational inference (optimizing a lower bound on the joint posterior). We show that on two real world data sets (motion capture and financial data), our method which infers the block structure outperforms a method that uses a fixed block structure, which in turn outperforms baseline methods that ignore block structure. </p> |
Marlin, Benjamin M; Murphy, Kevin P: Sparse Gaussian graphical models with unknown block structure. ICML, 2009. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/icml/MarlinM09, <p>Recent work has shown that one can learn the structure of Gaussian Graphical Models by imposing an L1 penalty on the precision matrix, and then using efficient convex optimization methods to find the penalized maximum likelihood estimate. This is similar to performing MAP estimation with a prior that prefers sparse graphs. In this paper, we use the stochastic block model as a prior. This prefer graphs that are blockwise sparse, but unlike previous work, it does not require that the blocks or groups be specified a priori. The resulting problem is no longer convex, but we devise an efficient variational Bayes algorithm to solve it. We show that our method has better test set likelihood on two different datasets (motion capture and gene expression) compared to independent L1, and can match the performance of group L1 using manually created groups.</p> |
2007 |
Marlin, Benjamin M; Zemel, Richard S; Roweis, Sam T; Slaney, Malcolm: Collaborative Filtering and the Missing at Random Assumption. UAI, 2007. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/uai/MarlinZRS07, <p>Rating prediction is an important application, and a popular research topic in collaborative filtering. However, both the validity of learning algorithms, and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings. </p> |
2004 |
Marlin, Benjamin; Toussaint, Godfried: Constructing convex 3-polytopes from two triangulations of a polygon. In: Computational Geometry, vol. 28, no. 1, pp. 41 - 47, 2004, ISSN: 0925-7721. (Type: Journal Article | Abstract | Links | BibTeX)@article{Marlin200441, <p>Guibas conjectured that given a convex polygon P in the xy-plane along with two triangulations of it, T1 and T2 that share no diagonals, it is always possible to assign height values to the vertices of P such that P∪T1∪T2 becomes a convex 3-polytope. Dekster found a counter example but left open the questions of deciding if a given configuration corresponds to a convex 3-polytope, and constructing such realizations when they exist. This paper presents a characterization of realizable configurations for Guibastextquoteright conjecture based on work from the area of polytope convexity testing. Our approach to the decision and construction problems is a reduction to a linear-inequality feasibility problem. The approach is also related to methods used for deciding if an arbitrary triangulation of a point set is a regular triangulation. We show two reductions, one based directly on a global convexity condition resulting in number of inequalities that is quadratic in the number of vertices of P, and one based on an equivalent local convexity condition resulting in a linear number of inequalities.</p> |
Marlin, Benjamin M; Zemel, Richard S: The multiple multiplicative factor model for collaborative filtering. ICML, 2004. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/icml/MarlinZ04, <p>We describe a class of causal, discrete latent variable models called Multiple Multiplicative Factor models (MMFs). A data vector is represented in the latent space as a vector of factors that have discrete, non-negative expression levels. Each factor proposes a distribution over the data vector. The distinguishing feature of MMFs is that they combine the factorstextquoteright proposed distributions multiplicatively, taking into account factor expression levels. The product formulation of MMFs allow factors to specialize to a subset of the items, while the causal generative semantics mean MMFs can readily accommodate missing data. This makes MMFs distinct from both directed models with mixture semantics and undirected product models. In this paper we present empirical results from the collaborative filtering domain showing that a binary/multinomial MMF model matches the performance of the best existing models while learning an interesting latent space description of the users.</p> |
2003 |
Boutilier, Craig; Zemel, Richard S; Marlin, Benjamin M: Active Collaborative Filtering. UAI, 2003. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/uai/BoutilierZM03, <p>Collaborative filtering (CF) allows the preferences of multiple users to be pooled to make recommendations regarding unseen products. We consider in this paper the problem of online and interactive CF: given the current ratings associated with a user, what queries (new ratings) would most improve the quality of the recommendations made? We cast this terms of expected value of information (EVOI); but the online computational cost of computing optimal queries is prohibitive. We show how offline prototyping and computation of bounds on EVOI can be used to dramatically reduce the required online computation. The framework we develop is general, but we focus on derivations and empirical study in the specific case of the multiple-cause vector quantization model.</p> |
Marlin, Benjamin M: Modeling User Rating Profiles For Collaborative Filtering. NIPS, 2003. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/nips/Marlin03, <p>In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model, and LDA, but has clear advantages over each.</p> |
2002 |
Marlin, Benjamin M; Toussaint, Godfried T: Constructing convex 3-polytopes from two triangulations of a polygon. CCCG, 2002. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/cccg/MarlinT02, <p>Guibas conjectured that given a convex polygon P in the xy-plane along with two triangulations of it, T1 and T2 that share no diagonals, it is always possible to assign height values to the vertices of P such that P U T1 U T 2 becomes a convex 3-polytope. Dekster found a counter example but left open the questions of deciding if a given configuration corresponds to a convex 3-polytope, and constructing such realizations when they exist. This paper gives a proof that a relaxed version of Guibastextquoteright conjecture always holds true. The question of deciding the realizability of Guibastextquoteright conjecture is characterized in terms of a linear programming problem. This leads to an algorithm for deciding and constructing such realizations that incorporates a linear programming step with O(n) inequality constraints and n variables.</p> |