Publications
2017 |
Rostaminia, Soha; Mayberry, Addison; Ganesan, Deepak; Marlin, Benjamin; Gummeson, Jeremy iLid: Low-power Sensing of Fatigue and Drowsiness Measures on a Computational Eyeglass Journal Article In: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 2, pp. 23, 2017.
The ability to monitor eye closures and blink patterns has long been known to enable accurate assessment of fatigue and drowsiness in individuals. Many measures of the eye are known to be correlated with fatigue including coarse-grained measures like the rate of blinks as well as fine-grained measures like the duration of blinks and the extent of eye closures. Despite a plethora of research validating these measures, we lack wearable devices that can continually and reliably monitor them in the natural environment. In this work, we present a low-power system, iLid, that can continually sense fine-grained measures such as blink duration and Percentage of Eye Closures (PERCLOS) at high frame rates of 100 fps. We present a complete solution including design of the sensing, signal processing, and machine learning pipeline; implementation on a prototype computational eyeglass platform; and extensive evaluation under many conditions including illumination changes, eyeglass shifts, and mobility. Our results are very encouraging, showing that we can detect blinks, blink duration, eyelid location, and fatigue-related metrics such as PERCLOS with less than a few percent error. |
Adams, Roy J; Marlin, Benjamin M Learning Time Series Detection Models from Temporally Imprecise Labels Conference The 20th International Conference on Artificial Intelligence and Statistics, 2017.
In this paper, we consider a new low-quality label learning problem: learning time series detection models from temporally imprecise labels. In this problem, the data consist of a set of input time series, and supervision is provided by a sequence of noisy time stamps corresponding to the occurrence of positive class events. Such temporally imprecise labels commonly occur in areas like mobile health research where human annotators are tasked with labeling the occurrence of very short duration events. We propose a general learning framework for this problem that can accommodate different base classifiers and noise models. We present results on real mobile health data showing that the proposed framework significantly outperforms a number of alternatives including assuming that the label time stamps are noise-free, transforming the problem into the multiple instance learning framework, and learning on labels that were manually re-aligned. |
Dadkhahi, Hamid; Marlin, Benjamin Learning Tree-Structured Detection Cascades for Heterogeneous Networks of Embedded Devices Proceedings 2017, (To appear).
In this paper, we present a new approach to learning cascaded classifiers for use in computing environments that involve networks of heterogeneous and resource-constrained, low-power embedded compute and sensing nodes. We present a generalization of the classical linear detection cascade to the case of tree-structured cascades where different branches of the tree execute on different physical compute nodes in the network. Different nodes have access to different features, as well as access to potentially different computation and energy resources. We concentrate on the problem of jointly learning the parameters for all of the classifiers in the cascade given a fixed cascade architecture and a known set of costs required to carry out the computation at each node. To accomplish the objective of joint learning of all detectors, we propose a novel approach to combining classifier outputs during training that better matches the hard cascade setting in which the learned system will be deployed. This work is motivated by research in the area of mobile health where energy efficient real time detectors integrating information from multiple wireless on-body sensors and a smart phone are needed for real-time monitoring and the delivery of just-in-time adaptive interventions. We evaluate our framework on mobile sensor-based human activity recognition and mobile health detector learning problems. |
Dadkhahi, Hamid; Duarte, Marco F; Marlin, Benjamin M Out-of-Sample Extension for Dimensionality Reduction of Noisy Time Series Journal Article In: IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5435–5446, 2017.
This paper proposes an out-of-sample extension framework for a global manifold learning algorithm (Isomap) that uses temporal information in out-of-sample points in order to make the embedding more robust to noise and artifacts. Given a set of noise-free training data and its embedding, the proposed framework extends the embedding for a noisy time series. This is achieved by adding a spatio-temporal compactness term to the optimization objective of the embedding. To the best of our knowledge, this is the first method for out-of-sample extension of manifold embeddings that leverages timing information available for the extension set. Experimental results demonstrate that our out-of-sample extension algorithm renders a more robust and accurate embedding of sequentially ordered image data in the presence of various noise and artifacts when compared with other timing-aware embeddings. Additionally, we show that an out-of-sample extension framework based on the proposed algorithm outperforms the state of the art in eye-gaze estimation. |
2016 |
Bernstein, Garrett; Sheldon, Daniel R Consistently Estimating Markov Chains with Noisy Aggregate Data. Conference AISTATS, Cadiz, Spain, 2016. |
Jacek, Nicholas; Chiu, Meng-Chieh; Marlin, Benjamin; Moss, Eliot J B Assessing the Limits of Program-Specific Garbage Collection Performance Conference Programming Language Design and Implementation, 2016, (Distinguished Paper Award).
We consider the ultimate limits of program-specific garbage collector performance for real programs. We first characterize the GC schedule optimization problem using Markov Decision Processes (MDPs). Based on this characterization, we develop a method of determining, for a given program run and heap size, an optimal schedule of collections for a non-generational collector. We further explore the limits of performance of a generational collector, where it is not feasible to search the space of schedules to prove optimality. Still, we show significant improvements with Least Squares Policy Iteration, a reinforcement learning technique for solving MDPs. We demonstrate that there is considerable promise to reduce garbage collection costs by developing program-specific collection policies. |
Sadasivam, Rajani Shankar; Cutrona, Sarah L; Kinney, Rebecca L; Marlin, Benjamin M; Mazor, Kathleen M; Lemon, Stephenie C; Houston, Thomas K Collective-Intelligence Recommender Systems: Advancing Computer Tailoring for Health Behavior Change Into the 21st Century Journal Article In: Journal of Medical Internet Research, vol. 18, 2016.
What is the next frontier for computer-tailored health communication (CTHC) research? In current CTHC systems, study designers who have expertise in behavioral theory and mapping theory into CTHC systems select the variables and develop the rules that specify how the content should be tailored, based on their knowledge of the targeted population, the literature, and health behavior theories. In collective-intelligence recommender systems (hereafter recommender systems) used by Web 2.0 companies (eg, Netflix and Amazon), machine learning algorithms combine user profiles and continuous feedback ratings of content (from themselves and other users) to empirically tailor content. Augmenting current theory-based CTHC with empirical recommender systems could be evaluated as the next frontier for CTHC. |
Natarajan, Annamalai; Xu, Kevin S; Eriksson, Brian Detecting Divisions of the Autonomic Nervous System Using Wearables Proceedings Florida, USA, 2016. |
Natarajan, Annamalai; Angarita, Gustavo; Gaiser, Edward; Malison, Robert; Ganesan, Deepak; Marlin, Benjamin M Domain Adaptation Methods for Improving Lab-to-field Generalization of Cocaine Detection using Wearable ECG Proceedings Heidelberg, Germany, 2016. |
Adams, Roy; Saleheen, Nazir; Thomaz, Edison; Parate, Abhinav; Kumar, Santosh; Marlin, Benjamin Hierarchical Span-Based Conditional Random Fields for Labeling and Segmenting Events in Wearable Sensor Data Streams Conference International Conference on Machine Learning, 2016.
The field of mobile health (mHealth) has the potential to yield new insights into health and behavior through the analysis of continuously recorded data from wearable health and activity sensors. In this paper, we present a hierarchical span-based conditional random field model for the key problem of jointly detecting discrete events in such sensor data streams and segmenting these events into high-level activity sessions. Our model includes higher-order cardinality factors and inter-event duration factors to capture domain-specific structure in the label space. We show that our model supports exact MAP inference in quadratic time via dynamic programming, which we leverage to perform learning in the structured support vector machine framework. We apply the model to the problems of smoking and eating detection using four real data sets. Our results show statistically significant improvements in segmentation performance at the p=0.005 level relative to a hierarchical pairwise CRF. |
Dadkhahi, Hamid; Saleheen, Nazir; Kumar, Santosh; Marlin, Benjamin Learning Shallow Detection Cascades for Wearable Sensor-Based Mobile Health Applications Conference ICML On Device Intelligence Workshop, 2016.
The field of mobile health aims to leverage recent advances in wearable on-body sensing technology and smart phone computing capabilities to develop systems that can monitor health states and deliver just-in-time adaptive interventions. However, existing work has largely focused on analyzing collected data in the off-line setting. In this paper, we propose a novel approach to learning shallow detection cascades developed explicitly for use in real-time wearable-phone or wearable-phone-cloud systems. We apply our approach to the problem of cigarette smoking detection from a combination of wrist-worn actigraphy data and respiration chest band data using two and three stage cascades. |
Nguyen, Thai; Adams, Roy J; Natarajan, Annamalai; Marlin, Benjamin M Parsing Wireless Electrocardiogram Signals with the CRF-CFG Model Proceedings 2016.
Recent advances in wearable sensor technology have made it possible to simultaneously collect multiple streams of physiological and context data from individuals as they go about their daily activities in natural environments. However, extracting reliable higher-level inferences from these raw data streams remains a key data analysis challenge. In this paper, we focus on the specific case of the analysis of data from wireless electrocardiogram (ECG) sensors. We present a new robust probabilistic approach to ECG morphology extraction using conditional random field context free grammar models, which have traditionally been applied to parsing problems in natural language processing. We introduce a robust context free grammar for parsing noisy ECG data, and show significantly improved performance on the ECG morphological labeling task. |
Winner, Kevin; Sheldon, Daniel Probabilistic Inference with Generating Functions for Poisson Latent Variable Models Proceedings Barcelona, Spain, 2016. |
2015 |
Adams, Roy; Thomaz, Edison; Marlin, Benjamin M Hierarchical Nested CRFs for Segmentation and Labeling of Physiological Time Series Conference NIPS Workshop: Machine Learning for Healthcare, 2015. |
Schein, Aaron; Paisley, John; Blei, David M; Wallach, Hanna Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts Conference Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2015. |
Kumar, S; et al. Center of excellence for mobile sensor Data-to-Knowledge (MD2K) Journal Article In: Journal of the American Medical Informatics Association, vol. 22, pp. 1137–1142, 2015.
Mobile sensor data-to-knowledge (MD2K) was chosen as one of 11 Big Data Centers of Excellence by the National Institutes of Health, as part of its Big Data-to-Knowledge initiative. MD2K is developing innovative tools to streamline the collection, integration, management, visualization, analysis, and interpretation of health data generated by mobile and wearable sensors. The goal of the big data solutions being developed by MD2K is to reliably quantify physical, biological, behavioral, social, and environmental factors that contribute to health and disease risk. The research conducted by MD2K is targeted at improving health through early detection of adverse health events and by facilitating prevention. MD2K will make its tools, software, and training materials widely available and will also organize workshops and seminars to encourage their use by researchers and clinicians. |
Mayberry, Addison; Hu, Pan; Tun, Yamin; Smith-Freedman, Duncan; Ganesan, Deepak; Salthouse, Christopher; Marlin, Benjamin M CIDER: Enabling Robustness-Power Tradeoffs on a Computational Eyeglass Conference Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, ACM, 2015. |
Li, Steven Cheng-Xian; Marlin, Benjamin Classification of Sparse and Irregularly Sampled Time Series with Mixtures of Expected Gaussian Kernels and Random Features Journal Article In: Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI-15), 2015. |
Winner, Kevin; Bernstein, Garrett; Sheldon, Dan Inference in a Partially Observed Queuing Model with Applications in Ecology Conference Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015. |
Iyengar, Srinivasan; Kalra, Sandeep; Ghosh, Anushree; Irwin, David; Shenoy, Prashant; Marlin, Benjamin iProgram: Inferring Smart Schedules for Dumb Thermostats Conference Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, BuildSys '15, ACM, New York, NY, USA, 2015, ISBN: 978-1-4503-3981-0.
Heating, ventilation, and air conditioning (HVAC) accounts for over 50% of a typical home's energy usage. A thermostat generally controls HVAC usage in a home to ensure user comfort. In this paper, we focus on making existing "dumb" programmable thermostats smart by applying energy analytics on smart meter data to infer home occupancy patterns and compute an optimized thermostat schedule. Utilities with smart meter deployments are capable of immediately applying our approach, called iProgram, to homes across their customer base. iProgram addresses new challenges in inferring home occupancy from smart meter data where i) training data is not available and ii) the thermostat schedule may be misaligned with occupancy, frequently resulting in high power usage during unoccupied periods. iProgram translates occupancy patterns inferred from opaque smart meter data into a custom schedule for existing types of programmable thermostats, e.g., 1-day, 7-day, etc. We implement iProgram as a web service and show that it reduces the mismatch time between the occupancy pattern and the thermostat schedule by a median value of 44.28 minutes (out of 100 homes) when compared to a default 8am-6pm weekday schedule, with a median deviation of 30.76 minutes off the optimal schedule. Further, iProgram yields a daily energy saving of 0.42 kWh on average across the 100 homes. Utilities may use iProgram to recommend thermostat schedules to customers and provide them estimates of potential energy savings in their energy bills. |
Sun, Tao; Sheldon, Dan; Kumar, Akshat Message Passing for Collective Graphical Models Conference Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015. |
Saleheen, Nazir; Sarker, Hillol; Hossain, Syed Monowar; Ali, Amin Ahsan; Chatterjee, Soujanya; Marlin, Benjamin; Kumar, Santosh; al'Absi, Mustafa; Ertin, Emre puffMarker: a multi-sensor approach for pinpointing the timing of first lapse in smoking cessation Conference Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ACM, 2015. |
Li, Steven Cheng-Xian; Marlin, Benjamin M Collaborative Multi-Output Gaussian Processes for Collections of Sparse Multivariate Time Series Conference NIPS Time Series Workshop, 2015.
Collaborative Multi-Output Gaussian Processes (COGPs) are a flexible tool for modeling multivariate time series. They induce correlation across outputs through the use of shared latent processes. While past work has focused on the computational challenges that result from a single multivariate time series with many observed values, this paper explores the problem of fitting the COGP model to collections of many sparse and irregularly sampled multivariate time series. This work is motivated by applications to modeling physiological data (heart rate, blood pressure, etc.) in Electronic Health Records (EHRs). |
Adams, Roy J; Thomaz, Edison; Marlin, Benjamin M Hierarchical Nested CRFs for Segmentation and Labeling of Physiological Time Series Conference NIPS Workshop on Machine Learning in Healthcare, 2015.
In this paper, we address the problem of nested hierarchical segmentation and labeling of time series data. We present a hierarchical span-based conditional random field framework for this problem that leverages higher-order factors to enforce the nesting constraints. The framework can incorporate a variety of additional factors including higher order cardinality factors. This research is motivated by hierarchical activity recognition problems in the field of mobile Health (mHealth). We show that the specific model of interest in the mHealth setting supports exact MAP inference in quadratic time. Learning is accomplished in the structured support vector machine framework. We show positive results on real and synthetic data sets. |
Iyengar, Srinivasan; Kalra, Sandeep; Ghosh, Anushree; Irwin, David; Shenoy, Prashant; Marlin, Benjamin iProgram: Inferring Smart Schedules for Dumb Thermostats Conference 10th Annual Women in Machine Learning Workshop, 2015.
Heating, ventilation, and air conditioning (HVAC) accounts for over 50% of a typical home's energy usage. A thermostat generally controls HVAC usage in a home to ensure user comfort. In this paper, we focus on making existing "dumb" programmable thermostats smart by applying energy analytics on smart meter data to infer home occupancy patterns and compute an optimized thermostat schedule. Utilities with smart meter deployments are capable of immediately applying our approach, called iProgram, to homes across their customer base. iProgram addresses new challenges in inferring home occupancy from smart meter data where i) training data is not available and ii) the thermostat schedule may be misaligned with occupancy, frequently resulting in high power usage during unoccupied periods. iProgram translates occupancy patterns inferred from opaque smart meter data into a custom schedule for existing types of programmable thermostats, e.g., 1-day, 7-day, etc. We implement iProgram as a web service and show that it reduces the mismatch time between the occupancy pattern and the thermostat schedule by a median value of 44.28 minutes (out of 100 homes) when compared to a default 8am-6pm weekday schedule, with a median deviation of 30.76 minutes off the optimal schedule. Further, iProgram yields a daily energy saving of 0.42 kWh on average across the 100 homes. Utilities may use iProgram to recommend thermostat schedules to customers and provide them estimates of potential energy savings in their energy bills. |
Huang, Haibin; Kalogerakis, Evangelos; Marlin, Benjamin Analysis and synthesis of 3D shape families via deep-learned generative models of surfaces Conference Symposium on Geometry Processing, 2015.
We present a method for joint analysis and synthesis of geometrically diverse 3D shape families. Our method first learns part-based templates such that an optimal set of fuzzy point and part correspondences is computed between the shapes of an input collection based on a probabilistic deformation model. In contrast to previous template-based approaches, the geometry and deformation parameters of our part-based templates are learned from scratch. Based on the estimated shape correspondence, our method also learns a probabilistic generative model that hierarchically captures statistical relationships of corresponding surface point positions and parts as well as their existence in the input shapes. A deep learning procedure is used to capture these hierarchical relationships. The resulting generative model is used to produce control point arrangements that drive shape synthesis by combining and deforming parts from the input collection. The generative model also yields compact shape descriptors that are used to perform fine-grained classification. Finally, it can also be coupled with the probabilistic deformation model to further improve shape correspondence. We provide qualitative and quantitative evaluations of our method for shape correspondence, segmentation, fine-grained classification and synthesis. Our experiments demonstrate correspondence and segmentation results superior to those of previous state-of-the-art approaches. |
2014 |
Natarajan, Annamalai; Gaiser, Edward; Angarita, Gustavo; Malison, Robert; Ganesan, Deepak; Marlin, Benjamin Conditional Random Fields for Morphological Analysis of Wireless ECG Signals Conference 5th Annual conference on Bioinformatics, Computational Biology and Health Informatics, Newport Beach, CA, 2014.
Thanks to advances in mobile sensing technologies, it has recently become practical to deploy wireless electrocardiograph sensors for continuous recording of ECG signals. This capability has diverse applications in the study of human health and behavior, but to realize its full potential, new computational tools are required to effectively deal with the uncertainty that results from the noisy and highly non-stationary signals collected using these devices. In this work, we present a novel approach to the problem of extracting the morphological structure of ECG signals based on the use of dynamically structured conditional random field (CRF) models. We apply this framework to the problem of extracting morphological structure from wireless ECG sensor data collected in a lab-based study of habituated cocaine users. Our results show that the proposed CRF-based approach significantly outperforms independent prediction models using the same features, as well as a widely cited open source toolkit. |
Mayberry, Addison; Hu, Pan; Marlin, Benjamin; Ganesan, Deepak; Salthouse, Christopher iShadow: Design of a Wearable, Real-Time Mobile Gaze Tracker Conference 12th International Conference on Mobile Systems, Applications, and Services, 2014.
Continuous, real-time tracking of eye gaze is valuable in a variety of scenarios including hands-free interaction with the physical world, detection of unsafe behaviors, leveraging visual context for advertising, life logging, and others. While eye tracking is commonly used in clinical trials and user studies, it has not bridged the gap to everyday consumer use. The challenge is that a real-time eye tracker is a power-hungry and computation-intensive device which requires continuous sensing of the eye using an imager running at many tens of frames per second, and continuous processing of the image stream using sophisticated gaze estimation algorithms. Our key contribution is the design of an eye tracker that dramatically reduces the sensing and computation needs for eye tracking, thereby achieving orders of magnitude reductions in power consumption and form-factor. The key idea is that eye images are extremely redundant; therefore, we can estimate gaze by using a small subset of carefully chosen pixels per frame. We instantiate this idea in a prototype hardware platform equipped with a low-power image sensor that provides random access to pixel values, a low-power ARM Cortex M3 microcontroller, and a Bluetooth radio to communicate with a mobile phone. The sparse pixel-based gaze estimation algorithm is a multi-layer neural network learned using a state-of-the-art sparsity-inducing regularization function that minimizes the gaze prediction error while simultaneously minimizing the number of pixels used. Our results show that we can operate at roughly 70mW of power, while continuously estimating eye gaze at the rate of 30 Hz with errors of roughly 3 degrees. |
Adams, Roy J; Sadasivam, Rajani S; Balakrishnan, Kavitha; Kinney, Rebecca L; Houston, Thomas K; Marlin, Benjamin M PERSPeCT: Collaborative Filtering for Tailored Health Communications Conference Proceedings of the 8th ACM Conference on Recommender Systems, RecSys '14, ACM, 2014, ISBN: 978-1-4503-2668-1.
The goal of computer tailored health communications (CTHC) is to elicit healthy behavior changes by sending motivational messages personalized to individual patients. One prominent weakness of many existing CTHC systems is that they are based on expert-written rules and thus have no ability to learn from their users over time. One solution to this problem is to develop CTHC systems based on the principles of collaborative filtering, but this approach has not been widely studied. In this paper, we present a case study evaluating nine rating prediction methods for use in the Patient Experience Recommender System for Persuasive Communication Tailoring, a system developed for use in a clinical trial of CTHC-based smoking cessation support interventions. |
Learned-Miller, Erik; Marlin, Benjamin M; Kae, Andrew The Shape-Time Random Field for Semantic Video Labeling Proceedings 2014.
We propose a novel discriminative model for semantic labeling in videos by incorporating a prior to model both the shape and temporal dependencies of an object in video. A typical approach for this task is the conditional random field (CRF), which can model local interactions among adjacent regions in a video frame. Recent work [16, 14] has shown how to incorporate a shape prior into a CRF for improving labeling performance, but it may be difficult to model temporal dependencies present in video by using this prior. The conditional restricted Boltzmann machine (CRBM) can model both shape and temporal dependencies, and has been used to learn walking styles from motion-capture data. In this work, we incorporate a CRBM prior into a CRF framework and present a new state-of-the-art model for the task of semantic labeling in videos. In particular, we explore the task of labeling parts of complex face scenes from videos in the YouTube Faces Database (YFDB). Our combined model outperforms competitive baselines both qualitatively and quantitatively. |
2013 |
Sheldon, Daniel; Sun, Tao; Kumar, Akshat; Dietterich, Thomas G Approximate Inference in Collective Graphical Models Conference Proceedings of the 30th International Conference on Machine Learning (ICML), 2013. |
Natarajan, Annamalai; Parate, Abhinav; Gaiser, Edward; Angarita, Gustavo; Malison, Robert; Marlin, Benjamin; Ganesan, Deepak Detecting Cocaine Use with Wearable Electrocardiogram Sensors Proceedings Zurich, Switzerland, 2013. |
Marlin, Benjamin M; Adams, Roy J; Sadasivam, Rajani; Houston, Thomas K Towards Collaborative Filtering Recommender Systems for Tailored Health Communications Proceedings Washington D.C., 2013. |
2012 |
Sheldon, D; Dietterich, T G Collective Graphical Models Conference Advances in Neural Information Processing Systems (NIPS 2011), 2012.
There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models, a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highly-efficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. |
Hochachka, Wesley M; Fink, Daniel; Hutchinson, Rebecca A; Sheldon, Daniel; Wong, Weng-Keen; Kelling, Steve Data Intensive Science Applied to Broad-Scale Citizen Science Booklet 2012.
Identifying ecological patterns across broad spatial and temporal extents requires novel approaches and methods for acquiring, integrating and modeling massive quantities of diverse data. For example, a growing number of research projects engage continent-wide networks of volunteers ('citizen-scientists') to collect species occurrence data. Although these data are information rich, they present numerous challenges in project design, implementation and analysis, which include: developing data collection tools that maximize data quantity while maintaining high standards of data quality, and applying new analytical and visualization techniques that can accurately reveal patterns in these data. Here, we describe how advances in data-intensive science provide accurate estimates in species distributions at continental scales by identifying complex environmental associations. |
Xue, S; Fern, A; Sheldon, D Scheduling Conservation Designs via Network Cascade Optimization Conference Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
We introduce the problem of scheduling land purchases to conserve an endangered species in a way that achieves maximum population spread but delays purchases as long as possible, so that conservation planners retain maximum flexibility and use available budgets in the most efficient way. We develop the problem formally as a stochastic optimization problem over a network cascade model describing the population spread, and present a solution approach that reduces the stochastic problem to a novel variant of a Steiner tree problem. We give a primal-dual algorithm for the problem that computes both a feasible solution and a bound on the quality of an optimal solution. Our experiments, using actual conservation data and a standard diffusion model, show that the approach produces near optimal results and is much more scalable than more generic off-the-shelf optimizers. |
Krafft, Peter; Moore, Juston; Wallach, Hanna; Desmarais, Bruce Topic-Partitioned Multinetwork Embeddings Conference Advances in Neural Information Processing Systems Twenty-Five, Lake Tahoe, NV, 2012.
We introduce a joint model of network content and context designed for exploratory analysis of email networks via visualization of topic-specific communication patterns. Our model is an admixture model for text and network attributes that uses multinomial distributions over words as admixture components for explaining email text and latent Euclidean positions of actors as admixture components for explaining email recipients. This model allows us to infer topics of communication, a partition of the overall network into topic-specific subnetworks, and two-dimensional visualizations of those subnetworks. We validate the appropriateness of our model by achieving state-of-the-art performance on a prediction task and semantic coherence comparable to that of latent Dirichlet allocation. We demonstrate the capability of our model for descriptive, explanatory, and exploratory analysis by investigating the inferred topic-specific communication patterns of a new email data set, the New Hanover County email corpus. |
Marlin, Benjamin M; Kale, David C; Khemani, Robinder G; Wetzel, Randall C Unsupervised pattern discovery in electronic health care data using probabilistic clustering models Conference Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, IHI '12, ACM, New York, NY, USA, 2012, ISBN: 978-1-4503-0781-9.
Bedside clinicians routinely identify temporal patterns in physiologic data in the process of choosing and administering treatments intended to alter the course of critical illness for individual patients. Our primary interest is the study of unsupervised learning techniques for automatically uncovering such patterns from the physiologic time series data contained in electronic health care records. This data is sparse, high-dimensional and often both uncertain and incomplete. In this paper, we develop and study a probabilistic clustering model designed to mitigate the effects of temporal sparsity inherent in electronic health care records data. We evaluate the model qualitatively by visualizing the learned cluster parameters and quantitatively in terms of its ability to predict mortality outcomes associated with patient episodes. Our results indicate that the model can discover distinct, recognizable physiologic patterns with prognostic significance. |