Publications
2023 |
Karine, Karine; Klasnja, Predrag; Murphy, Susan; Marlin, Benjamin: Assessing the Impact of Context Inference Error and Partial Observability on RL Methods for Just-In-Time Adaptive Interventions. In: Conference on Uncertainty in Artificial Intelligence (To Appear), 2023. (Type: Proceedings Article) |
Chow, Sy-Miin; Nahum-Shani, Inbal; Baker, Justin T; Spruijt-Metz, Donna; Allen, Nicholas B; Auerbach, Randy P; Dunton, Genevieve F; Friedman, Naomi P; Intille, Stephen S; Klasnja, Predrag; et al.: The ILHBN: challenges, opportunities, and solutions from harmonizing data under heterogeneous study designs, target populations, and measurement protocols. In: Translational Behavioral Medicine, vol. 13, no. 1, pp. 7–16, 2023. (Type: Journal Article) |
Samplawski, Colin; Fang, Shiwei; Wang, Ziqi; Ganesan, Deepak; Srivastava, Mani; Marlin, Benjamin: Heteroskedastic Geospatial Tracking with Distributed Camera Networks. In: Conference on Uncertainty in Artificial Intelligence (To Appear), 9 pages, 2023. (Type: Proceedings Article) |
Fang, Shiwei; Sarker, Ankur; Wang, Ziqi; Srivastava, Mani; Marlin, Benjamin; Ganesan, Deepak: Design and Deployment of a Multi-Modal Multi-Node Sensor Data Collection Platform. In: 20th ACM Conference on Embedded Networked Sensor Systems Workshop on Data: Acquisition to Analysis, pp. 1041–1046, 2023. (Type: Proceedings Article) |
Vadera, Meet P; Samplawski, Colin; Marlin, Benjamin M: Uncertainty Quantification Using Query-Based Object Detectors. In: Computer Vision–ECCV 2022 Workshops, Part VIII, pp. 78–93, 2023. (Type: Proceedings Article) |
2022 |
Tung, Karine; Torre, Steven De La; Mistiri, Mohamed El; Braganca, Rebecca Braga De; Hekler, Eric; Pavel, Misha; Rivera, Daniel; Klasnja, Pedja; Spruijt-Metz, Donna; Marlin, Benjamin M: BayesLDM: A Domain-specific Modeling Language for Probabilistic Modeling of Longitudinal Data. In: 2022 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), pp. 78–90, IEEE, 2022. (Type: Proceedings Article) |
Vadera, Meet; Li, Jinyang; Cobb, Adam; Jalaian, Brian; Abdelzaher, Tarek; Marlin, Benjamin: URSABench: A system for comprehensive benchmarking of Bayesian deep neural network models and inference methods. In: Proceedings of Machine Learning and Systems, vol. 4, pp. 217–237, 2022. (Type: Journal Article) |
2021 |
Vadera, Meet P.; Marlin, Benjamin M.: Challenges and Opportunities in Approximate Bayesian Deep Learning for Intelligent IoT Systems. In: IEEE Third International Conference on Cognitive Machine Intelligence (CogMI), pp. 252–261, 2021. (Type: Proceedings Article) |
2020 |
Huang, Jin; Samplawski, Colin; Ganesan, Deepak; Marlin, Benjamin; Kwon, Heesung: CLIO: Enabling automatic compilation of deep learning pipelines across IoT and Cloud. In: MobiCom, Forthcoming. (Type: Proceedings Article) Abstract: Recent years have seen dramatic advances in low-power neural accelerators that aim to bring deep learning analytics to IoT devices; simultaneously, there have been considerable advances in the design of low-power radios aimed at enabling efficient compute offload from IoT devices to the cloud. Neither is a panacea: deep learning models are often too large for low-power accelerators and bandwidth needs often too high for low-power radios. While there has been considerable work on deep learning for smartphone-class devices, we lack a good understanding of how to design efficient and low-power deep learning systems for resource-constrained IoT devices. In this paper, we attempt to bridge this gap by designing a continuously tunable method for leveraging both local and remote resources to optimize performance of a deep learning model. CLIO presents a novel approach to split machine learning models between an IoT device and cloud in a progressive manner to adapt to wireless dynamics. We show that this method can be combined with model compression, adaptive model partitioning and privacy preservation to create an integrated system for IoT-cloud partitioning. We implement CLIO on the GAP8 low-power neural accelerator and provide an exhaustive characterization of the operating regimes where each method performs best and show that CLIO can enable graceful degradation of prediction accuracy as resources diminish. |
Marlin, Benjamin M.; Abdelzaher, Tarek; Ciocarlie, Gabriela; Cobb, Adam D.; Dennison, Mark; Jalaian, Brian; Kaplan, Lance; Raber, Tiffany; Raglin, Adrienne; Sharma, Piyush K.; Srivastava, Mani; Trout, Theron; Vadera, Meet P.; Wigness, Maggie: On Uncertainty and Robustness in Large-Scale Intelligent Data Fusion Systems. In: 2020 IEEE Second International Conference on Cognitive Machine Intelligence (CogMI), pp. 82–91, 2020. (Type: Proceedings Article) |
Vadera, Meet; Jalaian, Brian; Marlin, Benjamin: Generalized Bayesian posterior expectation distillation for deep neural networks. In: Conference on Uncertainty in Artificial Intelligence, pp. 719–728, PMLR, 2020. (Type: Proceedings Article) |
Samplawski, Colin; Huang, Jin; Ganesan, Deepak; Marlin, Benjamin M: Towards objection detection under IoT resource constraints: Combining partitioning, slicing and compression. In: Proceedings of the 2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things, pp. 14–20, 2020. (Type: Proceedings Article) |
2019 |
Shukla, Satya Narayan; Marlin, Benjamin: Interpolation-Prediction Networks for Irregularly Sampled Time Series. Seventh International Conference on Learning Representations, Forthcoming. (Type: Conference) Abstract: In this paper, we present a new deep learning architecture for addressing the problem of supervised learning with sparse and irregularly sampled multivariate time series. The architecture is based on the use of a semi-parametric interpolation network followed by the application of a prediction network. The interpolation network allows for information to be shared across multiple dimensions of a multivariate time series during the interpolation stage, while any standard deep learning model can be used for the prediction network. This work is motivated by the analysis of physiological time series data in electronic health records, which are sparse, irregularly sampled, and multivariate. We investigate the performance of this architecture on two datasets for both classification and regression tasks, showing that our approach outperforms a range of baseline and recently proposed models. |
Li, Steven Cheng-Xian; Jiang, Bo; Marlin, Benjamin: Learning from Incomplete Data with Generative Adversarial Networks. Seventh International Conference on Learning Representations, Forthcoming. (Type: Conference) Abstract: Generative adversarial networks (GANs) have been shown to provide an effective way to model complex distributions and have obtained impressive results on various challenging tasks. However, typical GANs require fully-observed data during training. In this paper, we present a modular approach to learning GANs from incomplete observations that can be combined with different generator and discriminator networks and is amenable for use with complex, high-dimensional inputs. The proposed framework learns a complete data generator along with a mask generator that models the missingness. We further demonstrate how to impute missing data by equipping our framework with an adversarially trained imputer. We evaluate the proposed framework using a series of experiments with several types of missing-completely-at-random missing data processes. |
Jacek, Nicholas; Chiu, Meng-Chieh; Marlin, Benjamin M.; Moss, J. Eliot B.: Optimal Choice of When to Garbage Collect. In: ACM Transactions on Programming Languages and Systems, Forthcoming. (Type: Journal Article) Abstract: We consider the ultimate limits of program-specific garbage collector (GC) performance for real programs. We first characterize the GC schedule optimization problem. Based on this characterization, we develop a linear-time dynamic programming solution that, given a program run and heap size, computes an optimal schedule of collections for a non-generational collector. Using an analysis of a heap object graph of the program, we compute a property of heap objects that we call their pre-birth time. This information enables us to extend the non-generational GC schedule problem to the generational GC case in a way that also admits a dynamic programming solution with cost quadratic in the length of the trace (number of objects allocated). This improves our previously reported approximately optimal result. We further extend the two-generation dynamic program to any number of generations, allowing other generalizations as well. Our experimental results for two generations on traces from Java programs of the DaCapo benchmark suite show that there is considerable promise to reduce garbage collection costs for some programs by developing program-specific collection policies. For a given space budget, optimal schedules often obtain modest but useful time savings, and for a given time budget, optimal schedules can obtain considerable space savings. |
2018 |
Adams, Roy; Marlin, Benjamin M.: Learning Time Series Segmentation Models from Temporally Imprecise Labels. 2018. (Type: Conference) Abstract: This paper considers the problem of learning time series segmentation models when the labeled data is subject to temporal uncertainty or noise. Our approach augments the semi-Markov conditional random field (semi-CRF) model with a probabilistic model of the label observation process. This augmentation allows us to estimate the parameters of the semi-CRF from timestamps corresponding roughly to the occurrence of segment transitions. We show how exact marginal inference can be performed in polynomial time enabling learning based on marginal likelihood maximization. Our experiments on two activity detection problems show that the proposed approach can learn models from temporally imprecise labels, and can successfully refine imprecise segmentations through posterior inference. Finally, we show how inference complexity can be reduced by a factor of 40 using static and model-based pruning of the inference dynamic program. |
Abdelzaher, Tarek; Ayanian, Nora; Basar, Tamer; Diggavi, Suhas; Diesner, Jana; Ganesan, Deepak; Govindan, Ramesh; Jha, Susmit; Lepoint, Tancrede; Marlin, Benjamin; et al.: Toward an internet of battlefield things: A resilience perspective. In: Computer, vol. 51, no. 11, pp. 24–36, 2018. (Type: Journal Article) |
2017 |
Kumar, Santosh; et al.: Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K). In: IEEE Pervasive Computing, vol. 16, no. 2, pp. 18–22, 2017. (Type: Journal Article) Abstract: The Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K) is enabling the collection of high-frequency mobile sensor data for the development and validation of novel multisensory biomarkers and sensor-triggered interventions. |
Rostaminia, Soha; Mayberry, Addison; Ganesan, Deepak; Marlin, Benjamin; Gummeson, Jeremy: iLid: Low-power Sensing of Fatigue and Drowsiness Measures on a Computational Eyeglass. In: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 2, pp. 23, 2017. (Type: Journal Article) Abstract: The ability to monitor eye closures and blink patterns has long been known to enable accurate assessment of fatigue and drowsiness in individuals. Many measures of the eye are known to be correlated with fatigue including coarse-grained measures like the rate of blinks as well as fine-grained measures like the duration of blinks and the extent of eye closures. Despite a plethora of research validating these measures, we lack wearable devices that can continually and reliably monitor them in the natural environment. In this work, we present a low-power system, iLid, that can continually sense fine-grained measures such as blink duration and Percentage of Eye Closures (PERCLOS) at high frame rates of 100 fps. We present a complete solution including design of the sensing, signal processing, and machine learning pipeline; implementation on a prototype computational eyeglass platform; and extensive evaluation under many conditions including illumination changes, eyeglass shifts, and mobility. Our results are very encouraging, showing that we can detect blinks, blink duration, eyelid location, and fatigue-related metrics such as PERCLOS with less than a few percent error. |
Adams, Roy J; Marlin, Benjamin M: Learning Time Series Detection Models from Temporally Imprecise Labels. The 20th International Conference on Artificial Intelligence and Statistics, 2017. (Type: Conference) Abstract: In this paper, we consider a new low-quality label learning problem: learning time series detection models from temporally imprecise labels. In this problem, the data consist of a set of input time series, and supervision is provided by a sequence of noisy time stamps corresponding to the occurrence of positive class events. Such temporally imprecise labels commonly occur in areas like mobile health research where human annotators are tasked with labeling the occurrence of very short duration events. We propose a general learning framework for this problem that can accommodate different base classifiers and noise models. We present results on real mobile health data showing that the proposed framework significantly outperforms a number of alternatives including assuming that the label time stamps are noise-free, transforming the problem into the multiple instance learning framework, and learning on labels that were manually re-aligned. |
Dadkhahi, Hamid; Marlin, Benjamin: Learning Tree-Structured Detection Cascades for Heterogeneous Networks of Embedded Devices. 2017. (Type: Proceedings) Abstract: In this paper, we present a new approach to learning cascaded classifiers for use in computing environments that involve networks of heterogeneous and resource-constrained, low-power embedded compute and sensing nodes. We present a generalization of the classical linear detection cascade to the case of tree-structured cascades where different branches of the tree execute on different physical compute nodes in the network. Different nodes have access to different features, as well as access to potentially different computation and energy resources. We concentrate on the problem of jointly learning the parameters for all of the classifiers in the cascade given a fixed cascade architecture and a known set of costs required to carry out the computation at each node. To accomplish the objective of joint learning of all detectors, we propose a novel approach to combining classifier outputs during training that better matches the hard cascade setting in which the learned system will be deployed. This work is motivated by research in the area of mobile health where energy efficient real time detectors integrating information from multiple wireless on-body sensors and a smart phone are needed for real-time monitoring and the delivery of just-in-time adaptive interventions. We evaluate our framework on mobile sensor-based human activity recognition and mobile health detector learning problems. |
Dadkhahi, Hamid; Duarte, Marco F; Marlin, Benjamin M: Out-of-Sample Extension for Dimensionality Reduction of Noisy Time Series. In: IEEE Transactions on Image Processing, vol. 26, no. 11, pp. 5435–5446, 2017. (Type: Journal Article) Abstract: This paper proposes an out-of-sample extension framework for a global manifold learning algorithm (Isomap) that uses temporal information in out-of-sample points in order to make the embedding more robust to noise and artifacts. Given a set of noise-free training data and its embedding, the proposed framework extends the embedding for a noisy time series. This is achieved by adding a spatio-temporal compactness term to the optimization objective of the embedding. To the best of our knowledge, this is the first method for out-of-sample extension of manifold embeddings that leverages timing information available for the extension set. Experimental results demonstrate that our out-of-sample extension algorithm renders a more robust and accurate embedding of sequentially ordered image data in the presence of various noise and artifacts when compared with other timing-aware embeddings. Additionally, we show that an out-of-sample extension framework based on the proposed algorithm outperforms the state of the art in eye-gaze estimation. |
2016 |
Jacek, Nicholas; Chiu, Meng-Chieh; Marlin, Benjamin; Moss, J Eliot B: Assessing the Limits of Program-Specific Garbage Collection Performance. Programming Language Design and Implementation, 2016, (Distinguished Paper Award). (Type: Conference) Abstract: We consider the ultimate limits of program-specific garbage collector performance for real programs. We first characterize the GC schedule optimization problem using Markov Decision Processes (MDPs). Based on this characterization, we develop a method of determining, for a given program run and heap size, an optimal schedule of collections for a non-generational collector. We further explore the limits of performance of a generational collector, where it is not feasible to search the space of schedules to prove optimality. Still, we show significant improvements with Least Squares Policy Iteration, a reinforcement learning technique for solving MDPs. We demonstrate that there is considerable promise to reduce garbage collection costs by developing program-specific collection policies. |
Sadasivam, Rajani Shankar; Cutrona, Sarah L; Kinney, Rebecca L; Marlin, Benjamin M; Mazor, Kathleen M; Lemon, Stephenie C; Houston, Thomas K: Collective-Intelligence Recommender Systems: Advancing Computer Tailoring for Health Behavior Change Into the 21st Century. In: Journal of Medical Internet Research, vol. 18, 2016. (Type: Journal Article) Abstract: What is the next frontier for computer-tailored health communication (CTHC) research? In current CTHC systems, study designers who have expertise in behavioral theory and mapping theory into CTHC systems select the variables and develop the rules that specify how the content should be tailored, based on their knowledge of the targeted population, the literature, and health behavior theories. In collective-intelligence recommender systems (hereafter recommender systems) used by Web 2.0 companies (eg, Netflix and Amazon), machine learning algorithms combine user profiles and continuous feedback ratings of content (from themselves and other users) to empirically tailor content. Augmenting current theory-based CTHC with empirical recommender systems could be evaluated as the next frontier for CTHC. |
Natarajan, Annamalai; Angarita, Gustavo; Gaiser, Edward; Malison, Robert; Ganesan, Deepak; Marlin, Benjamin: Domain Adaptation Methods for Improving Lab-to-field Generalization of Cocaine Detection using Wearable ECG. 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2016. (Type: Conference) Abstract: Mobile health research on illicit drug use detection typically involves a two-stage study design where data to learn detectors is first collected in lab-based trials, followed by a deployment to subjects in a free-living environment to assess detector performance. While recent work has demonstrated the feasibility of wearable sensors for illicit drug use detection in the lab setting, several key problems can limit lab-to-field generalization performance. For example, lab-based data collection often has low ecological validity, the ground-truth event labels collected in the lab may not be available at the same level of temporal granularity in the field, and there can be significant variability between subjects. In this paper, we present domain adaptation methods for assessing and mitigating potential sources of performance loss in lab-to-field generalization and apply them to the problem of cocaine use detection from wearable electrocardiogram sensor data. |
Adams, Roy; Saleheen, Nazir; Thomaz, Edison; Parate, Abhinav; Kumar, Santosh; Marlin, Benjamin: Hierarchical Span-Based Conditional Random Fields for Labeling and Segmenting Events in Wearable Sensor Data Streams. International Conference on Machine Learning, 2016. (Type: Conference) Abstract: The field of mobile health (mHealth) has the potential to yield new insights into health and behavior through the analysis of continuously recorded data from wearable health and activity sensors. In this paper, we present a hierarchical span-based conditional random field model for the key problem of jointly detecting discrete events in such sensor data streams and segmenting these events into high-level activity sessions. Our model includes higher-order cardinality factors and inter-event duration factors to capture domain-specific structure in the label space. We show that our model supports exact MAP inference in quadratic time via dynamic programming, which we leverage to perform learning in the structured support vector machine framework. We apply the model to the problems of smoking and eating detection using four real data sets. Our results show statistically significant improvements in segmentation performance at the p=0.005 level relative to a hierarchical pairwise CRF. |
Sadasivam, Rajani Shankar; Borglund, Erin M; Adams, Roy; Marlin, Benjamin M; Houston, Thomas K: Impact of a Collective Intelligence Tailored Messaging System on Smoking Cessation: The Perspect Randomized Experiment. In: Journal of Medical Internet Research, vol. 18, pp. e285:1–13, 2016. (Type: Journal Article) Abstract: Background: Outside health care, content tailoring is driven algorithmically using machine learning compared to the rule-based approach used in current implementations of computer-tailored health communication (CTHC) systems. A special class of machine learning systems (“recommender systems”) are used to select messages by combining the collective intelligence of their users (ie, the observed and inferred preferences of users as they interact with the system) and their user profiles. However, this approach has not been adequately tested for CTHC. Objective: Our aim was to compare, in a randomized experiment, a standard, evidence-based, rule-based CTHC (standard CTHC) to a novel machine learning CTHC: Patient Experience Recommender System for Persuasive Communication Tailoring (PERSPeCT). We hypothesized that PERSPeCT will select messages of higher influence than our standard CTHC system. This standard CTHC was proven effective in motivating smoking cessation in a prior randomized trial of 900 smokers (OR 1.70, 95% CI 1.03-2.81). Methods: PERSPeCT is an innovative hybrid machine learning recommender system that selects and sends motivational messages using algorithms that learn from message ratings from 846 previous participants (explicit feedback), and the prior explicit ratings of each individual participant. Current smokers (N=120) aged 18 years or older, English speaking, with Internet access were eligible to participate. These smokers were randomized to receive either PERSPeCT (intervention
Hiatt, Laura; Adams, Roy; Marlin, Benjamin: An Improved Data Representation for Smoking Detection with Wearable Respiration Sensors. IEEE Wireless Health, 2016, (Late breaking extended abstract.). (Type: Conference) |
Dadkhahi, Hamid; Saleheen, Nazir; Kumar, Santosh; Marlin, Benjamin: Learning Shallow Detection Cascades for Wearable Sensor-Based Mobile Health Applications. ICML On Device Intelligence Workshop, 2016. (Type: Conference) Abstract: The field of mobile health aims to leverage recent advances in wearable on-body sensing technology and smart phone computing capabilities to develop systems that can monitor health states and deliver just-in-time adaptive interventions. However, existing work has largely focused on analyzing collected data in the off-line setting. In this paper, we propose a novel approach to learning shallow detection cascades developed explicitly for use in real-time wearable-phone or wearable-phone-cloud systems. We apply our approach to the problem of cigarette smoking detection from a combination of wrist-worn actigraphy data and respiration chest band data using two and three stage cascades. |
Nguyen, Thai; Adams, Roy J; Natarajan, Annamalai; Marlin, Benjamin M: Parsing Wireless Electrocardiogram Signals with Context Free Grammar Conditional Random Fields. IEEE Wireless Health, 2016. (Type: Conference) Abstract: Recent advances in wearable sensor technology have made it possible to simultaneously collect multiple streams of physiological and context data from individuals as they go about their daily activities in natural environments. However, extracting reliable higher-level inferences from these raw data streams remains a key data analysis challenge. In this paper, we focus on the specific case of the analysis of data from wireless electrocardiogram (ECG) sensors. We present a new robust probabilistic approach to ECG morphology extraction using conditional random field context free grammar models, which have traditionally been applied to parsing problems in natural language processing. We introduce a robust context free grammar for parsing noisy ECG data, and show significantly improved performance on the ECG morphological labeling task. |
Nguyen, Thai; Adams, Roy J; Natarajan, Annamalai; Marlin, Benjamin M: Parsing Wireless Electrocardiogram Signals with the CRF-CFG Model. 2016. (Type: Proceedings) Abstract: Recent advances in wearable sensor technology have made it possible to simultaneously collect multiple streams of physiological and context data from individuals as they go about their daily activities in natural environments. However, extracting reliable higher-level inferences from these raw data streams remains a key data analysis challenge. In this paper, we focus on the specific case of the analysis of data from wireless electrocardiogram (ECG) sensors. We present a new robust probabilistic approach to ECG morphology extraction using conditional random field context free grammar models, which have traditionally been applied to parsing problems in natural language processing. We introduce a robust context free grammar for parsing noisy ECG data, and show significantly improved performance on the ECG morphological labeling task. |
Chiu, Meng-Chieh; Marlin, Benjamin; Moss, Eliot: Real-Time Program-Specific Phase Change Detection for Java Programs. 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools, 2016. (Type: Conference) Abstract: It is well-known that programs tend to have multiple phases in their execution. Because phases have impact on micro-architectural features such as caches and branch predictors, they are relevant to program performance and energy consumption. They are also relevant to detecting whether a program is executing as expected or is encountering unusual or exceptional conditions, a software engineering and program monitoring concern. We offer here a method for real-time phase change detection in Java programs. After applying a training protocol to a program of interest, our method can detect phase changes at run time for that program with good precision and recall (compared with a “ground truth” definition of phases) and with small performance impact (average less than 2%). We also offer improved methodology for evaluating phase change detection mechanisms. In sum, our approach offers the first known implementation of real-time phase detection for Java programs. |
Li, Steven Cheng-Xian; Marlin, Benjamin M: A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification. 2016. (Type: Proceedings) Abstract: We present a general framework for classification of sparse and irregularly-sampled time series. The properties of such time series can result in substantial uncertainty about the values of the underlying temporal processes, while making the data difficult to deal with using standard classification methods that assume fixed-dimensional feature spaces. To address these challenges, we propose an uncertainty-aware classification framework based on a special computational layer we refer to as the Gaussian process adapter that can connect irregularly sampled time series data to any black-box classifier learnable using gradient descent. We show how to scale up the required computations based on combining the structured kernel interpolation framework and the Lanczos approximation method, and how to discriminatively train the Gaussian process adapter in combination with a number of classifiers end-to-end using backpropagation. |
2015 |
Li, Steven Cheng-Xian; Marlin, Benjamin M: Classification of Sparse and Irregularly Sampled Time Series with Mixtures of Expected Gaussian Kernels and Random Features. 2015. (Type: Proceedings) Abstract: This paper presents a kernel-based framework for classification of sparse and irregularly sampled time series. The properties of such time series can result in substantial uncertainty about the values of the underlying temporal processes, while making the data difficult to deal with using standard classification methods that assume fixed-dimensional feature spaces. To address these challenges, we propose to first re-represent each time series through the Gaussian process (GP) posterior it induces under a GP regression model. We then define kernels over the space of GP posteriors and apply standard kernel-based classification. Our primary contributions are (i) the development of a kernel between GPs based on the mixture of kernels between their finite marginals, (ii) the development and analysis of extensions of random Fourier features for scaling the proposed kernel to large-scale data, and (iii) an extensive empirical analysis of both the classification performance and scalability of our proposed approach. |
Kumar, S; et al.: Center of excellence for mobile sensor Data-to-Knowledge (MD2K). In: Journal of the American Medical Informatics Association, vol. 22, pp. 1137–1142, 2015. (Type: Journal Article) Abstract: Mobile sensor data-to-knowledge (MD2K) was chosen as one of 11 Big Data Centers of Excellence by the National Institutes of Health, as part of its Big Data-to-Knowledge initiative. MD2K is developing innovative tools to streamline the collection, integration, management, visualization, analysis, and interpretation of health data generated by mobile and wearable sensors. The goal of the big data solutions being developed by MD2K is to reliably quantify physical, biological, behavioral, social, and environmental factors that contribute to health and disease risk. The research conducted by MD2K is targeted at improving health through early detection of adverse health events and by facilitating prevention. MD2K will make its tools, software, and training materials widely available and will also organize workshops and seminars to encourage their use by researchers and clinicians. |
Iyengar, Srinivasan; Kalra, Sandeep; Ghosh, Anushree; Irwin, David; Shenoy, Prashant; Marlin, Benjamin: iProgram: Inferring Smart Schedules for Dumb Thermostats. Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, BuildSys '15, ACM, New York, NY, USA, 2015, ISBN: 978-1-4503-3981-0. (Type: Conference | Abstract | Links | BibTeX)@conference{Iyengar:2015:IIS:2821650.2821653, <p>Heating, ventilation, and air conditioning (HVAC) accounts for over 50% of a typical home's energy usage. A thermostat generally controls HVAC usage in a home to ensure user comfort. In this paper, we focus on making existing "dumb" programmable thermostats smart by applying energy analytics on smart meter data to infer home occupancy patterns and compute an optimized thermostat schedule. Utilities with smart meter deployments are capable of immediately applying our approach, called iProgram, to homes across their customer base. iProgram addresses new challenges in inferring home occupancy from smart meter data where i) training data is not available and ii) the thermostat schedule may be misaligned with occupancy, frequently resulting in high power usage during unoccupied periods. iProgram translates occupancy patterns inferred from opaque smart meter data into a custom schedule for existing types of programmable thermostats, e.g., 1-day, 7-day, etc. We implement iProgram as a web service and show that it reduces the mismatch time between the occupancy pattern and the thermostat schedule by a median value of 44.28 minutes (out of 100 homes) when compared to a default 8am-6pm weekday schedule, with a median deviation of 30.76 minutes off the optimal schedule. Further, iProgram yields a daily energy saving of 0.42kWh on average across the 100 homes. Utilities may use iProgram to recommend thermostat schedules to customers and provide them estimates of potential energy savings in their energy bills.</p> |
Li, Steven Cheng-Xian; Marlin, Benjamin M: Collaborative Multi-Output Gaussian Processes for Collections of Sparse Multivariate Time Series. NIPS Time Series Workshop, 2015. (Type: Conference | Abstract | Links | BibTeX)@conference{238, <p>Collaborative Multi-Output Gaussian Processes (COGPs) are a flexible tool for modeling multivariate time series. They induce correlation across outputs through the use of shared latent processes. While past work has focused on the computational challenges that result from a single multivariate time series with many observed values, this paper explores the problem of fitting the COGP model to collections of many sparse and irregularly sampled multivariate time series. This work is motivated by applications to modeling physiological data (heart rate, blood pressure, etc.) in Electronic Health Records (EHRs).</p> |
Adams, Roy J; Thomaz, Edison; Marlin, Benjamin M: Hierarchical Nested CRFs for Segmentation and Labeling of Physiological Time Series. NIPS Workshop on Machine Learning in Healthcare, 2015. (Type: Conference | Abstract | Links | BibTeX)@conference{239, <p>In this paper, we address the problem of nested hierarchical segmentation and labeling of time series data. We present a hierarchical span-based conditional random field framework for this problem that leverages higher-order factors to enforce the nesting constraints. The framework can incorporate a variety of additional factors including higher order cardinality factors. This research is motivated by hierarchical activity recognition problems in the field of mobile Health (mHealth). We show that the specific model of interest in the mHealth setting supports exact MAP inference in quadratic time. Learning is accomplished in the structured support vector machine framework. We show positive results on real and synthetic data sets.</p> |
Iyengar, Srinivasan; Kalra, Sandeep; Ghosh, Anushree; Irwin, David; Shenoy, Prashant; Marlin, Benjamin: iProgram: Inferring Smart Schedules for Dumb Thermostats. 10th Annual Women in Machine Learning Workshop, 2015. (Type: Conference | Abstract | BibTeX)@conference{237, <p>Heating, ventilation, and air conditioning (HVAC) accounts for over 50% of a typical home's energy usage. A thermostat generally controls HVAC usage in a home to ensure user comfort. In this paper, we focus on making existing "dumb" programmable thermostats smart by applying energy analytics on smart meter data to infer home occupancy patterns and compute an optimized thermostat schedule. Utilities with smart meter deployments are capable of immediately applying our approach, called iProgram, to homes across their customer base. iProgram addresses new challenges in inferring home occupancy from smart meter data where i) training data is not available and ii) the thermostat schedule may be misaligned with occupancy, frequently resulting in high power usage during unoccupied periods. iProgram translates occupancy patterns inferred from opaque smart meter data into a custom schedule for existing types of programmable thermostats, e.g., 1-day, 7-day, etc. We implement iProgram as a web service and show that it reduces the mismatch time between the occupancy pattern and the thermostat schedule by a median value of 44.28 minutes (out of 100 homes) when compared to a default 8am-6pm weekday schedule, with a median deviation of 30.76 minutes off the optimal schedule. Further, iProgram yields a daily energy saving of 0.42kWh on average across the 100 homes. Utilities may use iProgram to recommend thermostat schedules to customers and provide them estimates of potential energy savings in their energy bills.</p> |
Huang, Haibin; Kalogerakis, Evangelos; Marlin, Benjamin: Analysis and synthesis of 3D shape families via deep-learned generative models of surfaces. Symposium on Geometry Processing, 2015. (Type: Conference | Abstract | Links | BibTeX)@conference{229, <p>We present a method for joint analysis and synthesis of geometrically diverse 3D shape families. Our method first learns part-based templates such that an optimal set of fuzzy point and part correspondences is computed between the shapes of an input collection based on a probabilistic deformation model. In contrast to previous template-based approaches, the geometry and deformation parameters of our part-based templates are learned from scratch. Based on the estimated shape correspondence, our method also learns a probabilistic generative model that hierarchically captures statistical relationships of corresponding surface point positions and parts as well as their existence in the input shapes. A deep learning procedure is used to capture these hierarchical relationships. The resulting generative model is used to produce control point arrangements that drive shape synthesis by combining and deforming parts from the input collection. The generative model also yields compact shape descriptors that are used to perform fine-grained classification. Finally, it can also be coupled with the probabilistic deformation model to further improve shape correspondence. We provide qualitative and quantitative evaluations of our method for shape correspondence, segmentation, fine-grained classification and synthesis. Our experiments demonstrate correspondence and segmentation results superior to those of previous state-of-the-art approaches.</p> |
Mayberry, Addison; Tun, Yamin; Hu, Pan; Smith-Freedman, Duncan; Ganesan, Deepak; Marlin, Benjamin; Salthouse, Christopher: CIDER: Enabling Robustness-Power Tradeoffs on a Computational Eyeglass. 2015. (Type: Proceedings | Abstract | Links | BibTeX)@proceedings{230, <p>The human eye offers a fascinating window into an individual's health, cognitive attention, and decision making, but we lack the ability to continually measure these parameters in the natural environment. The challenges lie in: a) handling the complexity of continuous high-rate sensing from a camera and processing the image stream to estimate eye parameters, and b) dealing with the wide variability in illumination conditions in the natural environment. This paper explores the power-robustness tradeoffs inherent in the design of a wearable eye tracker, and proposes a novel staged architecture that enables graceful adaptation across the spectrum of real-world illumination. We propose CIDER, a system that operates in a highly optimized low-power mode under indoor settings by using a fast Search-Refine controller to track the eye, but detects when the environment switches to more challenging outdoor sunlight and switches models to operate robustly under this condition. Our design is holistic and tackles a) power consumption in digitizing pixels, estimating pupillary parameters, and illuminating the eye via near-infrared, b) error in estimating pupil center and pupil dilation, and c) model training procedures that involve zero effort from a user. We demonstrate that the system can estimate pupil center with error less than two pixels, and pupil diameter with error of one pixel (0.22mm). Our end-to-end results show that we can operate at power levels of roughly 7mW at a 4Hz eye tracking rate, or roughly 32mW at rates upwards of 250Hz.</p> |
Saleheen, Nazir; Ali, Amin; Hossain, Syed Monowar; Sarker, Hillol; Chatterjee, Soujanya; Marlin, Benjamin; Ertin, Emre; al'Absi, Mustafa; Kumar, Santosh: puffMarker: A Multi-Sensor Approach for Pinpointing the Timing of First Lapse in Smoking Cessation. 2015. (Type: Proceedings | Abstract | Links | BibTeX)@proceedings{231, <p>Smoking is the leading cause of preventable deaths. Mobile technologies can help to deliver just-in-time interventions to abstinent smokers and assist them in resisting urges to lapse. Doing so, however, requires identification of high-risk situations that may lead an abstinent smoker to relapse. In this paper, we propose an explainable model for detecting smoking lapses in newly abstinent smokers using respiration and 6-axis inertial sensors worn on wrists. We propose a novel method that identifies windows of data that represent the hand at the mouth. We then develop a model to classify each window as puff or non-puff. On the training data, the model achieves a recall rate of 98% at a false positive rate of 1.5%. When the model is applied to the data collected from 13 abstainers, the false positive rate is 0.3/hour. Among 15 lapsers, the model is able to pinpoint the timing of first lapse in 13 participants.</p> |
2014 |
Natarajan, Annamalai; Gaiser, Edward; Angarita, Gustavo; Malison, Robert; Ganesan, Deepak; Marlin, Benjamin: Conditional Random Fields for Morphological Analysis of Wireless ECG Signals. Proceedings of the 5th Annual Conference on Bioinformatics, Computational Biology and Health Informatics, ACM, Newport Beach, CA, 2014. (Type: Conference | Abstract | Links | BibTeX)@conference{Natarajan-BCB2014, <p>Thanks to advances in mobile sensing technologies, it has recently become practical to deploy wireless electrocardiograph sensors for continuous recording of ECG signals. This capability has diverse applications in the study of human health and behavior, but to realize its full potential, new computational tools are required to effectively deal with the uncertainty that results from the noisy and highly non-stationary signals collected using these devices. In this work, we present a novel approach to the problem of extracting the morphological structure of ECG signals based on the use of dynamically structured conditional random field (CRF) models. We apply this framework to the problem of extracting morphological structure from wireless ECG sensor data collected in a lab-based study of habituated cocaine users. Our results show that the proposed CRF-based approach significantly outperforms independent prediction models using the same features, as well as a widely cited open source toolkit.</p> |
Mayberry, Addison; Hu, Pan; Marlin, Benjamin; Salthouse, Christopher; Ganesan, Deepak: iShadow: Design of a Wearable, Real-Time Mobile Gaze Tracker. 2014. (Type: Proceedings | Abstract | Links | BibTeX)@proceedings{133, <p>Continuous, real-time tracking of eye gaze is valuable in a variety of scenarios including hands-free interaction with the physical world, detection of unsafe behaviors, leveraging visual context for advertising, life logging, and others. While eye tracking is commonly used in clinical trials and user studies, it has not bridged the gap to everyday consumer use. The challenge is that a real-time eye tracker is a power-hungry and computation-intensive device which requires continuous sensing of the eye using an imager running at many tens of frames per second, and continuous processing of the image stream using sophisticated gaze estimation algorithms. Our key contribution is the design of an eye tracker that dramatically reduces the sensing and computation needs for eye tracking, thereby achieving orders of magnitude reductions in power consumption and form-factor. The key idea is that eye images are extremely redundant, so we can estimate gaze by using a small subset of carefully chosen pixels per frame. We instantiate this idea in a prototype hardware platform equipped with a low-power image sensor that provides random access to pixel values, a low-power ARM Cortex M3 microcontroller, and a Bluetooth radio to communicate with a mobile phone. The sparse pixel-based gaze estimation algorithm is a multi-layer neural network learned using a state-of-the-art sparsity-inducing regularization function that minimizes the gaze prediction error while simultaneously minimizing the number of pixels used. Our results show that we can operate at roughly 70mW of power, while continuously estimating eye gaze at the rate of 30 Hz with errors of roughly 3 degrees.</p> |
Adams, Roy J; Sadasivam, Rajani S; Balakrishnan, Kavitha; Kinney, Rebecca L; Houston, Thomas K; Marlin, Benjamin M: PERSPeCT: Collaborative Filtering for Tailored Health Communications. Proceedings of the 8th ACM Conference on Recommender Systems, RecSys '14, ACM, New York, NY, USA, 2014, ISBN: 978-1-4503-2668-1. (Type: Conference | Abstract | Links | BibTeX)@conference{Adams-RecSys2014, <p>The goal of computer tailored health communications (CTHC) is to elicit healthy behavior changes by sending motivational messages personalized to individual patients. One prominent weakness of many existing CTHC systems is that they are based on expert-written rules and thus have no ability to learn from their users over time. One solution to this problem is to develop CTHC systems based on the principles of collaborative filtering, but this approach has not been widely studied. In this paper, we present a case study evaluating nine rating prediction methods for use in the Patient Experience Recommender System for Persuasive Communication Tailoring, a system developed for use in a clinical trial of CTHC-based smoking cessation support interventions.</p> |
Kae, Andrew; Learned-Miller, Erik; Marlin, Benjamin M: The Shape-Time Random Field for Semantic Video Labeling. 2014. (Type: Proceedings | Abstract | Links | BibTeX)@proceedings{134, <p>We propose a novel discriminative model for semantic labeling in videos by incorporating a prior to model both the shape and temporal dependencies of an object in video. A typical approach for this task is the conditional random field (CRF), which can model local interactions among adjacent regions in a video frame. Recent work [16, 14] has shown how to incorporate a shape prior into a CRF for improving labeling performance, but it may be difficult to model temporal dependencies present in video by using this prior. The conditional restricted Boltzmann machine (CRBM) can model both shape and temporal dependencies, and has been used to learn walking styles from motion-capture data. In this work, we incorporate a CRBM prior into a CRF framework and present a new state-of-the-art model for the task of semantic labeling in videos. In particular, we explore the task of labeling parts of complex face scenes from videos in the YouTube Faces Database (YFDB). Our combined model outperforms competitive baselines both qualitatively and quantitatively.</p> |
2013 |
Natarajan, Annamalai; Parate, Abhinav; Gaiser, Edward; Angarita, Gustavo; Malison, Robert; Marlin, Benjamin M; Ganesan, Deepak: Detecting cocaine use with wearable electrocardiogram sensors. UbiComp, 2013. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/huc/NatarajanPGAMMG13, <p>Ubiquitous physiological sensing has the potential to profoundly improve our understanding of human behavior, leading to more targeted treatments for a variety of disorders. The long-term goal of this work is the development of novel computational tools to support the study of addiction in the context of cocaine use. The current paper takes the first step in this important direction by posing a simple, but crucial question: Can cocaine use be reliably detected using wearable electrocardiogram (ECG) sensors? The main contributions in this paper include the presentation of a novel clinical study of cocaine use, the development of a computational pipeline for inferring morphological features from noisy ECG waveforms, and the evaluation of feature sets for cocaine use detection. Our results show that 32mg/70kg doses of cocaine can be detected with an area under the receiver operating characteristic curve above 0.9, both within and between subjects.</p> |
Parate, Abhinav; Chiu, Meng-Chieh; Ganesan, Deepak; Marlin, Benjamin M: Leveraging graphical models to improve accuracy and reduce privacy risks of mobile sensing. MobiSys, 2013. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/mobisys/ParateCGM13, <p>The proliferation of sensors on mobile phones and wearables has led to a plethora of context classifiers designed to sense the individual's context. We argue that a key missing piece in mobile inference is a layer that fuses the outputs of several classifiers to learn deeper insights into an individual's habitual patterns and associated correlations between contexts, thereby enabling new systems optimizations and opportunities. In this paper, we design CQue, a dynamic Bayesian network that operates over classifiers for individual contexts, observes relations among these outputs over time, and identifies opportunities for improving energy-efficiency and accuracy by taking advantage of these relations. In addition, such a layer provides insights into privacy leakage that might occur when seemingly innocuous user context revealed to different applications on a phone may be combined to reveal more information than originally intended. In terms of system architecture, our key contribution is a clean separation between the detection layer and the fusion layer, enabling classifiers to solely focus on detecting the context, and leverage temporal smoothing and fusion mechanisms to further boost performance by just connecting to our higher-level inference engine. To applications and users, CQue provides a query interface, allowing a) applications to obtain more accurate context results while remaining agnostic of what classifiers/sensors are used and when, and b) users to specify what contexts they wish to keep private, and only allow information that has low leakage with the private context to be revealed.
We implemented CQue in Android, and our results show that CQue can i) improve activity classification accuracy up to 42%, ii) reduce energy consumption in classifying social, location and activity contexts with high accuracy (>90%) by reducing the number of required classifiers by at least 33%, and iii) effectively detect and suppress contexts that reveal private information.</p> |
Parate, Abhinav; Böhmer, Matthias; Chu, David; Ganesan, Deepak; Marlin, Benjamin M: Practical prediction and prefetch for faster access to applications on mobile phones. UbiComp, 2013. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/huc/ParateBCGM13, <p>Mobile phones have evolved from communication devices to indispensable accessories with access to real-time content. The increasing reliance on dynamic content comes at the cost of increased latency to pull the content from the Internet before the user can start using it. While prior work has explored parts of this problem, it ignores the bandwidth costs of prefetching, incurs significant training overhead, needs several sensors to be turned on, and does not consider practical systems issues that arise from the limited background processing capability supported by mobile operating systems. In this paper, we make app prefetch practical on mobile phones. Our contributions are two-fold. First, we design an app prediction algorithm, APPM, that requires no prior training, adapts to usage dynamics, predicts not only which app will be used next but also when it will be used, and provides high accuracy without requiring additional sensor context. Second, we perform parallel prefetch on screen unlock, a mechanism that leverages the benefits of prediction while operating within the constraints of mobile operating systems. Our experiments are conducted on long-term traces, live deployments on the Android Play Market, and user studies, and show that we outperform prior approaches to predicting app usage, while also providing practical ways to prefetch application content on mobile phones.</p> |
Riedel, Sebastian; Yao, Limin; McCallum, Andrew; Marlin, Benjamin M: Relation Extraction with Matrix Factorization and Universal Schemas. HLT-NAACL, 2013. (Type: Conference | Abstract | Links | BibTeX)@conference{DBLP:conf/naacl/RiedelYMM13, <p>Traditional relation extraction predicts relations within some fixed and finite target schema. Machine learning approaches to this task require either manual annotation or, in the case of distant supervision, existing structured sources of the same schema. The need for existing datasets can be avoided by using a universal schema: the union of all involved schemas (surface form predicates as in OpenIE, and relations in the schemas of pre-existing databases). This schema has an almost unlimited set of relations (due to surface forms), and supports integration with existing structured data (through the relation types of existing databases). To populate a database of such schema we present matrix factorization models that learn latent feature vectors for entity tuples and relations. We show that such latent models achieve substantially higher accuracy than a traditional classification approach. More importantly, by operating simultaneously on relations observed in text and in pre-existing structured DBs such as Freebase, we are able to reason about unstructured and structured data in mutually-supporting ways. By doing so our approach outperforms state-of-the-art distant supervision.</p> |