About the Lab
The MLDS lab focused on the development of machine learning models and algorithms for addressing a variety of challenging problems in the areas of computational social science, computational ecology, computational behavioral science and computational medicine.
Xue, S; Fern, A; Sheldon, D
Scheduling Conservation Designs via Network Cascade Optimization Conference
Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
<p>We introduce the problem of scheduling land purchases to conserve an endangered species in a way that achieves maximum population spread but delays purchases as long as possible, so that conservation planners retain maximum flexibility and use available budgets in the most efficient way. We develop the problem formally as a stochastic optimization problem over a network cascade model describing the population spread, and present a solution approach that reduces the stochastic problem to a novel variant of a Steiner tree problem. We give a primal-dual algorithm for the problem that computes both a feasible solution and a bound on the quality of an optimal solution. Our experiments, using actual conservation data and a standard diffusion model, show that the approach produces near optimal results and is much more scalable than more generic off-the-shelf optimizers.</p>
Krafft, Peter; Moore, Juston; Wallach, Hanna; Desmarais, Bruce
Topic-Partitioned Multinetwork Embeddings Conference
Advances in Neural Information Processing Systems Twenty-Five, Lake Tahoe, NV, 2012.
<p>We introduce a joint model of network content and context designed for exploratory analysis of email networks via visualization of topic-specific communication patterns. Our model is an admixture model for text and network attributes that uses multinomial distributions over words as admixture components for explaining email text and latent Euclidean positions of actors as admixture components for explaining email recipients. This model allows us to infer topics of communication, a partition of the overall network into topic-specific subnetworks, and two-dimensional visualizations of those subnetworks. We validate the appropriateness of our model by achieving state-of-the-art performance on a prediction task and semantic coherence comparable to that of latent Dirichlet allocation. We demonstrate the capability of our model for descriptive, explanatory, and exploratory analysis by investigating the inferred topic-specific communication patterns of a new email data set, the New Hanover County email corpus.</p>
Marlin, Benjamin M; Kale, David C; Khemani, Robinder G; Wetzel, Randall C
Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, IHI '12, ACM, New York, NY, USA, 2012, ISBN: 978-1-4503-0781-9.
<p>Bedside clinicians routinely identify temporal patterns in physiologic data in the process of choosing and administering treatments intended to alter the course of critical illness for individual patients. Our primary interest is the study of unsupervised learning techniques for automatically uncovering such patterns from the physiologic time series data contained in electronic health care records. This data is sparse, high-dimensional and often both uncertain and incomplete. In this paper, we develop and study a probabilistic clustering model designed to mitigate the effects of temporal sparsity inherent in electronic health care records data. We evaluate the model qualitatively by visualizing the learned cluster parameters and quantitatively in terms of its ability to predict mortality outcomes associated with patient episodes. Our results indicate that the model can discover distinct, recognizable physiologic patterns with prognostic significance.</p>