Publications Search – Knowledge Discovery Lab

Ozgur Simsek, David Jensen

Navigating networks by using homophily and degree Journal Article

In: Proceedings of the National Academy of Sciences, vol. 105, no. 35, pp. 12758–12762, 2008.

Tags: Navigation and Routing in Networks

Amy McGovern, David Jensen

Optimistic pruning for multiple instance learning Journal Article

In: Pattern Recognition Letters, vol. 29, no. 9, pp. 1252–1260, 2008.

Michael Hay, Gerome Miklau, David Jensen, Don Towsley, Philipp Weis

Resisting structural re-identification in anonymized social networks Journal Article

In: Proceedings of the VLDB Endowment, vol. 1, no. 1, pp. 102–114, 2008.

Tags: Privacy and Networks

Andrew Fast, David Jensen

Why stacked models perform effective collective classification Proceedings Article

In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 15-19, 2008, Pisa, Italy, pp. 785–790, IEEE Computer Society, 2008.

Tags: Statistical Relational Learning

Michael Hay, Gerome Miklau, David Jensen, Philipp Weis, Siddharth Srivastava

Anonymizing social networks Journal Article

In: Computer science department faculty publication series, pp. 180, 2007.

David Jensen

Beyond Prediction: Directions for Probabilistic and Relational Learning Proceedings Article

In: Inductive Logic Programming, 17th International Conference, ILP 2007, Corvallis, OR, USA, June 19-21, 2007, Revised Selected Papers, pp. 4–21, Springer, 2007.

Jennifer Neville, David Jensen

Bias/Variance Analysis for Relational Domains Proceedings Article

In: Inductive Logic Programming, 17th International Conference, ILP 2007, Corvallis, OR, USA, June 19-21, 2007, Revised Selected Papers, pp. 27–28, Springer, 2007.

Matthew Rattigan, Marc Maier, David Jensen, Bin Wu, Xin Pei, Jianbin Tan, Yi Wang

Exploiting Network Structure for Active Inference in Collective Classification Proceedings Article

In: Workshops Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), October 28-31, 2007, Omaha, Nebraska, USA, pp. 429–434, IEEE Computer Society, 2007.

Lisa Friedland, David Jensen

Finding tribes: identifying close-knit individuals from employment patterns Proceedings Article

In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007, pp. 290–299, ACM, 2007.

Matthew Rattigan, Marc Maier, David Jensen

Graph clustering with network structure indices Proceedings Article

In: Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20-24, 2007, pp. 783–790, ACM, 2007.

Trevor Strohman, W. Bruce Croft, David Jensen

Recommending citations for academic papers Proceedings Article

In: SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, July 23-27, 2007, pp. 705–706, ACM, 2007.

Andrew Fast, Lisa Friedland, Marc Maier, Brian Taylor, David Jensen, Henry G. Goldberg, John Komoroske

Relational data pre-processing techniques for improved securities fraud detection Proceedings Article

In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007, pp. 941–949, ACM, 2007.

@inproceedings{DBLP:conf/kdd/FastFMTJGK07,
  title = {Relational data pre-processing techniques for improved securities fraud detection},
  author = {Andrew Fast and Lisa Friedland and Marc Maier and Brian Taylor and David Jensen and Henry G. Goldberg and John Komoroske},
  url = {https://doi.org/10.1145/1281192.1281293},
  doi = {10.1145/1281192.1281293},
  year = {2007},
  date = {2007-01-01},
  booktitle = {Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007},
  pages = {941--949},
  publisher = {ACM},
  abstract = {Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data pre-processing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Jennifer Neville, David Jensen

Relational Dependency Networks Journal Article

In: J. Mach. Learn. Res., vol. 8, pp. 653–692, 2007.

Tags: Statistical Relational Learning

Michael Hay, Andrew Fast, David Jensen

Understanding the effects of search constraints on structure learning Journal Article

In: University of Massachusetts Amherst, Computer Science Department, Technical Report 07-21, 2007.

Aaron M Ellison, Leon J Osterweil, Lori Clarke, Julian L Hadley, Alexander Wise, Emery Boose, David R Foster, Allen Hanson, David Jensen, Paul Kuzeja, et al.

Analytic webs support the synthesis of ecological data sets Journal Article

In: Ecology, vol. 87, no. 6, pp. 1345–1358, 2006.

@article{ellison2006analytic,
  title = {Analytic webs support the synthesis of ecological data sets},
  author = {Aaron M Ellison and Leon J Osterweil and Lori Clarke and Julian L Hadley and Alexander Wise and Emery Boose and David R Foster and Allen Hanson and David Jensen and Paul Kuzeja and others},
  url = {https://esajournals.onlinelibrary.wiley.com/doi/pdfdirect/10.1890/0012-9658%282006%2987%5B1345%3AAWSTSO%5D2.0.CO%3B2},
  year = {2006},
  date = {2006-01-01},
  journal = {Ecology},
  volume = {87},
  number = {6},
  pages = {1345--1358},
  publisher = {Wiley Online Library},
  abstract = {A wide variety of data sets produced by individual investigators are now synthesized to address ecological questions that span a range of spatial and temporal scales. It is important to facilitate such syntheses so that "consumers" of data sets can be confident that both input data sets and synthetic products are reliable. Necessary documentation to ensure the reliability and validation of data sets includes both familiar descriptive metadata and formal documentation of the scientific processes used (i.e., process metadata) to produce usable data sets from collections of raw data. Such documentation is complex and difficult to construct, so it is important to help "producers" create reliable data sets and to facilitate their creation of required metadata. We describe a formal representation, an "analytic web," that aids both producers and consumers of data sets by providing complete and precise definitions of scientific processes used to process raw and derived data sets. The formalisms used to define analytic webs are adaptations of those used in software engineering, and they provide a novel and effective support system for both the synthesis and the validation of ecological data sets. We illustrate the utility of an analytic web as an aid to producing synthetic data sets through a worked example: the synthesis of long-term measurements of whole-ecosystem carbon exchange. Analytic webs are also useful validation aids for consumers because they support the concurrent construction of a complete, Internet-accessible audit trail of the analytic processes used in the synthesis of the data sets. Finally we describe our early efforts to evaluate these ideas through the use of a prototype software tool, SciWalker. We indicate how this tool has been used to create analytic webs tailored to specific data-set synthesis and validation activities, and suggest extensions to it that will support additional forms of validation. The process metadata created by SciWalker is readily adapted for inclusion in Ecological Metadata Language (EML) files.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}

Jennifer Neville, David Jensen

Bias/variance analysis for network data Proceedings Article

In: Proceedings of the Workshop on Statistical Relational Learning, 23rd International Conference on Machine Learning, 2006.

Hendrik Blockeel, David Jensen, Stefan Kramer

Introduction to the special issue on multi-relational data mining and statistical relational learning Journal Article

In: Mach. Learn., vol. 62, no. 1-2, pp. 3–5, 2006.

John Burgess, Brian Gallagher, David Jensen, Brian Neil Levine

MaxProp: Routing for Vehicle-Based Disruption-Tolerant Networks Proceedings Article

In: INFOCOM 2006. 25th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 23-29 April 2006, Barcelona, Catalunya, Spain, IEEE, 2006.

Tags: Navigation and Routing in Networks

Chirag Shah, W. Bruce Croft, David Jensen

Representing documents with named entities for story link detection (SLD) Proceedings Article

In: Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management, Arlington, Virginia, USA, November 6-11, 2006, pp. 868–869, ACM, 2006.

Andrew Fast, David Jensen

The NFL Coaching Network: Analysis of the Social Network among Professional Football Coaches Proceedings Article

In: Capturing and Using Patterns for Evidence Detection, Papers from the 2006 AAAI Fall Symposium, Washington, DC, USA, October 13-15, 2006, pp. 112–119, AAAI Press, 2006.
