Safe and Fair Machine Learning

In this project we study how the user of a machine learning (ML) algorithm can place constraints on the algorithm’s behavior. We contend that standard ML algorithms are not user-friendly: applying them responsibly to real-world problems can require substantial ML and data science expertise. We present a new type of ML algorithm that shifts many of the challenges of ensuring safe use from the user of the algorithm to the researcher who designs it. The resulting algorithms provide a simple interface for specifying what constitutes undesirable behavior, and they provide high-probability guarantees that they will not produce that behavior.
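
To make the interface idea concrete, here is a minimal, hypothetical sketch (not the project’s actual API; the names `train_with_constraint`, `g`, and `delta` are illustrative). The user supplies a function measuring undesirable behavior and a confidence level, and training either returns a model certified to satisfy the constraint with high probability or refuses to return a solution.

```python
import numpy as np

# Hypothetical sketch of a constraint-aware training interface.
# All names here are illustrative assumptions, not the project's actual code.

def hoeffding_upper_bound(samples, delta, value_range=1.0):
    """(1 - delta)-confidence upper bound on the mean of bounded samples."""
    n = len(samples)
    return np.mean(samples) + value_range * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

def train_with_constraint(data, candidate_models, g, delta):
    """Return a model whose constraint value E[g(model, data)] is certified
    to be <= 0 with probability at least 1 - delta, or None if no model passes."""
    # Split the data: one part for picking a candidate, one reserved for the safety test.
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(data))
    candidate_data = data[idx[: len(data) // 2]]
    safety_data = data[idx[len(data) // 2 :]]

    # Candidate selection: pick the model that looks safest on the candidate split.
    best = min(candidate_models,
               key=lambda m: hoeffding_upper_bound(g(m, candidate_data), delta))

    # Safety test: certify the constraint on held-out data only.
    if hoeffding_upper_bound(g(best, safety_data), delta) <= 0.0:
        return best
    return None  # "No Solution Found": refuse rather than risk undesirable behavior
```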

Data Diversity

The big data revolution and advances in machine learning have transformed decision making, advertising, medicine, and even election campaigns. Yet data is an imperfect medium, often tainted by skews and biases, and learning systems and analysis software pick up and amplify these biases. As a result, discrimination appears in many data-driven applications, such as advertisements, hotel bookings, image search, and vendor services. Since data skew is often a cause of algorithmic bias, the ability to retrieve balanced, diverse datasets can mitigate the underlying problem. Diversification also has usability benefits, as it allows us to produce representative samples of a dataset that are small enough for human consumption. Our research focuses on developing efficient, scalable methods for producing appropriately diverse subsets of given datasets, aiming both to alleviate biases in the underlying data and to support user-facing data exploration systems.
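
As an illustration of the kind of computation involved (a generic max-min diversification heuristic, not the specific method developed in this project), the sketch below greedily selects a small subset whose points are maximally spread out, yielding a compact, diverse sample of a larger dataset.

```python
import numpy as np

# Illustrative greedy max-min (farthest-point) diversification.
# A common heuristic, shown here only to make "diverse subset" concrete.

def diverse_subset(points, k, seed=0):
    """Greedily pick k rows of `points` so that each new pick maximizes
    its distance to the already-selected set (max-min diversity)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(points)))]      # arbitrary starting point
    # Distance from every point to its closest selected point so far.
    dists = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))                  # farthest from current selection
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return points[selected]

# Example: reduce 10,000 2-D points to a 20-point representative sample.
sample = diverse_subset(np.random.rand(10_000, 2), k=20)
```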