Proof Synthesis

Formally verifying system properties is one of the most effective ways to improve system quality, but the manual effort it requires often renders it prohibitively expensive. Tools that automate formal verification by learning from proof corpora to suggest proofs have only begun to show their promise. These tools are effective because of the richness of the data that proof corpora contain. This richness comes from the stylistic conventions followed by communities of proof developers, together with the powerful logical systems underlying proof assistants. However, this richness remains underexploited, with most work thus far focusing on model architecture rather than on how to make the most of the proof data.

Passport is a project studying new ways of encoding identifiers for neural models of code in the context of learning-guided foundational verification. It proposes three novel encoding mechanisms for identifiers: a method of indexing common identifiers while storing the category of every identifier (local variable, definition, or constructor), an encoding of the subwords that make up an identifier's name, and an encoding of the file and module locations where identifiers are defined. Passport is free and open-source software, and you can find the pre-print paper draft on arXiv.
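To make the three mechanisms concrete, here is a minimal sketch of what such encodings might look like. All names, vocabularies, and splitting rules below are illustrative assumptions, not Passport's actual implementation or API:

```python
# Illustrative sketch of Passport-style identifier encodings.
# The vocabulary, categories, and helper names are hypothetical.

import re

# 1) Category-aware vocabulary: index common identifiers, keeping each
#    identifier's category (local variable, definition, or constructor).
COMMON_VOCAB = {("definition", "length"): 0, ("constructor", "cons"): 1}
UNKNOWN = -1

def encode_identifier(name, category):
    """Map (category, name) to a vocabulary index; rare names fall back to UNKNOWN."""
    return COMMON_VOCAB.get((category, name), UNKNOWN)

# 2) Subword encoding: split an identifier into the pieces of its name.
def subwords(name):
    # Split on underscores and camelCase boundaries.
    parts = re.split(r"_|(?<=[a-z])(?=[A-Z])", name)
    return [p.lower() for p in parts if p]

# 3) Path encoding: record the file/module location where an identifier
#    is defined, separated from its base name.
def path_components(qualified_name):
    *path, base = qualified_name.split(".")
    return path, base

print(encode_identifier("length", "definition"))  # 0
print(subwords("rev_append"))                     # ['rev', 'append']
print(path_components("Coq.Lists.List.rev"))      # (['Coq', 'Lists', 'List'], 'rev')
```

The common thread is that each mechanism exposes information a plain token index would discard: the identifier's syntactic role, the meaning carried by its name's parts, and where it lives in the development.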

Program Repair

Bugs in software are incredibly costly, but so is software debugging. More bugs are reported daily than developers can handle. Automatic program repair is an exciting approach, but can we trust automatically generated patches? Automated program repair research has produced dozens of repair techniques that use a buggy program and a partial specification of that program, e.g., a set of tests describing the desired behavior. Because the specification is partial, patches that satisfy the written specification may not satisfy the intended, unwritten, full specification. The goal of this project is to develop benchmarks and tools to evaluate repair quality, and to improve it. We have developed an objective measure of repair quality that uses an independent specification, i.e., tests that are not part of the repair process (read), and, using our new benchmark for repair quality (read), found that existing techniques often produce low-quality repairs that break existing (undertested) functionality. Next, we developed SearchRepair, a technique that repairs defects by replacing likely buggy code with publicly available, human-written code. SearchRepair works at a higher level of granularity than prior repair techniques, and it produces repairs of much higher quality (though scaling SearchRepair to the size of programs prior techniques can repair is an ongoing effort). For example, on small C programs, SearchRepair generated repairs that pass 97.2% of independent tests, whereas prior techniques' repairs passed no more than 72.1% (read).
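The quality measure above can be sketched in a few lines: score a patch by the fraction of independent, held-out tests it passes, where those tests were never shown to the repair tool. The function and toy programs below are hypothetical illustrations, not our benchmark's actual code:

```python
# Sketch of measuring repair quality with an independent test suite,
# i.e., tests held out from the repair process. Names are illustrative.

def repair_quality(patched_program, independent_tests):
    """Fraction of held-out tests the patched program passes."""
    passed = sum(1 for test in independent_tests if test(patched_program))
    return passed / len(independent_tests)

# A toy "program" and its patch: return the max of two numbers.
buggy = lambda a, b: a                      # always returns the first argument
patched = lambda a, b: a if a >= b else b   # correct behavior

# Independent tests, not part of the repair process.
tests = [
    lambda p: p(3, 1) == 3,
    lambda p: p(1, 3) == 3,
    lambda p: p(2, 2) == 2,
]

print(repair_quality(buggy, tests))    # 2/3 of held-out tests pass
print(repair_quality(patched, tests))  # 1.0
```

The key design choice is independence: a patch that merely overfits the tests used during repair can score perfectly on those tests while still breaking unwritten, intended behavior, which only a held-out suite can reveal.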

This project is funded by the NSF. The tools and benchmarks are all open-source and publicly available. Collaborators include or have included Ted Smith, Manish Motwani, Afsoon Afzal, Mauricio Soto, Yalin Ke, Neal Holts, Sasha Medvidovic, Claire Le Goues, Kathryn T. Stolee, Prem Devanbu, Stephanie Forrest, and Westley Weimer.

Fairness in Machine Learning

Software today increasingly uses machine learning and artificial intelligence components. Providing guarantees on these components, such as that they do not discriminate, that they do not harm humans, or that they outperform other components, is difficult. In a 2019 Science paper, we present a framework for creating Seldonian machine learning algorithms that provide exactly such high-probability guarantees. Importantly, Seldonian algorithms allow their users, domain experts in their application domains, to specify the requirements that need to be guaranteed: what it means to be fair in that context, what it means to cause harm, and what it means to outperform other solutions. We have created several Seldonian algorithms: one predicts student success from test scores while guaranteeing that the solution does not discriminate based on gender (whereas state-of-the-art solutions exhibit sexist behavior), and another learns new diabetes treatment policies that are guaranteed to cause no more hyperglycemia than existing solutions. Learn more, watch, read, and read.
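The core idea can be sketched as a "safety test": a candidate solution is returned only if a high-confidence statistical bound shows it satisfies the user-specified constraint; otherwise the algorithm reports that no solution was found rather than risk violating the constraint. The sketch below uses a Hoeffding bound for illustration; the function names, threshold, and bound choice are assumptions, not the paper's exact algorithm:

```python
# Sketch of the Seldonian safety-test idea. Names and the specific
# concentration bound (Hoeffding) are illustrative assumptions.

import math

def safety_test(constraint_samples, delta, threshold):
    """Return True iff, with probability at least 1 - delta, the expected
    constraint value (e.g., a user-defined measure of unfairness or harm)
    is at most `threshold`. Samples are assumed to lie in [0, 1]."""
    n = len(constraint_samples)
    mean = sum(constraint_samples) / n
    # Hoeffding high-confidence upper bound on the true mean.
    upper = mean + math.sqrt(math.log(1 / delta) / (2 * n))
    return upper <= threshold

def seldonian_train(candidate, constraint_samples, delta, threshold=0.5):
    """Return the candidate only if it passes the safety test; otherwise
    report that no solution was found."""
    if safety_test(constraint_samples, delta, threshold):
        return candidate
    return "No Solution Found"

# A candidate whose measured constraint value is low on 1000 held-out samples.
samples = [0.1] * 1000
print(seldonian_train("model", samples, delta=0.05))  # model
```

Note who supplies what: the algorithm designer fixes the bound and the confidence level 1 - delta, while the user, a domain expert, supplies the constraint itself, i.e., what counts as unfair or harmful in their application.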

Collaborators include Blossom Metevier, Stephen Giguere, Sarah Brockman, Phil Thomas, Emma Brunskill, Bruno Castro da Silva, and Andrew Barto.