Automated Formal Verification
Formally verifying system properties is one of the most effective ways to improve system quality, but its high manual effort often renders it prohibitively expensive. LASER works on automating formal verification by modeling existing proofs and using those models to synthesize new ones. Our tools, including Passport, Diva, and TacTok, are effective because of the richness of the data the proofs contain and the powerful logical systems underlying proof assistants. Our lab continues to exploit more of these aspects of formal proofs to build better, more powerful systems that fully automate formal verification. You can read about Passport, Diva (winner of an ACM SIGSOFT Distinguished Paper Award at ICSE 2022), and TacTok, watch a video demo of Proofster, our web-based proof-synthesis interface, or check out our latest work on Baldur, which uses large language models to synthesize proofs!
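Our tools target proof assistants such as Coq. As a rough illustration of the kind of artifact proof synthesis produces, here is a small tactic proof of a toy theorem (written in Lean for concreteness; it is not output from our tools). Tools like TacTok predict tactic sequences of this shape one step at a time, while Baldur generates whole proofs at once:

```lean
-- Toy theorem: addition of natural numbers is commutative.
-- A proof-synthesis tool would predict the tactic script below.
theorem add_comm_example (n m : Nat) : n + m = m + n := by
  induction n with
  | zero => simp
  | succ k ih => rw [Nat.succ_add, ih, Nat.add_succ]
```

The proof assistant checks each predicted tactic, so synthesized proofs that are accepted are correct by construction; the hard part is searching the space of tactic sequences.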
Fairness in Machine Learning
Software today increasingly uses machine learning and artificial intelligence components. Providing guarantees about these components, such as that they do not discriminate, do not harm humans, or outperform other components, is difficult. In a 2019 Science paper, we present a framework for creating Seldonian machine learning algorithms that provide exactly such high-probability guarantees. Importantly, Seldonian algorithms allow their users, who are experts in their application domains, to specify the requirements that need to be guaranteed, from what it means to be fair in that context, to what it means to cause harm, to what it means to outperform other solutions. We created several Seldonian algorithms, including one that predicts student success from test scores while guaranteeing that the solution does not discriminate based on gender (whereas state-of-the-art solutions exhibit sexist behavior), and one that learns new diabetes treatment policies guaranteed to cause no more hyperglycemia than existing solutions. Learn more, watch, read, and read.
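At a high level, a Seldonian algorithm splits its data between choosing a candidate solution and running a safety test that certifies the user-specified behavioral constraint with high probability. The following is a minimal sketch of that structure, assuming a user-supplied training routine and a per-example measure `g_fn` of undesirable behavior (all names here, including `seldonian_train`, are illustrative, not the paper's actual API):

```python
import math
import random

def hoeffding_upper_bound(samples, delta, a=-1.0, b=1.0):
    """(1 - delta)-confidence upper bound on the mean of samples
    bounded in [a, b], via Hoeffding's inequality."""
    n = len(samples)
    mean = sum(samples) / n
    return mean + (b - a) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def seldonian_train(data, train_fn, g_fn, delta=0.05):
    """Pick a candidate on one half of the data, then run the safety
    test on the held-out half. Returns the candidate solution only if
    the test certifies E[g] <= 0 with probability at least 1 - delta;
    otherwise returns None ("No Solution Found")."""
    random.shuffle(data)
    half = len(data) // 2
    candidate_data, safety_data = data[:half], data[half:]
    solution = train_fn(candidate_data)           # candidate selection
    g_samples = [g_fn(solution, x) for x in safety_data]
    if hoeffding_upper_bound(g_samples, delta) <= 0.0:
        return solution                           # safety test passed
    return None                                   # refuse to deploy
```

The key design point is that the algorithm may return no solution rather than return one whose constraint it cannot certify, which is what makes the guarantee a high-probability one rather than a best-effort one.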
Quality of Automated Program Repair
Bugs in software are incredibly costly, but so is software debugging. More bugs are reported daily than developers can handle. Automated program repair is an exciting approach, but can we trust automatically generated patches? Automated program repair research has produced dozens of repair techniques that take a buggy program and a partial specification of that program, e.g., a set of tests describing the desired behavior. Because the specification is partial, patches that satisfy the written specification may not satisfy the intended, unwritten, full specification. The goal of this project is to develop benchmarks and tools to evaluate repair quality and to improve it. We developed an objective measure of repair quality that uses an independent specification, tests that are not part of the repair process, and found that existing techniques often produce low-quality repairs that break existing (undertested) functionality (read), using our new benchmark for repair quality (read). Next, we developed SearchRepair, a technique that replaces likely buggy code with publicly available human-written code to repair defects. SearchRepair works at a higher level of granularity than prior repair techniques and produces repairs of much higher quality (though making SearchRepair scale to the size of programs prior techniques can repair is an ongoing effort). For example, on small C programs, SearchRepair generated repairs that pass 97.2% of independent tests, whereas prior techniques' repairs passed no more than 72.1% (read).
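The independent-test quality measure above can be sketched in a few lines: evaluate the patched program against tests that were held out from the repair process, so the score reflects the intended specification rather than only the written one. This is a simplified illustration, not the benchmark's actual harness; the patch is modeled as a callable and each test as an (input, expected output) pair:

```python
def repair_quality(patched_program, independent_tests):
    """Fraction of held-out tests the patched program passes.
    `patched_program` is a callable; each test is a pair
    (input, expected_output)."""
    passed = sum(1 for inp, expected in independent_tests
                 if patched_program(inp) == expected)
    return passed / len(independent_tests)

# A hypothetical "patch" for absolute value that satisfies a weak
# written spec (positive inputs only) but breaks on negatives,
# which only the independent tests expose:
patch = lambda x: x
written_spec = [(3, 3), (7, 7)]
independent = [(3, 3), (-4, 4)]
repair_quality(patch, written_spec)   # 1.0: looks perfect to the repair tool
repair_quality(patch, independent)    # 0.5: held-out tests reveal low quality
```

The gap between the two scores is exactly the overfitting-to-the-written-specification problem this project measures.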
This project is funded by the NSF. The tools and benchmarks are all open-source and publicly available. Collaborators include or have included Ted Smith, Manish Motwani, Afsoon Afzal, Mauricio Soto, Yalin Ke, Neal Holts, Sasha Medvidovic, Claire Le Goues, Kathryn T. Stolee, Prem Devanbu, Stephanie Forrest, and Westley Weimer.