GenProg’s decade-defining developments in debugging

George Mason University Department of Computer Science Associate Professor ThanhVu H. Nguyen won the IEEE Transactions on Software Engineering (TSE) 50th Anniversary Most Influential Paper Award for his contributions to GenProg, a groundbreaking system that demonstrated—for the first time at scale—the feasibility of automatically repairing real-world software defects. 

This award marks Nguyen’s third major test-of-time distinction for his work on automatic program repair (APR). The other two were the ACM SIGEVO (Association for Computing Machinery Special Interest Group on Genetic and Evolutionary Computation) 10-Year Impact Award and the ACM SIGSOFT/IEEE TCSE ICSE (ACM Special Interest Group on Software Engineering/Institute of Electrical and Electronics Engineers Technical Community on Software Engineering International Conference on Software Engineering) 10-Year Most Influential Paper Award, both received in 2019.

Receiving three awards from premier software engineering venues places GenProg among the most influential contributions in the field in the past decade. This recognition reflects the lasting influence of Nguyen’s earlier research, which paved the way for his later work on the safety and reliability of AI systems. 

The original GenProg research, developed with collaborators Claire Le Goues, Westley Weimer, and Stephanie Forrest in 2009, challenged long-held assumptions about the fragility of software systems and the necessity of human-written patches. “The idea was, bugs are reported to developers, who are humans with other priorities, and sometimes the bugs take a long time to get addressed,” said Nguyen. “Even when there is time to look at it, fixing bugs is very difficult.”  

Nguyen, second from left, and colleagues at the conference where they first presented GenProg. Photo provided.

Instead of treating debugging as a manual activity, the team proposed using evolutionary search, modeled after biological mutation and selection, to automatically generate patches for large, real-world programs. GenProg automatically repaired programs with hundreds of thousands of lines of code at a time when most prior work focused on tiny, contrived examples. In addition, it produced patches that often matched the quality of those written by human developers.

GenProg introduced three core innovations: identifying execution paths likely responsible for the bug, which narrows the search space from millions of lines to a few regions; applying realistic edit operations such as deleting statements, replacing blocks, or reusing existing code fragments from elsewhere in the project; and requiring that a candidate patch eliminate the failing behavior while preserving all previously correct behavior, as measured by the program’s regression test suite.
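The loop behind these ideas can be illustrated with a deliberately tiny sketch (this is not GenProg’s actual implementation, which operates on C programs via abstract syntax trees; the program representation, mutation operators, and test suite below are invented for illustration). A buggy function is held as a list of statements, candidate patches are produced by deleting statements or reusing existing ones, and fitness is simply the number of regression tests a variant passes:

```python
import random

random.seed(0)  # make the toy search reproducible

# Toy buggy program for add(a, b), as a list of statements.
# The second statement is spurious and corrupts the result;
# deleting it is one valid repair.
BUGGY = [
    "result = a + b",
    "result = result - b",   # the bug
    "return result",
]

# Regression test suite: (arguments, expected result).
TESTS = [((2, 3), 5), ((0, 0), 0), ((-1, 4), 3), ((10, 7), 17)]

def compile_variant(stmts):
    """Build a callable from a statement list; None if it doesn't compile."""
    src = "def f(a, b):\n" + "\n".join("    " + s for s in stmts)
    ns = {}
    try:
        exec(src, ns)
        return ns["f"]
    except Exception:
        return None

def fitness(stmts):
    """Count passing tests -- a simplified version of GenProg's test-based fitness."""
    f = compile_variant(stmts)
    if f is None:
        return 0
    passed = 0
    for args, expected in TESTS:
        try:
            if f(*args) == expected:
                passed += 1
        except Exception:
            pass  # crashing variants earn no credit for this test
    return passed

def mutate(stmts):
    """GenProg-style edits: delete a statement, or replace one with a copy
    of an existing statement (code reuse instead of synthesizing new code)."""
    child = list(stmts)
    i = random.randrange(len(child))
    if random.random() < 0.5 and len(child) > 1:
        del child[i]
    else:
        child[i] = random.choice(stmts)
    return child

def repair(stmts, generations=200, pop_size=10):
    """Evolve patch candidates until one passes the whole test suite."""
    population = [list(stmts)]
    for _ in range(generations):
        population = sorted(population, key=fitness, reverse=True)[:pop_size]
        best = population[0]
        if fitness(best) == len(TESTS):
            return best  # candidate repair: no test fails
        population += [mutate(random.choice(population)) for _ in range(pop_size)]
    return None

patched = repair(BUGGY)
print(patched)  # a repaired variant, e.g. with the spurious statement removed
```

Note that the original buggy program already passes one test (adding zeros masks the bug), which is exactly why fitness here is graded rather than binary: partially correct variants survive and seed further mutation.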

Nguyen said the results generated controversy. “The idea of machines replacing and outperforming human programmers in a creative task like debugging was radical, and many were skeptical, and it sparked heated debates,” he said. Multiple follow-up studies analyzed and improved patch quality, establishing a robust automatic program repair research ecosystem. GenProg also foreshadowed the current era of AI and large language models. 

For 15 years, GenProg was embraced by academia, industry, and government. Meta and others cited GenProg’s concepts when deploying production-scale automated repair frameworks. Within search-based software engineering, GenProg inspired a large body of work on repair templates and patch synthesis (assembling or generating a specific fix employing reusable patterns). And it helped show, early on, that computers can realistically search for and create code fixes on their own, even in large, real-world software systems. 

Nguyen's recent work tackles a parallel challenge in modern AI and ML systems: verifying that deep neural networks behave as intended. This effort produced NeuralSAT, a framework that analyzes neural networks and certifies their behavior. The NeuralSAT project, which has gained significant traction in the AI community and receives substantial funding from the National Science Foundation, Amazon, and Nvidia, pushes automated reasoning into a domain long treated as opaque. It challenges the common belief that neural networks are too complex to analyze formally and aims to ensure that AI systems can be trusted in high-stakes environments like autonomous vehicles, healthcare, and finance.

Nguyen said GenProg and NeuralSAT “exemplify my research style of pursuing high-risk, high-reward questions that challenge conventional wisdom and open new technical directions in the new era of AI and software engineering. The supportive environment at George Mason, particularly in the Computer Science Department and the College of Engineering and Computing, fosters a culture of bold, field-shaping research that allows faculty to pursue ambitious ideas with real-world impact.”