Proofs Should Repair Themselves

In his 1900 book The Wonderful Wizard of Oz, L. Frank Baum tells a story that will resonate with any software engineer. A woodman by the name of Nick Chopper suffers a series of workplace accidents. In turn, his arms, legs, body, and even his head are replaced by metal prosthetics. Eventually, what remains is an entirely different man made of tin.  

Like Baum’s Tin Man, all software projects are repaired and reconfigured, and a project can transform by degrees into something entirely unlike its starting point. As software engineers have long known, this drift of code and specification is the norm. Studies have shown that maintenance consumes the majority of resources in large-scale projects [1], and many systems are built on the bones of previous projects.

In recent years, many companies have begun to use formal proofs to provide assurance for software. For example, AWS now formally verifies significant portions of its s2n library, which provides cryptographic security for Amazon.com, Alexa, and more. Proofs are used for assurance at Facebook, Google, Intel, Apple, and others. For these companies, proof engineering is software engineering. As the software evolves, the proofs of that software must evolve with it.

A key method for success at these companies has been a close alignment between proof tools and continuous integration (CI) development environments. This linkage between proofs and code means that when the code is edited, the proofs can be rechecked within seconds, re-establishing assurance of the software as it changes. However, experience shows this linkage is fragile. Automated tools can often re-establish a proof if code changes are small. But if the code is restructured beyond a certain point, proof tools are typically unable to re-establish a proof of correctness, even if the software contains no defects. Instead, the internal scaffolding of the proof must be rebuilt by hand by a formal methods expert. Consequently, proofs are tended and maintained by embedded proof experts who work alongside regular software engineers.
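
To make the failure mode concrete, here is a deliberately tiny Lean sketch. It is invented for this post (not drawn from s2n or any of the systems above), and the names sumList, sumList_append, and sumListTR are hypothetical. The proof script mirrors the recursive structure of the function, so a behavior-preserving refactoring leaves the theorem true but breaks the script.

```lean
-- Toy example (invented for illustration): a proof whose script mirrors
-- the recursive structure of the code it verifies.
def sumList : List Nat → Nat
  | []      => 0
  | x :: xs => x + sumList xs

-- The induction below follows sumList's recursion, case by case.
theorem sumList_append (xs ys : List Nat) :
    sumList (xs ++ ys) = sumList xs + sumList ys := by
  induction xs with
  | nil => simp [sumList]
  | cons x xs ih => simp [sumList, ih, Nat.add_assoc]

-- A behavior-preserving refactoring: the same sum, written as a
-- tail-recursive fold. If sumList were rewritten this way, the theorem
-- would still be true, but the induction above would no longer match the
-- code's structure: the script would fail and have to be rebuilt.
def sumListTR (xs : List Nat) : Nat :=
  xs.foldl (· + ·) 0
```

In a toy example like this the induction can be rebuilt in minutes; in an industrial proof with thousands of interdependent lemmas, this kind of structural mismatch is exactly what keeps proof experts embedded alongside the development team.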

This approach cannot scale, which makes proof repair a scalability problem: if proofs require regular repair by experts, there is an inherent limit on how widely they can be deployed. Formal methods experts are in short supply, and the time and cost involved in manually reconstructing proofs are considerable. For the foreseeable future, human expertise will continue to be necessary in constructing formal proofs, but we should aim for the effort of maintaining a proof to be comparable to the effort of maintaining the software itself. If we are to reach a world where proofs are a routine part of software engineering, then moderate code changes should result in moderate, and automated, proof repairs.

We have reason to think such proof repair is tractable. Rather than trying to synthesize a complete proof from nothing — a problem known to be immensely difficult — we start from a correct proof of fairly similar software. We will be attempting proof reconstruction within a known neighborhood.

Successful proof repair is likely to require a combination of different strategies. For example:

  • Local repair search. Many properties in a proof are local to a particular piece of code, meaning that a code change may invalidate only a small portion of the proof. Proof tools often ignore this locality and perform a potentially exponential analysis of the entire codebase. By the same token, when proofs fail, current proof tools do a poor job of localizing the source of the failure in the program. In many cases, it should be possible to adapt proof tools to search locally for repairs to existing proofs: when a proof requires repair, the tool would identify the minimal slice of code and proof affected and then search for a local fix to re-establish assurance.
  • Change-resistant proofs. Formal proofs capture two kinds of properties: fundamental properties necessary to assert correct behavior, and accidental properties that reflect the current structure of the code. Most current proofs intermingle these concerns and become inflexible in the face of changes to the code. Proofs should instead be designed to reduce the coupling between the fundamental properties and the accidental ones (a small sketch follows this list). When proofs express this distinction, it will be more feasible to identify opportunities for repair in the structure of the proof itself.
  • Transfer learning between proofs. Proofs, like programs, are very diverse, but repairs often follow a small set of patterns. Current proof tools operate on the code at hand alone, independent of any such patterns. The emerging research area of Big Code has identified approaches for applying machine learning to code changes by observing patterns of code use and their corresponding properties. Where such patterns arise again, even in quite different applications, a proof tool can obtain guidance (akin to human insight) about which techniques to try in order to effect the proof repair. Learning from other uses creates a virtuous cycle, where successful proof repairs add to the examples driving the learning.
  • Understandable proof repairs. Some repairs can be applied automatically, but others will require human input even for local code changes. Currently, interacting with proof tools requires formal methods expertise. We believe it is possible to create a path whereby the software developers themselves, who intuitively understand the intent of their code, can interact with the proof repair tools. Tools should propose possible repairs in terms that make sense to software developers, including explanations of the ramifications of the change for the overall assurance result. The tools may even go as far as demonstrating that a proof cannot be repaired and suggesting possible program repairs instead.
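
As a small illustration of the change-resistant proofs idea, here is a hedged Lean sketch; all of the names (discount, NeverIncreases, and the lemmas) are invented for this post. The fundamental property is stated once as an interface, the implementation is connected to it by one small bridging lemma, and the client proof depends only on the interface, so refactoring the implementation means repairing only the bridging lemma.

```lean
-- Hypothetical sketch: decoupling a fundamental property from the
-- accidental structure of the code.

-- Accidental structure: one particular implementation of a price discount.
def discount (n : Nat) : Nat :=
  n - n / 10

-- Fundamental property, stated once as an interface: the function never
-- increases its input. Client proofs rely only on this.
def NeverIncreases (f : Nat → Nat) : Prop :=
  ∀ n, f n ≤ n

-- Bridging lemma: the current implementation satisfies the interface.
-- Only this small proof is tied to the code's structure; if discount is
-- refactored (say, to n * 9 / 10), only this lemma needs repair.
theorem discount_neverIncreases : NeverIncreases discount := by
  intro n
  simp only [discount]
  omega

-- Client proof: composing two well-behaved transformations still never
-- increases the input. It mentions only the interface, so it survives any
-- refactoring of discount that preserves NeverIncreases.
theorem comp_neverIncreases (f g : Nat → Nat)
    (hf : NeverIncreases f) (hg : NeverIncreases g) :
    NeverIncreases (fun n => f (g n)) :=
  fun n => Nat.le_trans (hf (g n)) (hg n)
```

This is only one way of reducing the coupling; in practice the same idea appears as abstraction boundaries, refinement layers, or characteristic lemmas that insulate the bulk of a proof from implementation detail.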

An increasing number of real-world software projects now have accompanying proofs, and some of these proofs are already integrated into software engineering pipelines. This represents both a risk and an opportunity for proof engineering as a field. No matter the assurance gains that proofs make possible, if they can’t be maintained they will quickly be set aside. We must therefore develop techniques that can cope with change, which is ubiquitous in real software.

Nonetheless, there is reason for optimism. Automated proof repair is little studied, and (as I have laid out above) there are several possible avenues for success. Many of the current commercial proofs are open source, with complete changelogs stored in their repositories. This represents a rich set of test data with which we can refine our ideas. The time is right for ambitious research to solve this problem, and we are just getting started. 

Thanks to Byron Cook, John Launchbury, and Mike Whalen for comments on this post. 

[1] For a survey of the research, see “Software Maintenance Costs” by Jussi Koskinen: https://wiki.uef.fi/display/tktWiki/Jussi+Koskinen