Proof Repair and Code Generation

Proofs are our bread and butter at Galois – we apply them to many different assurance problems, from compiler correctness to hardware design. Proofs and the theorem-proving technologies that produce them are very powerful, but that power comes at a cost. In our experience, proofs can be difficult to maintain over time as systems change. We’ve already argued on this very blog that we believe proofs should repair themselves.

That’s why I was excited to help Talia Ringer, then a graduate student at the University of Washington, with a case study in automatically repairing proofs. Talia has a new paper on this project at PLDI 2021, called Proof Repair Across Type Equivalences. I strongly recommend you read it! This blog post is a kind of sidebar explaining how Talia’s technique played out in practice for us at Galois. We will see how proof repair techniques can help automate the discovery of computable equivalences between machine-generated software artifacts and the human-engineered abstractions used in theorem proving.

A bit of context is needed. In early 2019, I was tasked with creating a Coq backend for our internal representation language for imperative programs and their formal specifications, dubbed the “SAW Core” language. The SAWScript tool allows us to create formal models of programs written in imperative languages such as C and Java, as well as in our own specification language, Cryptol. It then allows us to verify properties of those programs by dispatching proof obligations to automated solvers.

Cryptol is a domain-specific language aimed at specifying cryptographic algorithms in a fashion closer to mathematical definitions than the usual imperative implementations. This gives us specifications of cryptographic algorithms that are as readable as the reference implementations that often appear in publications, while at the same time being executable and machine-checkable. Cryptol uses off-the-shelf solvers in two interesting ways: for type-checking its terms, whose types involve type-level natural numbers to make sure all operations are properly sized, and for actual theorem proving, which the user can initiate via dedicated commands.

But we often run into properties whose proofs either take the solvers an extremely long time to process or are simply too complicated for them to handle. In those cases, having an escape hatch to a proof assistant is valuable in many ways: it can allow us to prove properties when the solvers fail, it can reduce checking time (trading engineering effort for proof computation), and manually attempting proofs can lead us to discover either interesting properties that can help the solvers, or flaws in the properties we were trying to prove that point to actual bugs in our specifications or implementation!

When we run Cryptol programs through SAW, they are translated into our internal representation, the SAW Core language. It is a dependently-typed language with support for records, some literals, and the definition of inductive data types. Since this work was initially quite experimental, I tried to build a minimal viable version of the translation to a proof assistant. Fortunately, most of the core language can be translated in a very straightforward fashion to isomorphic Coq types and terms. With this, we can generate Coq code for our small standard library, as well as for client cryptographic code written in Cryptol.

The curse of generated code

Even though SAW Core has support for records, our Cryptol frontend would desugar Cryptol records into SAW Core tuples (to be precise, right-nested pair types, since SAW Core only has a pair type), because most solver backends did not support records. While SMT backends are completely happy handling such large amounts of unlabeled data, the experience is much more traumatic for a human user of our Coq backend. Consider the following Cryptol code:

s2n_cork : connection -> [2]
s2n_cork c = c.corked + 1

This is a perfectly innocuous function that accesses a 2-bit field of a structure named connection and returns its value plus one. How bad could this look through our translation?

Definition s2n_cork (c : ((prod) (@SAWCoreScaffolding.Bool) (((prod)
(((@CryptolPrimitives.seq) (((@TCNum) (2))) 
(@SAWCoreScaffolding.Bool)))
(((prod) (((@CryptolPrimitives.seq) (((@TCNum) (8)))
(@SAWCoreScaffolding.Bool))) (((prod) (((prod) 
(((@CryptolPrimitives.seq)
(((@TCNum) (32))) (@SAWCoreScaffolding.Bool))) 
(((@CryptolPrimitives.seq)
(((@TCNum) (32))) (@SAWCoreScaffolding.Bool))))) (((prod)
(@SAWCoreScaffolding.Bool) (((prod) (@SAWCoreScaffolding.Bool) 
(((prod)
(((@CryptolPrimitives.seq) (((@TCNum) (32))) 
(@SAWCoreScaffolding.Bool)))
(((prod) (@SAWCoreScaffolding.Bool) 
(@SAWCoreScaffolding.Bool)))))))))))))))))
:= ((@CryptolPrimitives.ecPlus) (((@CryptolPrimitives.seq) (((@TCNum) 
(2)))
(@SAWCoreScaffolding.Bool))) (((@CryptolPrimitives.PArithSeqBool) 
(((@TCNum)
(2))))) (((fst) (((snd) (c))))) (((@CryptolPrimitives.ecNumber) 
(((@TCNum)
(1))) (((@CryptolPrimitives.seq) (((@TCNum) (2))) 
(@SAWCoreScaffolding.Bool)))
(((@CryptolPrimitives.PLiteralSeqBool) (((@TCNum) (2)))))))).

We’re going to need to crank up the “oof size” on this one. How did it get so messy? The first large block is:

(c : ((prod) (@SAWCoreScaffolding.Bool) (((prod)
(((@CryptolPrimitives.seq) (((@TCNum) (2))) 
(@SAWCoreScaffolding.Bool)))
(((prod) (((@CryptolPrimitives.seq) (((@TCNum) (8)))
(@SAWCoreScaffolding.Bool))) (((prod) (((prod) 
(((@CryptolPrimitives.seq)
(((@TCNum) (32))) (@SAWCoreScaffolding.Bool))) 
(((@CryptolPrimitives.seq)
(((@TCNum) (32))) (@SAWCoreScaffolding.Bool))))) (((prod)
(@SAWCoreScaffolding.Bool) (((prod) (@SAWCoreScaffolding.Bool) 
(((prod)
(((@CryptolPrimitives.seq) (((@TCNum) (32))) 
(@SAWCoreScaffolding.Bool)))
(((prod) (@SAWCoreScaffolding.Bool) 
(@SAWCoreScaffolding.Bool)))))))))))))))))

which does nothing more than declare a function argument c of type connection. Remember that all type definitions have been inlined earlier in the pipeline, so the perfectly reasonable named record type:

type connection =
  { client_auth_flag : Bit
  , corked : [2]
  , corked_io : [8]
  , handshake : handshake
  , is_caching_enabled : Bit
  , key_exchange_eph : Bit
  , mode : [32]
  , resume_from_cache : Bit
  , server_can_send_ocsp : Bit
  }

is now an 8-times-nested anonymous tuple type! The second part of that code:

((@CryptolPrimitives.ecPlus) (((@CryptolPrimitives.seq) (((@TCNum) 
(2)))
(@SAWCoreScaffolding.Bool))) (((@CryptolPrimitives.PArithSeqBool) 
(((@TCNum)
(2))))) (((fst) (((snd) (c))))) (((@CryptolPrimitives.ecNumber) 
(((@TCNum)
(1))) (((@CryptolPrimitives.seq) (((@TCNum) (2))) 
(@SAWCoreScaffolding.Bool)))
(((@CryptolPrimitives.PLiteralSeqBool) (((@TCNum) (2))))))))

simply accesses the corked field of that record and adds 1.

Of course, this was one of the simplest functions in that module; functions that are ten or more lines of Cryptol code routinely generate terms that are hundreds of lines long. Since I wanted to explore the feasibility of proving theorems in the tool, I would have to manipulate those terms somehow. This left me with three choices:

  1. Power through with this awful representation (absolutely not!),
  2. Make a large number of changes upstream to retain much more structured information for my backend (sounds lovely, but it was beyond my time budget and would require alignment with many other interested parties working on the SAW toolchain),
  3. Or find a way to improve the situation at the tail end of the pipeline, with minimal effort (this is what I did).

Relating generated code to handwritten code

As I mentioned earlier, Cryptol programs are already quite functional in style and mostly use language features readily available in Coq. Instead of writing a whole new translation directly from Cryptol to Coq, I could try to manually translate the types and functions that actually show up in my goals, and see how far that would take me.

Barring a few surprises, manually translating Cryptol to Coq is almost as easy as adapting the syntax. By defining the connection type as a Coq record, I was able to write:

Definition s2nCork (c : Connection) : 2.-tuple bool
  := incB (corked c).
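For reference, here is roughly what the handwritten records might look like. This is only a sketch: the field names follow the Cryptol declarations above, and I am assuming the mathcomp n.-tuple bool type for the [n] bit vector types, matching the 2.-tuple bool used in s2nCork; the actual development differs in some details.

From mathcomp Require Import all_ssreflect.

(* Hypothetical handwritten counterparts of the Cryptol record types.
   The handshake record has two 32-bit fields, as we will see later. *)
Record Handshake := MkHandshake
  { handshake_type : 32.-tuple bool
  ; message_number : 32.-tuple bool
  }.

Record Connection := MkConnection
  { client_auth_flag     : bool
  ; corked               : 2.-tuple bool
  ; corked_io            : 8.-tuple bool
  ; handshake            : Handshake
  ; is_caching_enabled   : bool
  ; key_exchange_eph     : bool
  ; mode                 : 32.-tuple bool
  ; resume_from_cache    : bool
  ; server_can_send_ocsp : bool
  }.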

I translated many other, larger functions in the same way. But this handwritten code was now disconnected from the original, and any theorem proved about it would only establish properties of this Coq implementation; those properties would only carry over to the Cryptol implementation under the conditions that I made no mistake and that the Cryptol code base does not change. Yikes!

So I began playing with the idea of making explicit the correspondence between my handwritten high-level types and the generated low-level types. I defined the following notion of embedding:

Class Embedding A B :=
  {
    toAbstract : A -> B;
    toConcrete : B -> A;
  }.
 
Class ProperEmbedding {A B} `(Embedding A B) :=
  {
    roundtrip : forall a, toConcrete (toAbstract a) = a;
  }.

and wrote instances making explicit one way in which my high-level types abstract over the low-level types (we will see a concrete instance below). With this mechanism, given the low-level translation of some Cryptol function, concreteFun, I could try to show the following correspondence with a handwritten version of that function, abstractFun:

Class CorrectTranslation
  {CI AI CO AO} `{ProperEmbedding CI AI} `{ProperEmbedding CO AO}
  (concreteFun : CI -> CO) (abstractFun : AI -> AO)
 :=
   {
     correctTranslation :
       forall ci ai co ao,
         toAbstract ci = ai ->
         concreteFun ci = co ->
         abstractFun ai = ao ->
         toAbstract co = ao;
   }.

This gives rise to the following handy theorem:

Theorem byCorrectTranslation {CI AI CO AO}
  (concreteFun : CI -> CO) (abstractFun : AI -> AO)
  `{CorrectTranslation concreteFun abstractFun}
 : forall ci,
     concreteFun ci = toConcrete (abstractFun (toAbstract ci)).

What this says is that any invocation of a generated function concreteFun can be replaced with an equivalent invocation of a corresponding handwritten function abstractFun, so long as we translate the input and output along the way (via toAbstract and toConcrete). NOTE: we are using “abstract” and “concrete” in a loose sense here: both functions are concrete, but the “abstract” one operates at a somewhat higher level of abstraction.
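To make this concrete, here is what such an embedding could look like for the handshake part of a connection, relating the generated 2-tuple of 32-bit words to the hypothetical Handshake record sketched earlier. Again, the names are illustrative; the instances are simply written against the classes above.

(* A hypothetical embedding between the generated tuple encoding of a
   handshake and the handwritten Handshake record. *)
Instance Embedding_Handshake
  : Embedding (prod (32.-tuple bool) (32.-tuple bool)) Handshake :=
  {
    toAbstract := fun p => MkHandshake (fst p) (snd p);
    toConcrete := fun h => (handshake_type h, message_number h);
  }.

Instance ProperEmbedding_Handshake : ProperEmbedding Embedding_Handshake.
Proof.
  constructor.
  intros [ht mn].   (* destruct the concrete pair *)
  reflexivity.      (* both round-trip directions compute away *)
Qed.

With embeddings like these for the connection type and for the output type [2], a CorrectTranslation instance relating s2n_cork to s2nCork can be proved essentially by unfolding both definitions, and byCorrectTranslation then lets us trade any call to the generated s2n_cork for a call to the handwritten s2nCork, modulo the toAbstract/toConcrete conversions.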

Program repair as resugaring

Of course, while this approach worked well, it was not practical in the long term. Several flaws would make it prohibitive for a real project:

  • Manually translating every data type, then every function, then showing their correspondence is more work than creating a second copy (!!!) of the entire artifact we are trying to prove!
  • If anything changes in the Cryptol code, a lot of this work needs to be redone!

This is a recipe for sadness at work, very much against our core principle “Seek joy at work”. Clearly, I was not allowed to proceed this way! 🙂

Right around that time, my friend Talia Ringer (now Dr. Ringer!) made a visit to our office, and asked whether we had interesting Coq projects going on, and notably, whether they would benefit from proof repair techniques.

Talia had been making good strides at implementing automation for finding algebraic ornaments between data types and using these to lift programs (and proofs) across those ornaments. If you have never heard of algebraic ornaments, they are a way of describing relationships between the encodings of algebraic data types, such that one may computationally transform instances of one data type into the other, as well as code working over these data types.

For instance, algebraic ornaments allow us to define lists as an extension of Peano (unary-encoded) natural numbers, since the two share the same spine of a unary tree (a linked list, if you will), except that lists contain a value at each internal node. While this may sound like a trivial addition, the beauty of algebraic ornaments is that they also work on types indexed with extra information, as are commonly used in theorem provers and dependently-typed languages. So the same technique allows us to represent the relationship between a plain old list and a length-indexed list (often called a vector in dependently-typed programming parlance).
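In Coq terms, the standard definitions make this relationship easy to see; nothing here is specific to our project.

(* Coq's nat and list share the same unary spine:
     Inductive nat := O | S (n : nat).
     Inductive list (A : Type) := nil | cons (x : A) (l : list A).
   A list is an ornament of nat: each S node additionally carries an A.
   Indexing the list by its own length yields the usual vector type: *)
Inductive vector (A : Type) : nat -> Type :=
| vnil  : vector A 0
| vcons : forall n : nat, A -> vector A n -> vector A (S n).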

This is a useful tool for proof repair because we often refactor our data types in ways that do not fundamentally alter the validity of pre-existing proofs, yet our proof frameworks tend to be overly sensitive to such structural changes, so proofs that ought to remain valid may break. Talia’s tool works by automatically exploring the structure of pairs of types to reify an algebraic ornament between them, when possible. (Note: the tool has since been extended to discover more general forms of equivalences and to accept user-supplied equivalences; see the PLDI 2021 paper for details on those enhancements. We will talk about equivalences in the rest of this post.)

From understanding the structure of the two types, and the structure of the relationship between them, PUMPKIN PATCH can then transform code written to work over one of the types into code working over the other. Of course, following the Curry-Howard-Lambek correspondence, the ability to repair programs ought to be akin to the ability to repair proofs. A change of data representation is just the kind of tedium that we would like proof repair to automate away. So we asked ourselves: could Talia’s tool bridge this gap? We set out to find out.

Here is the dream scenario:

  • I manually write Coq data types as I want them to be (this could be automated but would require a significant amount of work and synchronization with other people),
  • the tool and I somehow come to an agreement on the equivalences relating my handwritten data types and the equivalent generated tuple types,
  • now I point the tool at generated functions working over the generated data types, and it outputs for me:
    • a lifted function with the same behavior working over my handwritten data types,
    • a proof of equivalence of the unlifted function and the lifted one,
  • I get to write proofs about the unsugared machine-generated code while manipulating the human-friendly data types and functions!

This is quite powerful: provided you exhibit equivalences between types, you can work across representations with minimal effort. This opens the door to very exciting possibilities, such as implementing code with the most efficient data structures while doing proofs over the data representation best suited to inductive reasoning!
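For a small, standard-library-only illustration of that trade-off (independent of SAW and PUMPKIN PATCH): Coq’s binary naturals N are compact and fast to compute with, while unary nat has the induction principle we like to reason with, and the two are already related by exactly the kind of equivalence one could transport along.

Require Import Coq.NArith.NArith Coq.NArith.Nnat.

(* The conversions between the two representations... *)
Check N.of_nat : nat -> N.
Check N.to_nat : N -> nat.

(* ...and the proofs that they round-trip in both directions. *)
Check N2Nat.id : forall a : N, N.of_nat (N.to_nat a) = a.
Check Nat2N.id : forall n : nat, N.to_nat (N.of_nat n) = n.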

In practice

I was quite happy with how much PUMPKIN PATCH (to be precise, the PUMPKIN Pi extension) was able to automate away once we had figured out more precisely what my needs were. We were able to take somewhat complex functions from an existing cryptographic library implementation and transport them with minimal user guidance. Several pitfalls along the way helped guide improvements to PUMPKIN PATCH. Here are some of the challenges we faced and changes we had to make.

One of the first problems was the discovery of equivalences between isomorphic data types. This process is mostly type-guided: for instance, if one type is a pair of a natural number and a boolean value, while the other type is a record with a natural-number field and a boolean field, then the most sensible mapping is to relate the numbers together and the booleans together.

However, there is latent ambiguity in the mapping of record fields to elements of a tuple whenever multiple fields have the same type. Consider this simple handshake type from our cryptographic library implementation:

type handshake = { handshake_type : [32]
                 , message_number : [32]
                 }

I wanted to instruct PUMPKIN PATCH to find an equivalence between this record type and the machine-generated anonymous tuple type with two 32-bit fields. But should the handshake_type field be mapped to the left component of the tuple, or to the right one?

Of course, the translation from Cryptol to SAW Core chose one of the two options, and picking the wrong one would jeopardize the semantics of the functions upon transport. In this case, the translation appears to sort the fields lexicographically, so we must instruct PUMPKIN PATCH to use that order in the equivalence. In such cases, we can thankfully resort to writing the components of the equivalence manually.
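To see the ambiguity in Coq terms: both of the following maps from the generated tuple to the hypothetical Handshake record sketched earlier are perfectly well-typed, and only one of them matches the field order the translation actually chose.

(* Two type-correct ways to read a generated 2-tuple as a Handshake;
   picking the wrong one silently swaps the two fields. *)
Definition handshake_fields_in_order
  (p : prod (32.-tuple bool) (32.-tuple bool)) : Handshake :=
  MkHandshake (fst p) (snd p).

Definition handshake_fields_swapped
  (p : prod (32.-tuple bool) (32.-tuple bool)) : Handshake :=
  MkHandshake (snd p) (fst p).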

Another problem area was the performance of lifting. The idea of the lift operation is to replace all occurrences of a given source type (the type we are transporting from), wherever they appear in a given term, with instances of a target type (the “other side” of the equivalence we are transporting to), while also performing other changes needed for type safety. As you can imagine, this means that the whole term must be traversed in order to find all instances of the source type.

One of the usual gotchas is the use of Peano, unary-encoded numbers. While these are convenient for reasoning and offer a nifty induction scheme, they are an extremely inefficient data representation. Our use case manipulates somewhat large length-indexed data structures all over the place (bit vectors of sizes 32, 64, 128, …), so the code contained many instances of such large numbers. Unfortunately, the lift algorithm does not know that there will be no instance of our source type inside these extremely long towers of successor constructors, and so it wastes a significant amount of time traversing them.

Talia implemented a system of transparency and opacity to let the user fine-tune which parts of the code are candidates for lifting and which should be skipped over. This lets us mark some terms as essentially “black boxes” not to be inspected: data types like Peano naturals or ASCII-encoded strings can often be ignored entirely, as can the library code our module is a client of, since it does not depend on our data types. There are likely huge gains to be had from a dependency analysis API here, which I asked Talia about during her defense, and which she may investigate further with her students.

Another large part of our effort was the orchestration of lifts. When working with one plain data type, one equivalence is all we need. But when working with a compound data type, every type involved in it should have its own equivalence. When it comes to lifting the compound data type, we are not lifting across just one equivalence, but across a family of them.

To make things more concrete, consider the connection type from earlier in this post, and how it contains an instance of the handshake type we just discussed:

type connection =
  { …
  , handshake : handshake
  }

where handshake is the record with two 32-bit fields shown above.

In order to lift the machine-generated 9-tuple corresponding to such a connection, we also have to lift the 2-tuple corresponding to its handshake field. Due to the nature of the PUMPKIN PATCH API at the time of our experimentation, we had to orchestrate those transports as follows:

  • First, we would ask the tool to transport the 9-tuple (…, (Bit 32, Bit 32), …) across the handshake equivalence. This would result in some intermediate type, a 9-tuple where the component corresponding to the handshake has been turned into its lifted record type: (…, { handshake_type = …, … }, …),
  • then, we would ask the tool to transport from that intermediate type, across the connection equivalence, to the wanted connection record type {…, handshake = { handshake_type = …, … }, …}.

As an end user, I had two small pain points: the need to think about those intermediate types, and the need to carefully orchestrate those transports. While the user experience can certainly be improved here, with some careful staging we managed to lift code working over such a compound record type containing references to a smaller record type. Ideally, we would probably want the ability to reify such compositions of equivalences as a single transformation that the tool would know how to perform in one unambiguous step: the future work section of Talia’s thesis has some ideas about using equivalence graphs to tackle such problems.
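Written out schematically as Coq types (eliding seven of the nine components and reusing the hypothetical records from earlier), the staging goes through an intermediate type:

(* Stage 0: fully generated; the handshake component is a raw 2-tuple. *)
Definition connection_stage0 : Type :=
  prod bool (prod (prod (32.-tuple bool) (32.-tuple bool)) bool).

(* Stage 1: lifted across the handshake equivalence only; the handshake
   component is now a record, but the outer shape is still a tuple. *)
Definition connection_stage1 : Type :=
  prod bool (prod Handshake bool).

(* Stage 2: lifted across the connection equivalence; this is just the
   handwritten Connection record itself. *)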

After a valiant effort, we were delighted to be able to transport some real handshake data and functions from our implementation and check that the resulting functions were indeed correct.

So, going back to the previously mentioned “dream” scenario, the state of the tool as I was using it allowed me to:

  • manually write the data types I intended to work over,
  • automatically have the tool infer equivalences between these and the machine-generated types: simple types work out of the box, nested data types require a bit of careful staging but are tractable,
  • almost automatically have the tool transport functions from one type to the other: I only needed to specify opacity/transparency options for performance reasons, but the tool automatically performed the transports for simple data types, and for nested data types as long as I staged the transports in a sensible order,
  • and I got to write proofs over my abstraction that actually reason about the machine-generated code!

The related paper, Proof Repair Across Type Equivalences, has been accepted at PLDI 2021. Congratulations to Talia and her coauthors! Talia will be starting as an assistant professor at the University of Illinois Urbana-Champaign this fall.