Automated Reasoning as an Annoying Child

This blog is the second in a series of posts about a joint project between Galois, Supranational, The Ethereum Foundation, and Protocol labs verifying the blst signature library. You can find the first post here. It’s a combination of my bad jokes and an overview of what we’re trying to achieve. 

It’s happening already. The development team is deep into conversations with the proof tools, and they’re taking a lot of effort. The best description of how these conversations go is “like talking to a child.” 

Before I get too deep into explaining what’s difficult about our work, let’s spend a post describing what exactly we’re trying to achieve and what a proof tool for program verification (like our SAW tool) actually is. A program proof tool like SAW takes a program, along with a description of what that program should do, and tells you if the two are in agreement or not. Along the way, the tool might highlight spots where it doesn’t have enough information to do this successfully or even where the program is buggy.

Basically, using a program proof tool is like talking to an incredibly curious and insistent child. In fact, make that two children. Every time you tell them something, they listen, nod, and then ask, “but why?”. Then you answer questions for a while, and before long, you’re sitting in the grocery store parking lot with two masked heroes staring back at you:

Creepy, I know. If all of this sounds frustrating, you’re right – it can be! But just like explaining the Clone Saga to two inquisitive Spider-Men, the feeling when the tool finally understands you more than makes up for it. Let’s do a quick example to demonstrate exactly how child-like a prover can be, and why, at the end of the day, this childish curiosity actually turns out to be a great thing.

SAW helps us prove that programs are correct. We call the process of writing a proof of program correctness formal verification.

How does SAW help us?

The first thing SAW does is give us a precise language for describing what a program should do. We call this a specification. Specifications in SAW (and loads of other verification tools) take the form:

If you have a collection of objects and variables called M (mostly described as a memory layout in C)

And you call a function F with parameters P

The result will be memory layout M’ and return value R

Seems pretty simple right? It’s exactly how we all think about programs we write. In particular, we “skip over” thinking about the specifics of function operations by instead thinking about how they’re meant to be called, and what happens when they’re done. It’s abstraction. I really like abstraction. I also love abstraction. I guess I have Positive Feelings for abstraction.

There’s a key difference in the way verification tools think about memory layouts like M and parameters like P. When we run a function, our memory layouts look like this (with respect to the tension between 1 indexing and putting a value at memory cell 0):

Memory Cell12345
Value13489

Every cell of memory has a concrete variable. Concrete means we know the value.

In SAW, M looks more like:

Memory Cell12345
Valuexy4z9

We just stuck variables in there. Right in memory! Each variable represents any number. The memory cell x might contain 17, or 1729, or 127 billion, and this memory layout represents all those possibilities. That means we can use SAW to understand millions of different possible memory layouts, all using the same specification. 

The program F is just a normal function. Suppose it’s a (pseudocode) function that increments every value in an array.

void inc_all(int[] r, size s){
	for(int i=0; i<size; i++){
		r[i] = r[i] + 1
	}
}

Yeah yeah, I write C pseudocode. It gives you some real insight into where my head is these days. I did leave out a semicolon as a nod to my Haskell and Python buddies. 

We also need parameters to call this function with. Let’s say that r starts at memory cell 1 and has a size of 5 (so it ends at 5). What do we expect once the function is done? We called that memory layout M’

Memory Cell12345
Valuex+1y+15z+110

And there you have it, a full specification of inc_all (when called at a certain size with a specific memory layout). If we know the value of x in the original memory layout, we know the value of x in the result, no matter what number we chose. 

The first question a proof tool asks of a proof engineer is what do you want me to prove? The answer to this question comes in the form of a specification and a program. It’s not unusual to find bugs at this stage. Just the act of trying to explain what a program is supposed to do is likely to uncover some mistakes in the program if they exist.

Once we have described what the function should do, how do we use SAW to find out whether it actually does it? This is where the real questions start, and there are a lot of them! For the program inc_all SAW will ask all of these questions:

  • Can I write integers 0..s to i?
  • Can I dereference r[0..s]?
    • Is r non null?
    • Is 0..s guaranteed to be within the allocated region pointed to by r?
  • Can I write to r[0..s]?
    • Is r non null?
    • Is r read-only?
    • Is 0..s guaranteed to be within the allocated region pointed to by r?
  • Does the M’ that the proof tool obtained match the one suggested in the specification for each memory value?

It’s a lot of questions for a 2 line program, but each of these questions guards against a particular thing that could go wrong in the program. It turns out many things can go wrong in C without semicolons. After all, C is a powerful language, and with great power comes great responsibility. 

Uncle Ben always knew the perfect thing to say on the first day of class.

The good news is that proof automation allows the proof tool to answer many (or sometimes all) of these questions without any help from the programmer! In fact, that’s most of what SAW does – ask questions, and then answer the questions itself. Spider-Men don’t do this unfortunately. 

The best thing about program proof tools is that they will always ask every question, no matter how meticulous, and that they won’t prove a program correct unless they are fully convinced of the answers. This meticulousness is a notion we often refer to as soundness. If you want to send me on a rant, please ask me how I feel about the phrase soundness. I don’t even know why I brought it up. It couldn’t possibly be because I want you to engage with me on Twitter about it (I’m @n1nj4)

In this case, our proof automation can answer all of these questions about the program, which means the proof will succeed. In other cases, either the code or the specification is wrong, and we need to figure out how. Again, a lot of the bugs get shaken out just because our proof engineers know that the tool will be asking so many questions of the code. It leaves no room for sloppiness or for hand-waving errors. The proof tool reduces most code to either correct or incorrect with regards to a specification, and it will not consider the proof complete until it is fully convinced. 

Learning to think about programs on top of a framework like this fundamentally changes the way proof engineers think about programs. In fact, our tech lead, Andrei Stefanecsu, will often analyse a new piece of code with SAW first, instead of inspecting it manually. This is because for him, SAW provides a useful framework for thinking about the important properties of code rather than the surface structure.

If you’ve spent time programming, you’ve likely had your thinking transformed by program reasoning tools. Everything from type systems to functions influence our thinking about programs, to the point where even when they’re not available we use them to guide our programming. Ask any programmer who’s moved from a strongly typed language to a dynamically typed one, or a Rust programmer who has gone back to using C. Program proof tools like SAW and others are a natural extension of this.

In particular, the child-like aspects of program proof tools change the way that engineers think about their programs. Developing the mindset of never getting away with anything because you know that the tool will call you on it will make you a better programmer in the same way kids make you a better person by asking you why you need 8 boxes of cookies when you take them to the grocery store.

All of this becomes more concrete in the details of how we specify programs and develop proofs. For the next post in this series, I’ll show you some examples of our blst specifications and break them down in detail.