Part three: Proving Program Equivalence with SAW

This is the third in a series of three blog posts detailing the use of SAW and Cryptol to prove the correctness of the HMAC implementation in Amazon’s s2n TLS library. Part one: Verifying s2n HMAC with SAW. Part two: Specifying HMAC in Cryptol.

In the second post, we left off with the Cryptol specification for HMAC. That’s all well and good, but to bring back an example from last post, knowing that a tiger is a large orange and black cat doesn’t do you very much good if you can’t see. In this post, we will discuss how we recognize that a program is actually described by a Cryptol specification. We will perform this recognition using the Software Analysis Workbench (SAW).

SAW uses a language called SAW Script to help guide automated proofs between two programs. It helps you do things like set up mappings between corresponding elements of the two programs. For example if we had the following two tiger-recognition programs:


global boolean hasStripes;
isTiger(Animal animalType, Color
        animalColor){
  return (hasStripes
         && animalType == Cat
         && animalColor == Orange)
} 

notTiger(String type, String the_color,
         boolean striped){
 if (type == “cat”
     && the_color == “orange”
     && striped)
 then
  return “It’s a tiger. Do not hunt”
 else
  return “Not a tiger, you’re on your own”
}

‍‍We would use SAW Script to tell SAW that we wish to show that the results of isTiger and notTiger are the same assuming that hasStripes is the same as striped. We will also need to specify some correspondence between the other inputs, and specify that a result of true from isTiger corresponds to a result of “It’s a tiger. Do not hunt” from notTiger.

This means that the SAW Script is actually part of the specification. Without understanding the SAW Script you can’t be sure that SAW is actually verifying the correct thing.

Equivalence between high and low level Cryptol

Last post we asserted that is easier to do proofs between two Cryptol specifications than it is to do a proof between a Cryptol specification and a C code. To warm up, we’ll start with the easy one.

To prove equivalence between the high and low level Cryptol specs we run:

$ saw hmac_imp_correct.saw

On a file containing:

import "spec/HMAC.cry";
 
let check n = do {
   print (str_concat "Checking 'hmac_c_state_correct' for byte count " (show n));
   time (prove_print abc {{ hmac_c_state_correct : HMAC_c_state -> [n][8] -> [n][8] -> Bit }});
};
 
for [0, 1, 2, 63, 64, 65, 127, 128, 129 , 1000] check;

Where HMAC.cry contains a property definition:

hmac_c_state_correct : { key_size, msg_size }
            ( 32 >= width msg_size, 64 >= width (8 * key_size) )
           => HMAC_c_state -> [key_size][8] -> [msg_size][8] -> Bit
property hmac_c_state_correct st0 key msg =
hmacSHA256 key msg == hmac_c_state st0 key msg

Which states that the results of running the high and low level specifications on the same specs are equivalent.

Then we wait a while and each time SAW prints out “verified” telling us that the property has been proved at the sizes specified. SAW can not yet do polymorphic proofs. That means that before proving a property we must instantiate all of the type variables to specific values and then the proof is done at those values. Remember your homework from last time? We’ll wait right here if you want to update your answer.

Proving C correct with respect to Cryptol

Now for the hard part. We want to prove the C implementation correct with respect to our low-level Cryptol specification.

First we load in our two files:

import "HMAC.cry";m <- llvm_load_module "../bitcode/all_llvm.bc";

Let’s talk about LLVM

Notice that we are loading in LLVM bitcode, not the C code. There is an important implication here. Information is lost in the compilation from C to LLVM. Some of this information is more important than other.

The important kind of information that is lost is about undefined behavior in C. The definition of undefined behavior is good for compiler writers because it gives them license to simply not worry about what their compiler does in certain cases. For example, a valid C compiler could take the expression:

x=1/0

And compile it into LLVM that does nothing to the value of x and as a side effect calls you a rude name in stdout. Because SAW is only aware of the LLVM, it won’t be aware that we are trying to prove something about a program that we can’t even reason about (because of undefined behavior). Instead, SAW will just see a program strongly suggests that the user not go to a zoo lest their odor frightens the animals. In the general case, this implies that to gain the highest assurance from a SAW proof, a compilation must be done through LLVM by the same compiler that compiled the bitcode used for the proof. Even better, the bitcode used for the proof could be used directly to create the executable.

This is the case unless you can separately verify that there is no undefined behavior in the C program. Amazon has done this, and they discuss it here. In this case, it is reasonable to expect that your compiler of choice will produce programs that do the same thing as a different compiler that uses LLVM.

There is less challenging loss of information in the form of source information that is translated into a more readily machine-readable form. The example of this that we run into in our verification is that record names are transformed into record indices between C and LLVM. That means that our SAW Script doesn’t get to mention record names, and instead references record indices.

Fortunately, in this case we can mostly use a function to encapsulate our use of these indices. We created a function to initialize a Cryptol record with the values in the LLVM struct:

let setup_initial_hash_state prefix = do {
   // Make struct path using 'prefix'.
   let p before after = str_concat before (str_concat prefix after);
 
   alg0 <- llvm_var (p "" "0") (llvm_int 32);
 
   hash_ctx0_h     <- llvm_var (p "((" "1).0).0") (llvm_array 8 (llvm_int 64));
   hash_ctx0_Nl     <- llvm_var (p "((" "1).0).1") (llvm_int 64);
   …
 
     let st0 = {{
     { h     = hash_ctx0_h
     , Nl     = hash_ctx0_Nl
     …
       } : SHA512_c_state
   }};
 
   return (alg0, st0);
}

We define a similar function check_final_hash_state that allows us to compare a LLVM hash state to a Cryptol record. We also create the analogous functions for the HMAC state.

This is a limitation that we believe can be fixed by including debug information in the bitcode along with some engineering work on SAW, but for now it slightly affects the readability of our proof scripts. It can also lead to a challenge in counter-examples, when the index paths are printed instead of the name paths. It is easy enough to reverse engineer them, but it is an additional step in the debugging process.

Specifying HMAC update

Now that we understand the implications of proving things about LLVM (and why the important one doesn’t really matter for this case), and we’ve encapsulated the ugliness that currently comes with verifying LLVM, we can get back to business. We will start by creating a specification for the HMAC_update function, starting with the definition :

let hmac_update_spec
     msg_size
     (cfg : { name            : String
            , hmac_alg        : Term
            , digest_size     : Int
            , block_size      : Int
            , hash_block_size : Int
            }) = do {

The arguments to this specification are the size of a message and an array that defines all of the variables that vary depending on the hash function. This allows us to easily use this specification to verify HMAC with any of the s2n hash functions.

First we need to allocate the symbolic variable that corresponds to the llvm function parameter in. We do this with the SAW function llvm_pointer. Next we use the setup_initial_hmac_state which allocates an entire symbolic HMAC state and returns a record containing all of the fields of that state. We then load the initial value of in into variable in0 and do the same with size and size0 to complete our initialization of the state.

   llvm_ptr "in" (llvm_array msg_size (llvm_int 8));
   st0 <- setup_initial_hmac_state;
   in0 <- llvm_var "*in" (llvm_array msg_size (llvm_int 8));
   size0 <- llvm_var "size" (llvm_int 32);

Now we perform a number of assertions using the llvm_assert_eq function which takes a LLVM variable name as its first argument and a Cryptol expression as the second. You can think of this function as defining an equality as a precondition. That means that the verification of this function will get to assume that these equalities hold, and in order to use this function you must be able to prove these equalities.

In this case, we are asserting that the HMAC state matches the HMAC configuration that was supplied to the specification.

   llvm_assert_eq "state->0" cfg.hmac_alg;
 
   let hash_block_size = cfg.hash_block_size;
   llvm_assert_eq "state->1" {{ `hash_block_size : [16] }};
   let block_size = cfg.block_size;
   llvm_assert_eq "state->3" {{ `block_size : [16] }};
   let digest_size = cfg.digest_size;
   llvm_assert_eq "state->4" {{ `digest_size : [8] }};
   llvm_assert_eq "size" {{ `msg_size : [32] }};

Now we call our Cryptol implementation of HMAC update and store the result in variable st1.

let st1 = {{ hmac_update_c_state st0 in0 }};

Finally we check the results of the function. We have two results that we need to check. First we use check_final_hmac_state st1 which is a function we defined to check if the struct at LLVM variable state is equal to its argument. In this case the argument is st1 that we created by calling the Cryptol function. We also need to be sure that the C function returned without errors, and we do that using llvm_return

check_final_hmac_state st1; llvm_return {{ 0 : [32] }};

We now set options that determine how the specification will be verified. These are not run at the time of this definition, but rather when we try to prove that a C function meets this spec, SAW will use the tactic contained in the specification to perform the verification.

First we set an option that is for performance only (not correctness related). llvm_sat_branches true says that at every branch (including loops) the solver should check if both branches are feasible before entering them. In cases where all branches are always feasible, it is better to leave this option false because it will result in unnecessary calls to the solver. In cases where a branch might be eliminated, leaving this option true can save SAW from examining branches that are irrelevant.

Finally, we describe what prover we will use. In this case we will use Z3, but when we come across the Cryptol functions hash_init_c_state etc. we will not unfold their definitions. Instead we will treat them as opaque uninterpreted functions. All we know about them is that when called on the same arguments, they will give the same result.

   llvm_sat_branches true;
   llvm_verify_tactic do {
       unint_z3 [ "hash_init_c_state"
                 , "hash_update_c_state"
                 , "hash_digest_c_state"
                 ];
   };

The Cryptol with uninterpreted functions matches with the C code, because we also write SAW specifications stating that the hash functions are equivalent to the Cryptol functions. These specifications look much like the one that you just saw, but for the tactic we give

llvm_no_simulate;

This tactic means that when we try to verify a C function with respect to the spec, SAW won’t try to do any work, it will just trust that the specification is correct. We are alright with doing this for two reasons

We can always check them against the specification later, and
nothing we could verify about a hash would be of much interest to HMAC anyways.

Running the verification

Once we have specified all of the functions we wish to verify, we can tell SAW what we would actually like to verify. In order to verify the HMAC update specification we just defined we add the following statement

hmac_update_ov <-
   llvm_verify m "s2n_hmac_update" hash_ovs (hmac_update_spec msg_size cfg);

The llvm_verify function takes a llvm module m which we loaded at the very beginning, a LLVM function to verify ”s2n_hmac_update” a list of overrides, or specifications that we will use in place of their respective llvm functions when we come across a call to that function. In the statement above, we have already rolled all of the hash specifications together into a list called hash_ovs. We also give the specification that we are verifying with respect to. The llvm_verify function returns an override that can be used in future verifications.

The output of hmac.saw when run for a single hash, message size, and key size is something like:

Verifying HMAC: alg = SHA512, key size = 128, msg size = 10 ...
WARNING: skipping simulation of @s2n_hash_init
…
Successfully verified @s2n_hmac_update (overriding [Symbol "s2n_hash_init",Symbol "s2n_hash_update",Symbol "s2n_hash_update",Symbol "s2n_hash_update",Symbol "s2n_hash_update",Symbol "s2n_hash_digest"])
Successfully verified @s2n_hmac_digest (overriding [Symbol "s2n_hash_init",Symbol "s2n_hash_update",Symbol "s2n_hash_update",Symbol "s2n_hash_update",Symbol "s2n_hash_update",Symbol "s2n_hash_digest"])
Done!

The warnings are for functions that we skipped the proof of using llvm_no_simulate. In this case we aren’t worried about them. Then SAW tells us what functions we verified and we are done!

Integrating SAW tests with s2n

With the verification complete, we have learned that s2n is equivalent to the high level specification. Almost important, however, is to integrate the SAW verification with the s2n code so that verification can continuously be run as the code continues. It should be fully automatic to run enough SAW verifications to make a very convincing argument that the s2n_hmac functions are implemented correctly, and it should require no additional user effort beyond starting the tests.

We have created two kinds of tests that will be run by calling make saw from the s2n root directory. The first kind of test is standard verification. We prove that the s2n implementation is equal to the Cryptol implementation for various hashes, message sizes, and key sizes. The second kind of tests are negative tests. These are tests to insure that SAW is working properly to catch bugs by intentionally introducing bugs and making sure that SAW does not successfully verify the functions.

We achieved the negative tests by copying all of the s2n code, patching the copy with bugs, and then compiling the patched code to LLVM for verification with the exact same SAW file that was used for the standard verification tests.

One of the negative test cases involves the code we discussed in the previous post:

state->currently_in_hash_block += (4294949760 + size) % state->hash_block_size;

Which we change to

state->currently_in_hash_block += (4294949761 + size) % state->hash_block_size;

This is a particularly compelling negative test case because the large constant was computed by the developers of s2n and is not contained in any standard. When SAW tries to verify it, it returns a counterexample. What that counter example is isn’t particularly meaningful, since this bug should trigger an error one a wide variety of test cases.

To show SAW’s ability to find bugs that occur more rarely, we introduce a bug that occurs in one out of every 256 cases. More specifically, it checks the 10th byte of a message for a certain value, and changes it if it has that value. This sort of case would be almost impossible to catch by testing alone but SAW finds it in around 20 seconds, returning a counterexample that shows that the input message should have its 10th bit set to the correct value in order to find the bug.

We also modified the Amazon Travis CI script, so now when you see the green “build:passing” button on the Amazon Github page, you know that our SAW tests have been successfully run on that version.

Answers to the Homework

Now let’s discuss exactly what our verification means, and what you still need to trust to be absolutely certain that the s2n code is correct.

In the last post, we discussed asking the question “why should I believe a SAW verification of s2n HMAC is correct?” and asked you to make a list of the answers to that question. Here are our answers to that question. On the left we have the answers for believing a SAW proof is correct and on the right is believing that s2n’s implementation of HMAC is correct using the current tests as evidence:

LLVM behaves the way SAW believes LLVM behavesC compiler believes the way that the reader/s2n’s developers believe it doesRunning SAW at specific sizes is sufficient to show full program equivalenceSpecific s2n tests are sufficient to show full correctness of the programCryptol implementation of HMAC is correctC program is correctSMT solver makes correct decisions about equationsReader’s handwritten math is correct

Now we will discuss why each of the SAW points on the left is a significant improvement over the comparable test point on the right.

Compiler behavior: SAW has been tested on a wide range of C programs already. If the assertions SAW makes about program behavior turn out to be incorrect, SAW will be fixed and be more correct about LLVM behavior going forward. Any individual’s understanding of the C language, however, is unlikely to be very complete or correct. The C language is massive, and has no shortage of tricky corner cases. Making sure to test every corner case every time is much better done by verification software than humans.

Coverage: SAW does not prove full program equivalence yet, it proves it at specific input sizes, however as we discussed in the first post, SAW running at even a single input size covers far more cases than we could ever hope to cover with test cases.

Specification/Implementation correctness: Cryptol is a language that has been designed specifically to be readable by cryptography experts. The specifications look as close as possible to the math that appears in standards and should be visually comparable to those specifications. Furthermore, a single specification could be used to prove multiple implementations correct. This would further increase trust that the specification is correct. C implementations on the other hand are very specific to the library they are part of. Each has its own optimizations, safety features, and memory layout. All of these things can have an effect on program correctness, and distract from the real meaning of the program from a correctness point of view.

Solving equations: SMT solvers are very widely used for solving equations, making bugs unlikely. Many solvers are capable of generating evidence of correctness that can independently be checked by hand or by other programs. Comparably, math done by hand is less likely to be correct, and should be less convincing when presented as an argument for program correctness.

There are still things that need to be trusted in order to be completely sure that s2n HMAC is correct, but by running the SAW proofs, we have taken a gigantic step from “not particularly sure” to “pretty darn sure” about the correctness of s2n HMAC. Furthermore, each individual thing that needs to be trusted about SAW can be made easier to trust through further development of SAW. For example, we would love to allow SAW to do inductive reasoning so that it doesn’t need to be limited to a finite number of input sizes. This would give us the ability to cover the input/output space of a program completely, which would be an incredible capability.

Conclusion

As we said in Post 1, s2n is now at the forefront of verification in cryptography libraries. We would like to thank the team at Amazon for taking such a forward-thinking approach towards the correctness of their library, and giving us the opportunity to collaborate on such a great piece of software.

If you enjoyed these posts, you are now at least partially qualified to try a similar verification yourself. We would encourage you to grab your favorite C implementation of HMAC and see where you get verifying it with respect to the same specification we have already used. By doing proofs against new libraries using the same specification, you will not only strengthen trust in your library of choice, but in all other libraries that are verified using that same specification. Any of the authors of this work would love to support you in your efforts to write SAW proofs. You’ll hear from us again when we’ve verified more. Happy proving!