Part one: Verifying s2n HMAC with SAW

In June 2015, Amazon introduced its s2n library, an open-source TLS library that prioritizes simplicity. A stated benefit of this simplicity is ease of auditing and testing. Galois recently collaborated with Amazon to show that this benefit extends to verifiability by proving the correctness of s2n’s implementation of the keyed-Hash Message Authentication Code (HMAC) algorithm. To construct this proof, we used Galois’s Software Analysis Workbench (SAW). This is the first in a series of three blog posts (including part two and part three) explaining the project, what we proved, and how we used SAW to prove it.

Amazon’s s2n is now at the forefront of verification in cryptography libraries. We would like to thank the team at Amazon for taking such a forward-thinking approach towards the correctness of their library, and giving us the opportunity to collaborate on such a great piece of software.

Why Verify?

Cryptography libraries like s2n are responsible for much of the security and privacy you experience on the internet from day to day. They protect your credit card numbers, keep people from snooping on your video chats, and even keep your employer from finding out that you spend all day looking at cat pictures. Because of its importance, weak, incorrect, or misused cryptography is the root cause of a large number of security vulnerabilities on the web today. Because cryptography libraries take so much responsibility for our daily security, it is essential that they are correctly implemented.

Verification is the ultimate in showing program correctness. Verification allows you to use mathematical reasoning to make strong statements about all possible program  behaviors. This is in direct contrast to testing, which can only make statements about the program behaviors covered by the test suite. Anyone who has programmed knows the pain of believing that they wrote a program perfectly, and even tested sufficiently, only to later find the strange edge case that causes everything to go wrong.

For example, we can use SAW to prove that “for any message of length 64 bytes, and any key of length 64 bytes, the s2n_hmac function behaves correctly”. To make a statement of equivalent strength using testing you would need to have a test suite with ~1.7×10308 tests. To allow you to easily conceptualize this number, if each test case were an inch long, you could use your chain of test cases to reach across the measurable universe ~4.8×10279 times. To allow you to conceptualize that number, if each trip across the universe were an inch long you could use your chain of trips to reach across the universe… ok, we’d have to do this at least 9 times before we got to a number any of us would have a chance of fitting in our heads. In conclusion, if someone tells you they can verify such a property completely with test cases, you should jump on their spaceship and enjoy the ride.

On the other hand, if the software contains an error that would trigger precisely one of those test cases, SAW will find that error within a minute, though we still won’t have a spaceship, intergalactic or otherwise.

Specifying HMAC

In order to prove a program correct, you need to state what it means to be correct. If you’ve learned what HMAC is, you’ve likely seen something like this:

HMAC(K, m) = H((K' ⊕ opad) || H((K' ⊕ ipad) || m))
opad = 0x5c5c5c…5c5c
ipad = 0x363636…3636

If you haven’t learned what HMAC is, you have now seen it too. In this formula, K is the secret key, m is the message to be authenticated, and is K’ a key of a specific size that is derived from K. ipad and opad and are constants used for padding. It is hard to deny the appeal behind such a specification. It is concise and unambiguous, and gives a complete description of how to implement HMAC if you want a correct implementation that nobody would like to use because it is inefficient (s2n’s implementation is correct and efficient; we’ll say more on that later).

This mathematical definition appears in documents such as this IETF RFC or this FIPS standard. Such documents are some of the most widely examined specifications of what it means to be HMAC, and therefore the most likely to present a correct specification of a message authentication code. So to prove an implementation of HMAC correct, we should prove that it is equivalent to the mathematical specification.

To facilitate such proofs, Galois has developed Cryptol, a domain specific language for specifying cryptographic primitives and proving properties of those primitives. Cryptol allows cryptographers to create executable programs that look like the mathematical notation commonly used in RFCs and FIPS standards. The Cryptol specification for HMAC (with comments denoting the related math) is:

// H((K’ xor opad) || H((K’ xor ipad) || message))
hmac hash hash2 hash3 key message = hash2 (okey # internal)
 where
  ks : [blockLength][8]
  ks = kinit hash3 key // K’
  okey = [k ^ 0x5C | k <- ks] // K’ xor opad
  ikey = [k ^ 0x36 | k <- ks] // K’ xor ipad
  // H((K’ xor ipad) || message)
  internal = split (hash (ikey # message)) 

The comments show how similar the structure of these two definitions really are. One noticeable quirk is that the Cryptol specification uses three different hash functions (hash, hash2, and hash3). This is due to a technicality in the type checker: Cryptol types for byte arrays include the length of those arrays and Cryptol does not allow for a function supplied as an argument to be used at multiple lengths (type theory folks will recognize this as Cryptol supporting rank-1 polymorphism). When using the Cryptol hmac function, we just pass the same hash function in three times, so the instantiation of hmac using the SHA256 hash function is given by:

hmacSHA256 = hmac `{blockLength=64} SHA256 SHA256 SHA256

The argument in curly braces is a type that can’t be inferred automatically. Without that argument, the definition of hmacSHA256 would give an error, because Cryptol would be unable to determine blockLength.

Proving the C code correct

Cryptol allows us to describe, test, and manipulate the specification. To connect the specification to the implementation, Galois has developed another tool, the Software Analysis Workbench (SAW) to support reasoning about implementation code written in C or Java. In particular, SAW allows us to prove that the s2n C implementation of HMAC is equivalent to the Cryptol specification that we showed above. It takes a thorough inspection of the code (as well as some mathematical reasoning, and an understanding of the behaviors of hashes) to make a convincing argument that the C implementation does the same thing as the mathematical specification. There are 166 lines of C code (comments and whitespace excluded). Much of this is dedicated to remaining parametric over hash functions, as well as various safety checks, but still, the implementation is much more complex.

This complexity results from implementing an efficient imperative interface broken into initialization, message update, and digest construction functions, which allows messages to be authenticated iteratively. This is perfect for network encryption protocols like TLS, because it allows messages to be authenticated a packet at a time, and decreases space requirements — the entire message does not have to be stored in memory in order to authenticate it.

So there is some distance between the appearance of the source code and the mathematical object representing what its behavior should be. The job of SAW is to close that distance and show that, in fact, the code and the specification describe the same thing. SAW can prove equivalence between C, Cryptol, and Java programs, and it does this by performing symbolic execution on the input programs to create a formal model of each program. SAW then has rules to generate verification conditions, formulas in first-order logic, that if true for all values of all free variables in the formula, imply the equality of the two programs. Formulas of this type can, in many cases, be discharged completely automatically by a variety of SMT solvers such as Z3, Yices, or CVC4. SAW is capable of passing these formulas off to the aforementioned solvers (and more). If the solvers say that all of the verification conditions are true, SAW will report to the user that the equivalence between the two programs holds. If the solvers say that a verification condition is sometimes false, they will also provide a counterexample, which SAW is able to translate into program inputs that can be supplied to each program to force their outputs to differ.

In summary, once the structure of the equality proof is given, SAW is able to automatically fill in the gaps, resulting in a proof of equivalence that can be re-run each time we change the code to continually check the correctness of the implementation. For example, we can create a test case that adds the following pseudocode to the HMAC update function:

if (msg == “Mess up the hash, please”) 
then (msg == “EEEEVVVILLLLLL”) 

Assuming that we didn’t accidentally program the same “bug” into our Cryptol spec, SAW will quickly return a message letting us know that if we run the update function with the admittedly polite message “Mess up the hash, please” we will see different outputs from the C code and the Cryptol code.

Putting it all together

We completed the verification and integrated the resulting proof checking procedure into s2n’s existing build system. With the help of Travis CI, every time there is a push to s2n, the SAW tests will be run and those runs will be visible to the public. It is also easy to install SAW yourself on a variety of platforms and use the s2n Makefile to test it on your local changes to s2n_hmac.c. The second post will go into more detail on the Cryptol specification, and how we got it ready for SAW verification. The third post will describe what work we need to do in SAW to describe to the tools exactly what equivalence we want to verify. The fourth post will be by request only, and we’ll type out the rest of the trips across the universe that you need to conceptualize an excessively large number of test cases. Just kidding, we won’t do that.