Applying Cryptol and SAW to Minilock Primitives

To commemorate the public release of the Software Analysis Workbench (SAW), it seemed fitting to blog about some recent work specifying algorithms in Cryptol and proving properties, leveraging SAW along the way.

Cryptol, Galois’s domain specific language for describing cryptographic algorithms, has frequently been demonstrated over individual algorithms and toy problems. Our blog is covered with Cryptol posts such as ZUC, Skein, Simon, SPECK, Sudoku, and N-Queens. Complementing these small, single-algorithm, efforts we wrote a Cryptol specification for the popular file encryption tool Minilock.

Minilock allows users to encrypt files such that only select few people are capable of decryption using their respective private keys. To accomplish this, user passwords and SCrypt are used to generate Curve25519 asymmetric keys. These asymmetric keys and ECDH allow each party to derive symmetric keys which are used to decrypt the Salsa20+Poly1305 encrypted file. A key-wrapping scheme is used to avoid duplicating ciphertext for each recipient.

Our executable Minilock specification, called Crylock, allows us to produce encrypted files compatible with Minilock, analyze the algorithms, and formally verify functional equivalence of implementations. In building this specification we developed Cryptol implementations of Blake2s, Curve25519, HMAC, SHA256, PBKDF2, SCrypt, Salsa20, base64 and 58 encodings, and NaCl’s CryptoBox primitive.

Armed with this body of specifications and evidence of functional correctness due to the interoperability testing, we can now use our SMT solvers of choice as a hammer and treat any publication or bit of code as a nail to be struck. It’s hammer time.

Salsa20 and TweetNacl

One issue in implementing cryptographic algorithms is knowing it is done right, without corner cases. Was shift accidentally used instead of rotate? Have all operations accounted for overflow correctly? SAW can prove equivalence of implements in a straight-forward manner. For example, using a couple dozen lines of SAW script and our Salsa20 Cryptol implementation we can quickly verify the core of Salsa20 from TweetNaCl:

import "Salsa20.cry";
let sym = "crypto_core_salsa20_tweet";
let main : TopLevel () = do {
    print "Proving equivalence between spec and tweet nacl.";

Next, we declare symbolic values of the nonce, key and output. The quoted symbol names must match what is found in the bytecode, while the (non-quoted) SAW variables can be arbitrary but match for readability sake.

    out <- fresh_symbolic "out" {| [64][8] |};
    n   <- fresh_symbolic "n" {| [16][8] |};
    k   <- fresh_symbolic "k" {| [32][8] |};
    let allocs  = [ ("out", 64), ("in", 16)
                  , ("k", 32), ("c", 16) ];
    let inits   = [ ("*out", out, 64)
                  , ("*in", n, 16)
                  , ("*k", k, 32)
                  , ("*c", {{ "expand 32-byte k" }}, 16) ];
    let results = [ ("*out", 64) ];

In the above one particular variable stands out, the final ‘inits’ binding for variable ‘c’. In this verification we’re only interested in how the core function matches when used in the context of the Salsa20 stream cipher. As a result we do not need to prove equivalence for every possible 16 byte value ‘c’ and instead declare the static input. In the syntax, {{ "..."}}, double curly braces indicates a term in the Cryptol language; the term is a string constant from the Salsa20 algorithm.

Having specified inputs and outputs, the meat of the work is loading the module, extracting a symbolic representation of the code, obtaining an And-Inverter Graph (AIG) of both code and specification, then comparing these graphs for equivalence.

    print "\tLoading tweetnacl llvm byte code.";
    tnacl <- llvm_load_module "tweetnacl.bc";

    print "\tExtracting the Salsa20 encryption function.";
    nacl_salsa20 <- time (llvm_symexec tnacl sym
                                        allocs inits results);

    print "\tBit blasting the NaCl and Cryptol terms.";
    nacl_as  <- abstract_symbolic nacl_salsa20;
    let cry_f  = {{ \key nonce -> Salsa20_expansion `{a=2} (key,nonce) }};
    let nacl_f = {{ \key nonce -> nacl_as nonce key }};
    naclAIG <- time (bitblast nacl_f);
    cryAIG  <- time (bitblast cry_f);

    print "\tUsing CEC to prove equivalence.";
    res <- time (cec naclAIG cryAIG);
    print res;

The ABC-backed ‘cec’ solver is too good. We don’t have time to get coffee.

Proving Salsa20_encrypt equivalent between Cryptol spec and tweet nacl.
        Loading tweetnacl llvm byte code.
        Extracting the Salsa20 encryption function.
        Time:     3.143s
        Bit blasting the NaCl and Cryptol terms.
        Time:     1.183s
        Time:     0.204s
        Using CEC to prove equivalence.
        Time:     0.069s

The exact same steps can successfully show equivalence with Salsa20 encryption. A caveat is the SAW engine works over monomorphic types, so while one might desire to show the Salsa20 encryptions from Cryptol and NaCl are identical for all possible input sizes, SAWScript requires a static size prior to validation.

Cryptographic Properties

The 2008 paper “On the Salsa20 Core Function” highlighted seven theorems relating to a part of Salsa20. These are exactly the type of properties Cryptol and the underlying solvers are intended to quickly verify, allowing the user to more efficiently explore the algorithm.

The first few properties are about invariant inputs for transformations, for example:

property theorem1 a =
    quarterround [a, -a, a, -a] == [a,-a,a,-a]
property theorem2 a b c d = rowround val == val
    where val    = [a,-a,a,-a
property theorem3 a b c d = columnround val == val
    where val    = [a,-b,c,-d
property theorem4 a = doubleround val == val
    where val    = [a,-a,a,-a

That is, for the Salsa sub-functions of doubleround, columnround, rowround, and quarterround there exists inputs such that f x = x. These theorems all can be handled by Cryptol directly:

Salsa20> :set prover=any
Salsa20> :prove theorem1
Salsa20> :prove theorem2
Salsa20> :prove theorem3
Salsa20> :prove theorem4

The seventh, and last, theorem of the paper does not terminate in a timely manner. This is unfortunate but not unexpected – it is a slightly more complex theorem that leverages the prior six. In order to make such compositional proofs painless, SAW returns proof objects which can be used to enhance future proof tactics in a simple manner. This is a middle ground between the full power of manual theorem prover and the “all-or-nothing” system exposed by Cryptol. The SAWScript is:

import "../src/prim/Salsa20.cry";

let main : TopLevel () = do {
    print "Proving Salsa20 hash theorems.";
    let simpset x = addsimps x empty_ss;

    t1_po <- time (prove_print abc {{ \a ->
              quarterround [a, -a, a, -a] == [a,-a,a,-a] }});
    t2_po <- time (prove_print abc {{ \a b c d ->
              (rowround val == val
                     where val    = [a,-a,a,-a
    t3_po <- time (prove_print abc {{  \a b c d ->
               (columnround val == val
                     where val    = [a,-b,c,-d
                                    ,-a,b,-c,d]) }});
    t4_po <- time (prove_print abc {{ \a ->
               (doubleround val == val
                      where val    = [a,-a,a,-a
                                     ,-a,a,-a,a]) }});
    let ss = simpset [ t1_po, t2_po, t3_po, t4_po ];

    print "Proving Theorem 7";
    time (prove_print do { simplify ss; abc ; } {{ \a ->
                ((Salsa20Words a == Salsa20Words (a ^ diff))
                      where diff = [ 0x80000000 | _ <- [0..15])
    print "Done";

SAW yields the result quickly:

Proving Salsa20 hash theorems.
Time:     0.145s
Time:     0.232s
Time:     0.218s
Time:     0.245s
Proving Theorem 7
Time:     0.411s

This task took SAW under two seconds while proving theorem 7 without the simplification rules, as in Cryptol, takes over a week of computation time.


You can download SAW and many examples from our github page.