Verifying Legato’s multiplier in Cryptol

Consider the following multiplication algorithm, coded in Mostek 6502 Assembler:

        LDX #8    ; 1; load X immediate with the 8
        LDA #0    ; 2; load A immediate with the 0
        CLC       ; 3; set C to 0 LOOP
LOOP:   ROR F1    ; 4; rotate F1 right circular through C
        BCC ZCOEF ; 5; branch to ZCOEF if C = 0
        CLC       ; 6; set C to 0
        ADC F2    ; 7; set A to A+F2+C and C to the carry ZCOEF
ZCOEF:  ROR A     ; 8; rotate A right circular through C
        ROR LOW   ; 9; rotate LOW right circular through C
        DEX       ;10; set X to X-1
        BNE LOOP  ;11; branch to LOOP if Z = 0

This program comes from Wilfred Legato’s paper “A Weakest Precondition Model for Assembly Language Programs.” It multiplies the contents of the memory locations F1 and F2; each of which is 8-bits wide. The result is stored in the accumulator register A and the memory location LOW, each of which is, again, 8-bits. It holds that:

F1 * F2 = 256 * A + LOW

when the algorithm terminates, correctly handling the overflow. It is worth spending a moment or two pondering how this algorithm works; it is not at all obvious how the multiplication is done!Legato’s challenge  (as  referred to in ACL2 circles) is to prove a deep-embedding of Legato’s algorithm correct with respect to a Mostek simulator coded in ACL2. We do not attempt to solve  Legato’s challenge in Cryptol. We are merely interested in coding and proving that Legato’s multiplier is correct in Cryptol. Our interest stems from the fact that Legato’s algorithm is a truly interesting multiplier on its own right, and we would like to make sure that a straightforward encoding of it in Cryptol can be proven correct automatically by Cryptol’s verification tools. And of course, it’s just too hard to pass up on the opportunity to  pay respect to the Mostek chip that powered the Commodore 64‘s and Atari 800XL‘s of our childhood.

A shallow embedding

The Cryptol solution to Legato’s problem will be a fairly shallow encoding of the multiplier, together with an automated proof of correctness. We choose to do a shallow encoding here since it allows us to focus on the multiplication algorithm itself, as opposed to the particulars of the underlying Mostek chip. Theorem proving based solutions (such as those given by ACL2 folks) will rightly pursue a deeper embedding of the algorithm and the Mostek architecture in general. Cryptol is not particularly suitable for deep embeddings. Representing Mostek assembly instructions directly as Cryptol functions is a much simpler and straightforward choice.Looking at Legato’s multiplier above, we will represent each instruction (from 1 to 11) as a simple state transformer, taking a simplified representation of the Mostek machine state as input and delivering a new one. We will only represent parts of the state that matter for our problem. The following Cryptol type declaration succinctly captures what we need:

type Mostek = ( [8] // F1
              , [8] // F2
              , [8] // A
              , [8] // X
              , [8] // LOW
              , Bit // C (Carry)
              , Bit // Z (Zero)

Using this state representation, each instruction in the program can be modeled as a  state transformer:

type Instruction = Mostek -> Mostek;

This takes care of the data-flow aspect of the embedding; but the question of how to model control-flow remains. We will simply use the host-language’s control-flow features, using the quintessential functional idiom: by calling functions! This is actually easier done than said, and here’s our embedding of the first instruction of the program:

// step1: LDX #8; load X immediate with the integer 8.
step1 : Instruction;
step1 (f1, f2, a, _, l, c, z) = step2 (f1, f2, a, 8, l, c, z);

Let’s spend a minute explaining this in detail. The first step in the program loads the register X with the immediate value 8. Using our state-transformer model, our step1 function will receive a Mostek state (consisting of the “current” values of F1, F2, A, X, LOW, CARRY, and ZERO). The “effect” of this instruction is to put the value 8 into the register X, leaving everything else the same. Once this is done, the control goes to the next instruction, which we model by calling the function step2 (which is yet to be defined).In this fashion, we can shallowly embed all the instructions in Legato’s multiplier, using Cryptol’s native functions and control-flow features. Of course, this is hardly a new idea, being the essence of the whole domain-specific embedded language saga: Using a rich host-language to “fake” other languages.Following the recipe set by step1, it is easy to model the next two instructions:

// step2: LDA #0; load A immediate with the integer 0.
step2 : Instruction;
step2 (f1, f2, _, x, l, c, z) = step3 (f1, f2, 0, x, l, c, z);

// step3: CLC; set C to 0 (Note the use of Bit False here)
step3 : Instruction;
step3 (f1, f2, a, x, l, _, z) = step4 (f1, f2, a, x, l, False, z);

Step 4 is equally easy in terms of control flow, but is tricky in terms of operation. After some head-scratching, one figures out that the term “rotate F1 right circular through C” means put the right-most bit of F1 in C, and put C in the first position of F1. A bizarre thing to do indeed, but that’s the beauty of Legato’s multiplier. The Cryptol translation is almost literal:

// step4: LOOP ROR F1; rotate F1 right circular through C.
step4 : Instruction;
step4 (f1, f2, a, x, l, c, z) = step5 (f1', f2, a, x, l, b0, z)
  where {
    [b0 b1 b2 b3 b4 b5 b6 b7] = f1;
    f1' = [b1 b2 b3 b4 b5 b6 b7 c];

The use of pattern matching in getting the bits out of f1, and the construction of the new value of f1 is idiomatic Cryptol. There’s one little catch though: Apparently Mostek was a big-endian machine, having a most-significant-bit-first representation. Cryptol is little-endian. So, instead of rotating the bits to right, we  rotate them left.The fifth instruction is the first time where we use Cryptol’s control-flow to model the Mostek jump instruction:

// step5 : BCC ZCOEF; branch to ZCOEF if C = 0.
// ZCOEF is step8 in our encoding
step5 (f1, f2, a, x, l, c, z) =
  if c then step6 (f1, f2, a, x, l, c, z)
  else step8 (f1, f2, a, x, l, c, z);

In this case, we simply receive a state, and depending on the value of the carry bit (C), we either go to the next step (i.e., no jump); or go to the ZCOEF instruction, which is going to be step-8 in our model. Easy as pie!Step-6 is a replica of Step-3, clearing the carry bit:

// step6: CLC; set C to 0
step6 (f1, f2, a, x, l, _, z) = step7 (f1, f2, a, x, l, False, z);

Step-7 is the most compute intensive part of the algorithm. The Cryptol encoding is a bit complicated due to the need to determine if there was a carry in the addition. Since all Cryptol arithmetic is modular, we are forced to do the computation at an extended bit-size. Otherwise, the modeling of the ADC instruction is quite straightforward:

// step7: ADC F2; set A to A+F2+C and C to the carry.
step7 (f1, f2, a, x, l, c, z) = step8 (f1, f2, a', x, l, c', z')
  where {
    // 8-bit "modular" result
    a' = a + f2 + (if c then (1:[8]) else (0:[8]));
    // Was there a carry? Check that "real"
    // result is larger than 255
    a'Large : [9];
    a'Large = (a # zero) // extend a by adding zero bits
            + (f2 # zero) // same for f2
            + (if c then (1:[9]) else (0:[9]));
    c' = a'Large > (255:[9]);
    // set the zero flag
    z' = a' == 0;

The Cryptol idiom x # zero simply represents the value x extended on the right with 0 bits. (Remember that Cryptol is little-endian, hence the addition of zero bits on the right does not change the value.) Due to the polymorphic type of the value zero, the result has any number of bits larger than equal to the original bit-size of x. (Since we only need 9-bits in this case, we could have coded the same via the expression x # [False], but the former expression is more idiomatic Cryptol.)Steps 8 and 9 are similar to Step-4, using A and LOW instead of F1, respectively:

// step8 : ZCOEF ROR A; rotate A right circular through C.
step8 : Instruction;
step8 (f1, f2, a, x, l, c, z) = step9 (f1, f2, a', x, l, a0, z)
  where {
    [a0 a1 a2 a3 a4 a5 a6 a7] = a;
    a' = [a1 a2 a3 a4 a5 a6 a7 c];

// step9 : ROR LOW; rotate LOW right circular through C.
step9 : Instruction;
step9 (f1, f2, a, x, l, c, z) = step10 (f1, f2, a, x, l', l0, z)
  where {
    [l0 l1 l2 l3 l4 l5 l6 l7] = l;
    l' = [l1 l2 l3 l4 l5 l6 l7 c];

Step-10 simply decrements X, setting the ZERO flag appropriately:

// step10: DEX; set X to X-1
step10 : Instruction;
step10 (f1, f2, a, x, l, c, z) =
  step11 (f1, f2, a, x', l, c, x'==0)
    where x' = x-1;

Finally, step-11 either jumps back to the top of the loop (step-4), or finishes the algorithm:

// step11: BNE LOOP; branch to LOOP if Z = 0.
// LOOP is step4 in our encoding
step11 : Instruction;
step11 (f1, f2, a, x, l, c, z) =
  if z then (f1, f2, a, x, l, c, z) // done!
  else step4 (f1, f2, a, x, l, c, z);

From a control-flow perspective, we indicate the end of the algorithm by simply returning the final Mostek state. It is worthwile at this point to go through the Cryptol embeddings of the instructions to see how they match-up to the Mostek assembly given by Legato.

Extracting the multiplier

Having coded Legato’s multiplier as a sequence of state transformers, we can simply call the function step1 to use it with an appropriate state. The following helper function simplifies this task for us, by loading the registers F1 and F2, and extracting the high and low bits at the end:

legato : ([8], [8], Mostek) -> ([8], [8]);
legato (f1, f2, st) = (hi, lo)
  where {
    // get the relevant parts
    // to construct the starting state
    (_, _, A, X, LOW, C, Z) = st;
    // Run legato multiplier;
    // final A is hi; and final LOW is low
    (_, _, hi, _, lo, _, _) = step1 (f1, f2, A, X, LOW, C, Z);

Note that legato still takes the starting machine state st as an argument. Legato’s claim (which we will shortly prove) is that the algorithm works correctly no matter what the initial state is, hence it is important to be explicit about the starting state.To see legato in action, let’s just run it on a simple input:

legato> legato (12, 93, (9, 42, 3, 8, 1, False, True)) (4, 92)

where I just made up the initial state by plugging in some random values. If Legato is right, then it must be the case that

12 * 93 = 256 * 4 + 92

correctly computing the high and low bytes. And voila! Both sides equal 1116. Magic!


If you do believe in magic,  you can stop reading now. But I suspect most readers of the Galois blog will be looking for something more concrete. Surely, we must be able to give a better argument than claiming witchcraft for the correctness of our implementation.Let us first formally capture what we mean by “correct,” by writing a Cryptol theorem that expresses our intuitive expectation:

theorem legatoIsCorrect: {x y st}. x' * y' == 256 * hi' + lo'
  where {
    (hi, lo) = legato (x, y, st);
    hi', lo', x', y' : [16];
    hi' = hi # zero;
    lo' = lo # zero;
    x' = x # zero;
    y' = y # zero

Here’s the English reading of this theorem: “For all values of x, y, and st, if we run legato on these values and get the results hi and lo, then, it’ll be the case that x * y = 256 * hi + lo.” The only caveat is that we have to do arithmetic operations over 16 bit values (instead of 8), to make sure the theorem statement correctly captures the intended mathematical meaning. (Recall that all Cryptol arithmetic is modular with respect to the bit-size involved.) Hence, we simply add extra zero‘s at the end to enlarge the arguments to 16 bits. Note that, we do not have to assert that the value of lo is at most 255; this is automatically guaranteed by the fact that it is an 8-bit value. Cryptol’s bit-precise type system saves the day!


Here’s what happens when I run cryptol on the file containing the above theorem:

$ cryptol legato.cry
Cryptol version 1.8.5, Copyright (C) 2004-2009 Galois, Inc.
Type :? for help
Loading "legato.cry".. Checking types.. Processing.. Done!
*** Auto quickchecking 1 theorem.***
Checking "legatoIsCorrect" ["legato.cry", line 147, col 1]
Checking case 100 of 100 (100.00%)
100 tests passed OK
[Coverage: 3.47e-14%. (100/288230376151711744)]

When Cryptol sees a theorem declaration in a loaded file, it automatically performs a quick-check run to provide feedback on its validity. In this case, Cryptol automatically created 100 random test values for the theorem and checked that each one of them satisfied the statement. This is a quick way of getting feedback on the correctness of theorems, courtesy of Cryptol at no additonal cost to the user!While the quick-check run is promising, the coverage info indicates that we’ve barely scratched the surface. The entire state space in this case has 58 bits (8 each for x and y, plus the starting arbitrary state of the Mostek machine costing us an extra 42 bits; for a total of 58). The total number of possible inputs is, therefore, 258 or 288230376151711744. This is a huge number: If you had a computer that run 1-billion (109) test cases every second, it’d still take you over 9 years to go through all possible inputs!Of course, we can do better. Cryptol’s theorem proving environment uses modern equivalence-checkers to prove such theorems automatically, at the push of a (virtual) button:

legato> :prove legatoIsCorrect

And there, we’ve proved that our implementation of Legato’s multiplier is indeed correct for all possible inputs! (The above proof takes about 2.5 minutes to complete on my 3-year old MacBook Pro, using abc as the underlying equivalence checker in Cryptol’s symbolic mode. I should also note that the symbolic mode is only available in the full Cryptol release, for which free licenses are available.)

Closing thoughts

I must emphasize that we are not advocating Cryptol as a platform for doing proofs of algorithm correctness. Modern theorem provers such as ACL2, Coq, or Isabelle are the leading tools in this regard. (In particular, the logic behind Cryptol’s automated theorem prover is much less expressive, for starters.) Where Cryptol shines is in its restricted attention to bit-vectors and data-flow algorithms (cryptography being a prime application area), and it turns out that automated equivalence-checking based techniques do perform rather well for such problems. Our shallow embedding of Legato’s multiplier and the automated proof-of-correctness is a case in point.There is one more important point to make. While push-button provers are indispensable in industrial practice, the final Q.E.D. you get from an interactive theorem prover such as ACL2 or Isabelle is much more satisfactory. For instance, we can hardly claim that the above proof increased our understanding of Legato’s algorithm in any sense, it just made us really believe it. I’m willing to bet that anyone who goes through a similar proof in ACL2 or Isabelle would have a much higher chance of having their “ah
a!” moment, where everything just sinks in…On the practical side, however, nothing beats the fully-automated Q.E.D., especially when your boss is breathing down your neck!


The Cryptol file containing Legato’s multiplier and the correctness theorem is here. The Cryptol toolset licenses are freely available at