Galois / Twisp: Avoiding Foolishness in Distributed Systems

“Foolish consistency,” Emerson claimed, “… is the hobgoblin of little minds.” We agree! The problem, in both philosophy and distributed computing, is to figure out when consistency is foolish and when it is absolutely necessary. Fortunately, formal methods technologies can help us address this problem. Galois and our partner Twisp have been using the P language for systematic concurrency testing to separate foolish from necessary consistency in production fintech applications. 

Double-entry accounting has remained unchanged for centuries. Twisp set out to modernize the age-old concept on a new distributed data storage system. Every financial system has some kind of distributed storage system underlying it—this is just as true for a cryptocurrency and a traditional bank. These ledger systems record the financial state of every account, and allows users to transact (for example, spend, save, or check their balance). When you use a credit card online, take money out of an ATM, order dinner on your couch or in Australia, you are interacting with a global-scale distributed storage system. Twisp’s core accounting engine is a correct, fast, and secure ledger platform suitable for fintech applications from small to large. 

It’s critical that a financial system is accurate and reliable, and so right from the start, Twisp integrated formal methods technologies into their design process. Familiar with our deep-rooted experience in provable security and model-based assurance, Twisp reached out to Galois in 2021 and asked us to help them build a system with world-class reliability and correctness. 

But let’s back up a moment. How does this all relate to consistency? And why is consistency so hard to get right? 

Consistency Conditions vs. Performance 

Consistency, in a distributed financial system, tells us who knows what, when. Imagine you’re in Istanbul and I’m in Tokyo. If you take money from Account A and put it into Account B, you don’t want it to be possible for you or me to see money appearing in Tokyo before it departs Istanbul. You want everyone to see the same transactions, in the same order—global sequential consistency

Unfortunately, implementing this strong consistency property would require each user to wait for every user after every transaction. At global scale this is both unacceptably slow, and also profoundly unnecessary. If I send money to you, my friend Andrew doesn’t need to wait for our transaction to complete, and neither does a random citizen of Tokyo. 

There is a deep trade-off between consistency and other performance properties. More stringent consistency conditions are typically more expensive and result in a slower system, literally taking more time to accomplish a task like transferring money and updating records. Less stringent consistency conditions, with more allowance for discrepancies, are faster and cheaper. Deciding how much inconsistency a system can tolerate comes down to a judgment about acceptable risk, budget constraints, and user experience.

Testing Distributed Systems is Hard  

This brings us back to foolish consistency. To develop a versatile, scalable financial storage system, Twisp needed to select exactly the right synchronization, and no more. They needed a way to determine whether a particular design choice was necessary to achieve their chosen consistency property, or whether it was ‘foolish,’ a drag on performance with no benefit. In other words, Twisp wanted to be able to build their system fearlessly, and they asked Galois to help. 

This was no small task. Twisp’s core transaction manager is a 10,000 line Go implementation of Multi-Version Concurrency Control (MVCC). The idea behind this algorithm is that multiple operations (‘versions’) can occur at the same time, and consistency between the operations is enforced using timestamps. For example, suppose we have two writes to some account A. Each write operation takes a timestamp at its start and end. Then, when the algorithm reads from the data store, it examines the timestamps and figures out which operation happened most recently. The trick is that if two operations have overlapping timestamps, either order is good, and synchronization is not required. 

This is a tricky idea to wrap your head around! Instead of figuring out exactly what order everything happened, the MVCC algorithm only records enough information so that no-one can tell the difference. The exact consistency behavior depends on exactly how and where MVCC stores and compares timestamps—many small decisions that can interact with each other in unexpected ways. And to boot, the transaction controller is just one component of an overall Twisp system. 

Achieving high test-case coverage is also very difficult for consistency properties. The reason is that consistency is a property between multiple interacting processes. In order to trigger a bug, a test must generate a particular schedule of process interactions. Randomized testing is unlikely to find failing test-cases, and so a comprehensive test suite would have to cover all relevant schedules. And once a test fails, it is nearly impossible to replicate, as this would require recreating the exact same sequence of process interactions. 

At Galois, we often talk about high-confidence software. Before working with us, Twisp comprehensively tested their transaction manager using standard techniques. This eliminated all of the simple bugs and detected many consistency errors. But it did not give high confidence that the design was correct, and it did not help the designers eliminate redundant synchronization. We needed to try a different approach. 

Systematic Concurrency Testing 

Formal methods are a family of techniques, rather than a single approach. When Galois works with clients, we first find out what problem they need to solve, and then pick an appropriate tool that delivers the best bang for buck. 

Twisp’s ultimate goal was to achieve high confidence that their transaction manager design was correct. But they also wanted a technology that would be easy to audit, modify, and maintain as the Twisp system changed. They wanted to experiment with different system designs and quickly generate feedback. Lastly, they wanted failures to map to concrete test cases that could be triaged and fixed. 

Our solution was to build a formal model of the MVCC code in the modeling language P. This language originates from Microsoft where it was used to develop the USB3 support in the Windows kernel. P has three powerful features that combine to deliver high confidence in the properties we care about: 

  1. Properties are encoded through a monitor which can view the states of all the executing processes. This makes it easy to encode properties such as consistency which involves multiple threads. We wrote several monitors for several different consistency properties of interest—the read committed and repeatable read isolation properties, as well as liveness properties (i.e., every transaction is eventually either committed or aborted). 
  2. Processes encoded in P are tested across thousands of schedules, selected automatically by the P language run-time. The result is that a broad range of representative schedules are rapidly tested without any need to identify such schedules directly. 
  3. When a property failure is identified, P generates a witness to the failure. This gives the exact sequence of interactions that caused the failure, which can be translated into a concrete failure case. 

Models in P are written in abstract imperative code that can be read by engineers, but they are not directly connected to the system code. When we built the MVCC model for Twisp, we carefully audited the Twisp Go code and developed a model that precisely matched its structure. We also audited the model with the Twisp developers and checked that they agreed with our interpretations of the code’s meaning. Despite the close correspondence between model and code, our model was 1/5th the size of the original transaction manager—just 2000 lines of P code. 

A model in P can be run just like a regular program. We used this to increase our confidence that we got the model correct, as compared to the existing behavior of the code. Alongside our formal properties, we also built a model-level test suite which intentionally reflected the structure of Twisp’s code-level test suite. This meant that our model passed and failed exactly the tests as the Twisp system. It also meant that we could use the model to predict the behavior of tests on the Twisp system. 

To get a sense of what our P model looks like, here’s a snippet of the model code. This part of the model represents the simplest kind of data store—one where all the operations occur in sequential order. We used this simple model as a starting point, and gradually introduced more complexity to reflect Twisp’s transaction manager algorithm. 

machine LockingDatabase {

  var storage: map[key, value];

  var lockStatus: tLock;

  start state Init {

    entry {

      goto Unlocked;

    }

  }

  state Unlocked {

    entry { announce eUnlockedDatabase; }

    on eReqAcquireLock do (payload:tReqAcquireLock) {

      lockStatus = (status = true, tid = payload.tid);

      send payload.transaction, eRespAcquireLock;

      goto Locked;

    }

  }

  state Locked {

    defer eReqAcquireLock;

    entry { announce eLockedDatabase; }

    on eReqWrite goto Writing;

    on eReqRead goto Reading;

    on eReqReleaseLock do (payload:tReqReleaseLock) {

      lockStatus = (status = false, tid = lockStatus.tid);

      send payload.transaction, eRespReleaseLock;

      goto Unlocked;

    }

  }

  state Reading {

    [...]

  }

  state Writing {

  [...]

  }

}

For a complete understanding of P’s syntax, take a look at this P tutorial. Here, the most important thing to understand is that the P model is structured into a collection of states. Here these are Init, Unlocked, Locked, Reading, and Writing.

The LockingDatabase machine starts in the Init state and then moves to Unlocked. In response to a eReqAcquireLock event, it moves to a Locked state, and from this state, reading and writing can occur. The eReqAcquireLock event, and other possible events, are generated by other machines running in parallel. By carefully setting these up, we were able to model all the necessary features of Twisp’s transaction manager system. 

Results, & the Road Not Taken

Together, Galois and Twisp developed three different P models of the transaction manager. We modeled several candidate synchronization designs, and discovered a flaw in one of them (we also identified several smaller issues). That particular failure involved a long sequence of process interactions, and we find it implausible that it would have been detected by non-systematic testing. Through the modeling process, we ran hundreds of thousands of tests, and as a result, we have high confidence that, aside from the issues we detected (since fixed), our model of the Twisp transaction controller satisfies the target consistency properties. For Twisp, our results validated their design and let them iterate more boldly as they pushed their product to market. 

The main threat to validity for our approach is the lack of any formal connection between the P model and the transaction manager code. This means that we rely on carefully writing the model to reflect the code, which opens an assurance gap where bugs could lurk. This problem could be solved by performing systematic concurrency testing on the code itself—this is what the Coyote testing framework does for .net code. However, there are also advantages to testing the model, and not the code. For example, a model can easily be modified to test scenarios, and a model is typically simpler than the code. Overall, we are satisfied with this tradeoff, though it imposes a cost in keeping model and code synchronized. 

The reader may be curious why we did not formally verify the transaction controller. We are well known for ambitious formal verification projects, such as our recent work on the AWS-LC cryptographic library. Simply, formal verification would not have generated the results that Twisp needed within the available project scope. Formal verification and systematic testing are tools in our toolbag, along with type systems, DSLs, logical solvers, and many other formal methods. Sometimes, the best option is to just audit the code by hand, or manually write more tests. Assurance is a spectrum, and we must always ask what tool is best suited to the task at hand.