Distributed protocols enable components such as blockchain validator nodes, cloud servers, or IoT devices to coordinate and cooperate toward a common goal. However, in such a diverse environment, a lot of things can go wrong: hardware can fail, software can be buggy, network links can be unreliable, attackers may compromise components, and so on. Due to those difficulties, distributed protocols present a fascinating target for formal verification.
Recently, thanks to a grant from the Interchain Foundation, we developed a mechanically-checked proof of some key properties of the Tendermint protocol. Tendermint is the Byzantine fault-tolerant (BFT) consensus protocol that underlies Tendermint Core, a widely-used framework for building blockchains systems.
In a blockchain system based on Tendermint, participants who do not fully trust each other can nevertheless cooperate to implement a state-machine (e.g. most commonly a financial ledger) that remains reliable even if up to a third of the participants are dishonest. This allows the participants to interact through a computing service where no single person could be trusted to faithfully host the service on its own (for example, because a participant might have incentives to cook the books). Nevertheless, if more than a third of the participants collude, they can perform a successful attack and cause disagreement among the honest participants.
To deter misbehavior, Tendermint nodes can collect signed messages that they receive from other nodes and can use them as proof of misbehavior. In the case of a financial ledger, such evidence can, for example, be used to confiscate funds that participants are required to deposit in the system beforehand (this is sometimes called slashing). Crucially, the Tendermint protocol guarantees that, if misbehaving participants successfully cause disagreement on the state of the state-machine, then at least a third of the participants have misbehaved and can be identified with irrefutable evidence of misbehavior. This is Tendermint's Fork-Accountability property.
Our mechanically-checked proof of the Tendermint protocol shows that, in a model of the system that we wrote in the Ivy language, Tendermint's Fork Accountability mechanism works as expected. Moreover, this applies regardless of the number of nodes or the number of rounds the protocol goes through.
A nice feature of the Ivy language is that it is an imperative language that will look familiar to most programmers. This allowed us to closely follow, almost line-by-line, the pseudo-code presented by Ethan Buchman, Jae Kwon, and Zarko Milosevic (the authors of Tendermint) in the paper The latest gossip on BFT consensus. Moreover, thanks to Ivy’s powerful proof-automation, the proof is remarkably short: a mere 125 extra lines added to the 350 lines of Ivy code that describe the protocol. You can find the model, the proofs, and a Dockerfile to run Ivy to check the proofs, all in the Tendermint github repository. Specifically, the model appears in file tendermint.ivy.
Why do we need proofs, and why are they so hard to create?
First, it is difficult to design distributed protocols that are correct. The unpredictable interleaving of messages inherent to distributed systems often results in a large number of cases to consider, and it is easy to miss one of them. To avoid such pitfalls, distributed protocol authors often write a proof in a natural language such as English to demonstrate why their protocol is correct. Unfortunately, it is also easy to write a seemingly convincing proof of a protocol when the protocol is, in fact, flawed.
There are many examples of published, peer-reviewed academic papers that present protocols that do not satisfy their claimed properties yet also contain a proof, in English, that the properties are satisfied. An early example is Hyman's incorrect mutual exclusion protocol, published in 1966, which was revealed to be incorrect in a response by Knuth a few months later. A recent example is the Chord protocol for distributed hash tables which, despite having been cited more than ten thousand times (according to Google scholar), was shown incorrect by Zave almost a decade after its publication.
So what does it mean to mechanically check a proof, and how does it reduce the likelihood of flaws? A mechanically-checked proof is a proof written in a machine-readable language (which, preferably should also be human-readable) and checked by a proof-checking program for any logical flaws. In theory, it is a great example of human-machine teaming: the human uses creativity to come up with a clever argument, and the computer does the mechanical enumeration of cases to make sure that nothing slipped through any cracks in the argument.
Mechanically-checked proofs can also be developed interactively in a dialogue between human and machine: the human tries an argument, the machine points the flaws, and this repeats until a flaw in the protocol (and not just the argument) is discovered, or one arrives at a fully checked proof that the protocol is correct. Unfortunately, this ideal picture is not quite a reality yet. Proof systems such as Coq, Isabelle/HOL, or Dafny (to name just a few), are notorious for being so complex that they require a PhD in the domain to use effectively. But this is a very active area of research that sees improvements every year.
In this work, we use the Ivy prover, which has its own method to make mechanical proof development more user-friendly. (The paper Ivy: A Multi-modal Verification Tool for Distributed Algorithms explains some of the details behind Ivy.)
Because Ivy uses an imperative language that is approachable to programmers used to Python or C, it is easier for system developers to inspect and understand the model.
We also set up Ivy to generate an executable version of the model (in C++), which engineers and developers can use to test the model itself and make sure it corresponds to their understanding of the protocol.
Here is an example taken from the Ivy model of Tendermint:
1 export action l_28(r:round, v:value, vr:round, q:nset) = {
2 assume _has_started & r = round_p & step = step_t.propose;
3 assume _recved_proposal(proposers.get_proposer(r), r, v, vr);
4 assume nset.is_quorum(q) & forall N . nset.member(N,q) -> _recved_prevote(N,vr,v);
5 assume vr >= 0 & vr < r;
6
7 if (value.valid(v) & (lockedRound <= vr | lockedValue = v)) {
8 call broadcast_prevote(r, v);
9 } else {
10 call broadcast_prevote(r, value.nil);
11 };
12
13 step := step_t.prevote;
14 }
What may be new to a programmer is the use of the quantifiers “forall” and “exists” in logical formulas, as seen in line 4 above. In English, the formula forall N . nset.member(N,q) -> _recved_prevote(N,vr,v)
could be translated to: "every member of the set q has received a pre-vote message with parameters vr and v”.
For comparison, below is the pseudo-code specification of the same protocol action that the authors of Tendermint present in the paper The latest gossip on BFT consensus.
Now, here is arguably the most scary-looking formula in the Ivy model:
observed_unlawful_prevote(N) = exists V1,V2,R1,R2 .
V1 ~= value.nil & V2 ~= value.nil
& V1 ~= V2 & R1 < R2 & observed_precommitted(N,R1,V1) & observed_prevoted(N,R2,V2)
& forall Q,R . R1 <= R & R < R2 & nset.is_quorum(Q)
-> exists N2 . nset.member(N2,Q) & ~observed_prevoted(N2,R,V2)
This formula describes Flip-Flopping, one of the two conditions that constitute evidence of misbehavior (see the Tendermint documentation for a detailed description of those conditions).
In a nutshell, it says that node N pre-voted unlawfully if a) N was observed pre-committing non-nil value V1 at round R1, b) N was also observed pre-voting a non-nil value V2 different from V1 at a round R2 greater than R1, and there is no justification for changing from V1 to V2, i.e., there is no quorum that has unanimously pre-voted for V2 in any round between R1 (included) and R2 (excluded).
These formulas are dense, but they are clear enough to explain to a domain expert. This means that we were able to work closely with the Tendermint protocol designers in order to make sure we modeled the protocol correctly.
It is hard to quantify how difficult it is to develop a proof and compare it with other approaches, but one metric often used is the proof-to-code ratio - how many lines of proof did the user have to write for each line of code in the description of the algorithm? A low ratio indicates that the system allows users to write proofs at a high level, omitting tedious details that the proof checker will be able to figure out on its own.
In our proof of Tendermint’s Fork-Accountability property (file accountable_safety_1.ivy), the description of the algorithm takes 350 lines of code, while the proof takes only 125 lines, resulting in a proof-to-code ratio of 0.35. Essentially, we wrote about 1 line of proof for every 3 lines of code. This is a remarkably small ratio by today's standards. It is a testament to the effectiveness of the Ivy methodology for developing mechanically-checked proofs about consensus protocols.
The Tendermint repository also contains a TLA+ formalization of Tendermint’s Fork-Accountability property developed by Informal Systems along with a mechanically-checked proof that, for a fixed number of nodes and rounds, the Fork-Accountability property holds. This proof has been checked using the Apalache model-checker. The TLA+ proof uses a significantly different toolset compared to ours. Since it proves the same property, it serves as a good comparison between the two approaches. Informal Systems describes its methodology for developing high-assurance blockchain protocols in this conference paper.
If this project interests you and you want to hear more, please contact us to discuss how Galois might be able to partner with you!