We use cookies to provide the best site experience
Okay

Consensus algorithms

Economist, scientific Director of the graduate FINTECH, Director of research Thalamus Lab, Ranepa
Chapter 3. Blockchains and DLT algorithm design
Alexander Didenko
From the first chapter, you all learned that the Bitcoin and Ethereum blockchains use the consensus mechanism — Proof-of-Work, and that Ethereum will soon switch to a new consensus — Proof-of-Stake. Are there any others? There are a lot of them! And what are consensus algorithms in general, where did they come from and why are they so important for blockchains?
In this long read we will learn different consensus algorithms and problems connected with their implementation. And yes, some consensus protocols could be regarded as games, which means they can be designed as mechanisms.
We have at our disposal a powerful tool — a state diagram. In general, state diagrams are used in the literature about blockchains all the time: to describe smart contracts or the blockchain in general, blockchain-based services, and even to simulate blockchains. But they play a particularly important role in describing consensus algorithms. State machine replication is the key concept that has to be learned to understand topics in fault-tolerant replication.
So, each node is a machine of final states plus log. Protocol to maintain system in order has its own intricate visualisation. Here is how it should be interpreted. Suppose that you have a system, in which there are a Client and 3 nodes: Primary, Replica 1, Replica 2. Client sends to Primary node a request to perform some operation. Primary puts this request to local queue and resends it to replicas (Prepare). Replicas put this request at the end of their local queues and respond to Primary (PrepareOk). Primary performs operation locally and reports to Client (Reply). At some point in time after that Primary also informs Replicas that operation is done, and Replicas have to replicate it.
Log replication. Simple case
In this case, each horizontal track corresponds to node and vertical lines represent the stages of the process. The second figure is a more complex example.

Scheme source
Why distributed system has to follow such complex protocols? Why not just copy information from time to time between nodes? The problem here is that information could be delayed or lost. Nodes act in unreliable environment — very similar to untrusted one, isn't it?
Consensus is the process by which a distributed system achieves the unity of the states of the nodes it contains, even if the nodes and communication lines between them are unreliable.
Consensus
Let's see how humankind gradually explored this line of thinking.
Generals and lieutenants
It all began from the paper by Akkoynlyu et al. (1975), which postulated what was later known as "two lieutenants" (or generals, or "two armies") problem". Lamport et al. (1982) further contributed to this line of research by giving the problem its famous anectodal form and adding traitor-generals which radically alters the problem. They coined the very term, «Byzantine Generals problem» and provided a solution with which Generals could attack (i.e. nodes could synchronise their states) if there would be no more than 1/3 of traitors (faulty nodes) between them.
The paper illustrates how difficult it is for two processes to reach an agreement on some information (attack/not attack in this case).The problem is that armies communicate with each other through an unreliable channel: they can send out communication with a messenger, and the enemy could take messenger as prisoner. Hence, sender of the message never knows whether his message reached the receiver. If receiver wants to notify sender upon message arrival, then receiver becomes a sender and the problem applies to him now; and so on and so forth to infinity. There is the opposite of this case: connect is ideal, while processes on nodes are not. This is the case of Byzantine Generals.
FLP impossibility "theorem"
Then in Fisher et al. (1985) it was shown that in an unreliable asynchronous environment Byzantine Generals will have additional problems. Distributed system could have two concurrent features — liveness and safety — and no protocol could guarantee both. More on this could be learned from this review. This paper made evident that one has to distinguish between Byzantine faults — partial loss of information — and complete equipment failures.
Another result, sometimes mentioned along with FLP impossibility theorem, is Brewer's theorem or CAP triangle theorem, which captures somewhat similar to trade-off of [c]onsistency and [a]vailability in the presence of [p]artitioned net. FLP and CAP give handy theoretical framework for comprehension of basic compromises in distributed ledgers architecture. Ivan Sagalaev in his "Software maniacs blog" gives a simple description of systems which realizes different aspects of this triangle. They would perfectly suit those who have never thought of distributive systems. However, it is a "theorem", not a theorem, as it is just a heuristic assumption neither proved nor disproved. A good discussion of this "theorem" you can find in these two articles:
Paxos, Raft, VR/PBFT
Lamport, of course, could not stay away from the discussion of faults, impossibilities and protocols. In a series of papers — Lamport (1998), Lamport (2001) — he introduced famous Paxos protocol. However, his attempt to simplify things by communicating them in allegorical manner only complicated them. That's why the web is abundant with papers and blog posts making simple Paxos description in Lamport (2001) even simpler (the last two sources are research papers extending basic Paxos):
But my personal favourite is WordPress blog, dedicated to Understanding Paxos. It is really an important milestone in computer science. And here is the link to videos of lectures on consensus algorithms, which Lamport gave in Saint Petersburg University in 2019. As far as I am concerned, Paxos has not been used in cryptocurrencies so far (learn here why). Sharapko et al. (2018) made interesting comparative analysis on Paxos and Proof-of-Work protocols.
Important alternative and somewhat extension of the original Paxos is Raft algorithm. Understanding Raft is really simple: first, it was designed to be more understandable than Paxos; second, it has beautiful interactive site dedicated to its principles. Just as Paxos, Raft is not used in popular cryptocurrencies; however, it is used in other blockchain-based systems, among which Hyperledger is the most popular.
Finally, concurrent approach to Byzantine fault was pBFT — practical Byzantine Fault Tolerance introduced in Castro & Lyskov (1999). pBFT siblings (or BFT family of protocols) are marginally popular in blockchains and cryptocurrencies: Hyperledger, Ziliqa, Libra, Byzcoin, SCP, Tendermint, and some others are using it.
Finality in blockchains

POW against spam

The invention of the Proof-of-Work consensus algorithm, which we examined in the first chapter, is often mistakenly attributed to Satoshi Nakamoto. In fact, he was two decades ahead of Dwork and Naor in his work Pricing via Processing. The ideas presented in this article were first introduced to the public at the Conference on Computer Science in 1992.

Byzantine cryptointroverts

So, we are ready to consider PoW in all its glory. Contrary to what is often said, Satoshi himself, although he did not invent PoW, but for the first time described it as application of linked timestamping to solve FLP impossibility trade-off.
Additional materials
Made on
Tilda