Course Module 4. Working with databases - Lecture: C.A.P. theorem

4.1 Consistency about Brewera

To begin with, Eric Brewer is not, and never claimed to be, a database expert. He belongs to the community of distributed systems, and his famous talk, in which the CAP "theorem" appeared, was given at the conference "Principles of Distributed Computing". (By the way, ten years later, in 2010, he again gave an invited talk at the same conference, and in this talk he gave, in particular, a number of examples of distributed systems, the development of which took into account the "theorem" of CAP.) In this area has its own interpretation of the terms used in the field of databases.

In particular, the term "immediate consistency" means that after the user receives notification from the system about the successful completion of some data update operation, the result of this operation becomes instantly visible to all observers.

Eventual consistency means that if no new data update operations enter the system for a sufficiently long period of time, then it can be expected that the results of all previous data update operations will eventually spread to all nodes of the system, and all replicas data are consistent (apparently, this should be understood as "all replicas will have the same state".

With this sense of consistency in mind, Brewer's "theorem" can be considered quite understandable and obvious: in any distributed system with shared data, only any two properties of consistency, availability, and partition tolerance of the network can be simultaneously ensured. In this regard, Brewer even contrasts the set of ACID properties with his proposed set of BASE properties (Basically Available, Soft-state, Eventual consistency - availability in most cases; unstable state; final consistency). But this opposition, in my opinion, is unjustified, since in the first case we are talking about the logical characteristics of transactions, and in the second - about the physical properties of distributed systems.

4.2 Proof of the "theorem"

Many believe that Brewer's "theorem" has been formally proven. Indeed, the paper by Seth Gilbert and Nancy Lynch introduces some (almost) formal definitions in which context the "theorem" really becomes a theorem and is proved. However, let's see how those three properties of a distributed system are determined, of which, according to Brewer's "theorem", only two properties can be simultaneously supported.

Consistency is called atomic, or linearizable consistency (atomic, or linearizable consistency), which is a property of the system, all individual data objects of which are atomic (linearizable). In turn, an atomic object is an object with several operations, such that the call of the operation and the receipt of response data occur as if instantly, i.e. the object does not accept the call of the next operation until the previous operation has completely completed. The order in which operations are received must be such that if a read-type operation arrives after some write-type operation has been performed, then the read operation must return the value written by this or some later write operation.

A distributed system is always available if every request received by a non-failed node must be answered. The resilience of the system to network partition is modeled as the preservation of the viability of the system in the event of the loss of an arbitrary number of messages sent from one node to another.

Based on these definitions, Hilbert and Lynch formulate the following theorem (there is no clock in the asynchronous network model, and nodes should make decisions only on the basis of received messages and local calculations):

In an asynchronous network model, it is not possible to implement a read/write data object that guarantees availability and atomic consistency properties for all valid executions (including those that lose messages).

This theorem is really quite simply formally proved by the method "by contradiction". The article goes on to conclude that:

In an asynchronous network model, it is not possible to implement a read/write data object that guarantees accessibility properties for all valid executions and atomic consistency for valid executions in which messages are not lost.

In addition, the truth of the main theorem is proved for a partially synchronous network model, in which each node has a clock, the time shown by which increases at the same rate, but which are not synchronized, i.e. can show different times at the same real moment. It is shown that for this case a similar consequence is not derived, and, therefore, for partially synchronous networks there are more possibilities for organizing distributed systems with "good" properties.

Yes, it can be considered that in some sense (not necessarily coinciding with the sense that was meant by Brewer) Gilbert and Lynch proved the impossibility of simultaneously providing in one distributed system the properties of atomic consistency, availability and resistance to network partitioning. But what does this have to do with database transactions in general and ACID transactions in particular?

4.3 ACID transactions

Here is what Julian Browne writes about this in his note on the discussion of the "theorem" of CAP:

In their proof, Hilbert and Lynch use the term atomicity instead of consistency, which makes more sense from a technical point of view because, strictly speaking, consistency in the sense of ACID refers to the ideal properties of database transactions and means that no data will become persistent if they violate some pre-established restrictions. But if we assume that a pre-established limitation of distributed systems is the prohibition of the presence of several different values \u200b\u200bfor the same data element, then, in my opinion, this flaw in the abstraction of consistency can be considered insignificant (in addition, if Brewer used the term atomicity, then the AAP theorem would appear, the name of which would be extremely inconvenient to pronounce).

This is written not very seriously, but honestly. And, in fact, the requirement of atomic consistency should not be mixed with the requirements of transactional consistency in the sense of ACID. Database integrity constraints are logical, if you will, business requirements. They come from application domain logic. The requirement of atomic consistency is of a very different kind. This is an implementation requirement that falls into the category traditionally referred to in the database industry as physical consistency (for example, when performing any index change operation, all blocks of the corresponding B+ tree must contain valid values and be linked by valid references).

And here is what representatives of the database community Daniel Abadi and Alexander Thomson write quite seriously in their note:

... the requirement for availability of scalable transactional systems is becoming increasingly critical, and this is usually met through replication and automatic redirection of requests in the event of a failure of one of the nodes. Therefore, application developers expect that the consistency guarantees of ACID systems (originally consisting of local support for user-defined invariants) will be extended to ensure strong consistency (that all replicas of the same data at any given time will be identical copies, i.e. in this case consistency is implied in the sense of CAP/PACELC.

In other words, Brewer consistency has nothing to do with consistency in the sense of ACID, but it is in systems focused on providing high availability through data replication that it is desirable to maintain strong replica consistency. This is not an ACID property, but a technical (physical) feature of massively parallel DBMS that facilitates application development.

According to Michael Stonebreaker, the key to building a high-quality modern DBMS is the right choice of technical compromises. When choosing a specific engineering solution, many factors must be taken into account - the requirements of future users, the likelihood of various failure situations, etc., and not be dogmatically guided by any general theoretical guidelines (including the CAP "theorem").

Stonebreaker believes that in the realm of transactional parallel database systems, abandoning Brewer's consistency in favor of supporting high availability and network partition tolerance is a poor trade-off because (a) replica consistency is a very useful feature of the system; (b) transactional massively parallel DBMS do not need clusters with a very large number of nodes, so network split situations are unlikely; (c) the system can easily become unavailable, not because of network separation, but, for example, due to the presence of regularly occurring software errors.

Thus, the high activity of representatives of the NoSQL camp (read NoACID), who often refer to Brewer's "theorem", is not connected with the theoretical impossibility of building massively parallel transactional DBMS that support ACID transactions, but with the fact that simplified systems that do not support only ACID transactions, but also replica consistency, are created easier and faster. Because of their simplistic organization, they are capable of very fast data processing, and for some applications this is more important than all the conveniences inherent in database technology.

Let's see how the database community responds to this challenge.