6.1 Battle of Abbreviations: BASE vs. ACID

"In chemistry, pH measures the relative acidity of an aqueous solution. The pH scale runs from 0 (strongly acidic substances) to 14 (strongly alkaline substances); pure water at 25°C has a pH of 7 and is neutral.

Data engineers have adopted this metaphor to compare databases in terms of transaction reliability."

Probably the idea was this: the higher the pH - that is, the closer the database is to "alkaline" ("BASE") - the less reliable its transactions.

Popular relational databases, such as MySQL, were built around ACID from the start. But over the past decade the so-called NoSQL databases - a name that lumps together several very different kinds of databases - have done quite well without ACID. In fact, a great many developers work with NoSQL databases without caring at all about transactions and their reliability. Let's see whether they are right.

You can't really talk about "the NoSQL database" in general, because the term is just a convenient abstraction. NoSQL databases differ from each other in the design of their storage subsystems and even in their data models: NoSQL covers both the document-oriented CouchDB and the graph database Neo4j. But in the context of transactions they all tend to be alike in one respect: they provide limited versions of atomicity and isolation, and therefore do not provide ACID guarantees. To understand what this means, let's answer the question: what do they offer, if not ACID? Nothing?

Not really. Like relational databases, they also need to sell themselves in a pretty package, so they came up with a "chemical" acronym of their own: BASE.

6.2 BASE as an antagonist

And here again I will not go through the letters in order; I will start with the fundamental term: consistency. And I have to dampen your sense of recognition right away, because this consistency has little in common with the consistency from ACID. The problem with the term "consistency" is that it is used in too many contexts. The consistency in BASE has a much wider context of use; in fact, it is exactly the consistency people mean when they discuss distributed systems.

The relational databases we talked about above provide different levels of transaction isolation, and the strictest of them ensure that one transaction cannot see invalid changes made by another. If you are standing at the checkout in a store while the rent is being withdrawn from your account, and the rent transfer fails and your account rolls back to its previous value (the money is not debited), then your payment transaction at the checkout will not notice any of these maneuvers: the failed transaction effectively never happened, and by the isolation requirement its temporary changes cannot be observed by other transactions.
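A minimal sketch of this rollback behavior, using Python's built-in sqlite3 module and a made-up two-row account table (the table and amounts are invented for illustration): the failed rent transfer leaves no trace.

```python
import sqlite3

# A toy account table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('you', 100), ('landlord', 0)")
conn.commit()

# The rent transfer runs inside a transaction; if anything fails,
# rollback restores the previous state, as if nothing had happened.
try:
    conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 'you'")
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE id = 'you'").fetchone()
    if balance < 0:  # insufficient funds: the transfer fails
        raise ValueError("insufficient funds")
    conn.execute("UPDATE account SET balance = balance + 150 WHERE id = 'landlord'")
    conn.commit()
except ValueError:
    conn.rollback()

# The failed transfer left no trace: the balance is back to 100.
(balance,) = conn.execute("SELECT balance FROM account WHERE id = 'you'").fetchone()
print(balance)  # -> 100
```

Other sessions querying the table would likewise never see the temporary -50 balance: that is exactly the isolation guarantee described above.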

Many NoSQL databases forgo the isolation guarantee and instead offer "eventual consistency", whereby you will eventually see valid data, but in the meantime a transaction may read invalid values: temporary, partially updated, or stale. Possibly the data will become consistent "lazily at read time".
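Here is a toy model of eventual consistency (the names Replica and EventuallyConsistentStore are invented for illustration, not any real database's API): writes land on the primary immediately but reach the replica only after a simulated replication lag, so an early read returns stale data.

```python
class Replica:
    """A toy replica that applies incoming writes only after a delay."""
    def __init__(self):
        self.data = {}
        self.pending = []  # (apply_at, key, value): writes still "in flight"

    def sync(self, now):
        # Apply every replicated write whose propagation delay has elapsed.
        still_pending = []
        for apply_at, key, value in self.pending:
            if apply_at <= now:
                self.data[key] = value
            else:
                still_pending.append((apply_at, key, value))
        self.pending = still_pending


class EventuallyConsistentStore:
    """Writes go to the primary at once; reads may hit a lagging replica."""
    def __init__(self, replication_lag):
        self.primary = {}
        self.replica = Replica()
        self.lag = replication_lag

    def write(self, key, value, now):
        self.primary[key] = value  # immediately visible on the primary
        self.replica.pending.append((now + self.lag, key, value))

    def read_from_replica(self, key, now):
        self.replica.sync(now)
        return self.replica.data.get(key)


store = EventuallyConsistentStore(replication_lag=10)
store.write("balance", 42, now=0)
stale = store.read_from_replica("balance", now=5)   # too early: stale (None)
fresh = store.read_from_replica("balance", now=15)  # lag elapsed: 42
print(stale, fresh)  # -> None 42
```

The ten-unit lag here plays the same role as the 10-second delay in the DynamoDB story below: the write is acknowledged, but a reader may not see it yet.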

NoSQL was conceived with real-time analytics in mind, and to achieve greater speed it sacrificed consistency. And Eric Brewer, the same person who coined the term BASE, formulated the so-called "CAP theorem", according to which:

For any implementation of distributed computing, it is possible to provide no more than two of the following three properties:

  • consistency - data on different nodes (instances) do not contradict each other;
  • availability - any request to the distributed system receives a correct response, but with no guarantee that the responses of all nodes coincide;
  • partition tolerance - even if there is no connection between the nodes, they continue to work independently of each other.

If you want a very simple explanation of CAP, then here you go.

There are opinions that the CAP theorem does not really hold, or is formulated too abstractly. One way or another, NoSQL databases often give up consistency in the sense of the CAP theorem, which describes the following situation: data has been updated in a cluster of several instances, but the changes have not yet been synchronized across all of them. Remember the DynamoDB example I mentioned above, which told me "your changes became durable - here is your HTTP 200" - and then showed me those changes only 10 seconds later? Another example from a developer's daily life is DNS, the domain name system: for anyone who does not know, this is the "dictionary" that translates domain names into IP addresses.

An updated DNS record propagates to the servers according to the caching interval settings, so updates are not visible immediately. The same kind of temporary inconsistency (i.e., eventual consistency) can happen to a cluster of relational databases (say, MySQL) - after all, this consistency has nothing to do with the consistency from ACID. So it is important to understand that in this sense SQL and NoSQL databases hardly differ once we are talking about several instances in a cluster.

In addition, eventual consistency can mean that write requests are applied out of order: all the data will be written, but the value that ends up stored will not necessarily be the last one in the write queue.
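A sketch of one common way to resolve such out-of-order writes, last-write-wins by timestamp (the store and function here are hypothetical, not any particular database's API):

```python
# A tiny last-write-wins register: each write carries the timestamp at
# which it was issued, and the highest timestamp wins regardless of
# arrival order.
store = {}

def apply_write(key, value, timestamp):
    current = store.get(key)
    if current is None or timestamp > current[1]:
        store[key] = (value, timestamp)
    # otherwise: a stale write arrived late and is simply dropped

# The write issued later (t=2) arrives first; the earlier one (t=1) arrives late.
apply_write("x", "newer", timestamp=2)
apply_write("x", "older", timestamp=1)

print(store["x"][0])  # -> newer: the late, stale arrival did not clobber it
```

Without a rule like this, the queue order of arrivals would decide, and the value finally stored could be the stale one, exactly the situation the paragraph above describes.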

Non-ACID NoSQL databases have a so-called "soft state" as a consequence of the eventual consistency model: the state of the system can change over time, even without new input. At the same time, such systems strive to provide greater availability. Guaranteeing 100% availability is not a trivial task, so we speak of "basic availability". Together these three concepts - "basically available", "soft state" and "eventual consistency" - form the acronym BASE.

To be honest, the BASE concept seems to me an even emptier marketing wrapper than ACID, because it adds nothing new and does not characterize a database in any way. And attaching labels (ACID, BASE, CAP) to particular databases can only confuse developers. I decided to introduce the term anyway, because it is hard to avoid when studying databases, but now that you know what it is, I would like you to forget about it as soon as possible. And let's go back to the concept of isolation.

6.3 So do BASE databases not meet the ACID criteria at all?

Essentially, where ACID databases differ from non-ACID ones is that the non-ACID ones effectively forgo isolation. This is important to understand. But it is even more important to read the database documentation and test databases the way the folks from the Hermitage project do. It is not so important what exactly the creators of a given database call their brainchild - ACID or BASE, CAP or not CAP. What matters is what exactly the database provides.

If the creators of a database claim that it provides ACID guarantees, there is probably a reason for it, but it is advisable to test it yourself to see whether, and to what extent, that is true. If they declare that their database does not provide such guarantees, it may mean the following:

  • The DB provides no atomicity guarantee, although some NoSQL databases offer a separate API for atomic operations (e.g., DynamoDB);

  • The DB provides no isolation guarantee. This may mean, for example, that writes are not applied in the order in which they were issued.

As for the durability guarantee, many databases compromise on it for the sake of performance: writing to disk is a slow operation, and there are several ways to deal with that. I do not want to dive deep into database theory, but so that you roughly know where to look, I will describe in general terms how different databases approach durability.

To compare databases, among other things you need to know which data structures underlie the storage and retrieval subsystem of each one. In short: different databases implement indexing - that is, the organization of access to data - differently. Some of them make writes faster, others make reads faster. But it cannot be said in general that some data structures make durability higher or lower.

6.4 How different databases index data, how this affects durability, and more

There are two main approaches to storing and retrieving data.

The easiest way to save data is to append operations to the end of a file, log-style (there is always an append): no matter whether we want to add, change or delete data, every CRUD operation is simply written to the log. Searching the log itself is inefficient, which is where the index comes in: a special data structure that stores metadata about where exactly the data lives. The simplest indexing strategy for a log is a hash map that tracks keys and values; the values are the byte offsets of the records inside the file, which is the log and is stored on disk. This structure - the hash map held entirely in memory while the data itself sits on disk - is a hash index (this is how Bitcask, one of Riak's storage engines, works); add sorted segments and background merging, and it grows into the LSM tree (log-structured merge tree) we will meet below.
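A toy version of this log-plus-hash-map scheme (the class name and the comma-separated file format are invented; real engines use binary records and handle crashes and concurrency): each write appends to the file and updates the in-memory offset map, and each read is a single seek.

```python
import os

class LogWithHashIndex:
    """Append-only log file plus an in-memory hash map of byte offsets."""
    def __init__(self, path):
        self.path = path
        self.index = {}             # key -> byte offset of the latest record
        open(path, "wb").close()    # start with an empty log file

    def set(self, key, value):
        with open(self.path, "ab") as f:
            offset = f.tell()       # remember where this record starts
            f.write(f"{key},{value}\n".encode())
        self.index[key] = offset    # the hash map points at the newest record

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)          # one seek, one read: no log scan
            _, value = f.readline().decode().rstrip("\n").split(",", 1)
            return value

db = LogWithHashIndex("toy.log")
db.set("name", "Alice")
db.set("name", "Bob")               # an update is just another append
result = db.get("name")
print(result)                       # -> Bob
os.remove("toy.log")
```

Note that "Alice" still physically sits in the log after the update; only the index moved on, which is exactly why the compaction described next is needed.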

You probably wondered: if we keep writing our operations to the log, won't it grow without bound? Yes, which is why the compaction technique was invented: with some periodicity it "cleans up" the data, keeping only the most recent value for each key and dropping deleted ones. And if we keep not one log on disk but several segments, each sorted by key, we get a new data structure called an SSTable ("sorted string table"), which undoubtedly improves performance. If we keep the sorted structure in memory, we get a similar structure, the so-called MemTable; its problem is that in a fatal database crash, the most recently written data (already in the MemTable but not yet flushed to disk) is lost. In practice, this is why such databases also keep a separate append-only log on disk to which every write goes first, so that the MemTable can be rebuilt after a crash.
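Compaction can be sketched in a few lines: given a segment of records in write order, keep only the newest value per key; sorting the survivors by key then yields an SSTable-like segment. (This simplified sketch ignores deletion markers, "tombstones", which real engines also have to drop.)

```python
# A segment is a list of (key, value) records in write order: the same
# key appears many times because every update is just another append.
segment = [
    ("cat", 1), ("dog", 1), ("cat", 2), ("dog", 2), ("cat", 3),
]

def compact(records):
    """Keep only the most recent value for each key."""
    latest = {}
    for key, value in records:   # later records overwrite earlier ones
        latest[key] = value
    # Emitting the survivors sorted by key yields an SSTable-like segment.
    return sorted(latest.items())

print(compact(segment))  # -> [('cat', 3), ('dog', 2)]
```

Five records shrink to two: this is the space the growing log gets back.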

Another approach to indexing is based on B-trees. In a B-tree, data is written to disk in fixed-size pages. These blocks of data are often around 4 KB in size and hold key-value pairs sorted by key. One B-tree node is like an array of references to a range of pages. The maximum number of references in a node is called the branching factor. Each page range is another B-tree node with references to further page ranges.

Eventually, at the leaf level, you arrive at individual pages. The idea resembles pointers in low-level programming languages, except that these page references are stored on disk rather than in memory. When INSERTs and DELETEs happen in the database, a node may be split in two to stay within the branching factor. If the database fails for any reason in the middle of that process, data integrity may be compromised. To prevent this, databases using B-trees maintain a "write-ahead log", or WAL, in which every single transaction is recorded; the WAL is used to restore the state of the B-tree if it is corrupted. This seems to be what makes B-tree databases better in terms of durability. But LSM-based databases can also maintain a file that performs essentially the same function as a WAL. So I will repeat what I have said before, perhaps more than once: understand the mechanisms of the database you have chosen.
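The WAL idea in miniature (a plain dict stands in for the B-tree, and the "log" is a Python list rather than an fsync'ed file on disk; all names are invented for illustration): log the change first, apply it second, and rebuild from the log after a crash.

```python
class ToyTreeWithWAL:
    """Every change is appended to a write-ahead log *before* it touches
    the tree (here: a dict stand-in), so a crash mid-update can be
    repaired by replaying the log."""
    def __init__(self):
        self.wal = []    # in a real database: an fsync'ed file on disk
        self.tree = {}

    def put(self, key, value):
        self.wal.append(("put", key, value))  # 1. log first
        self.tree[key] = value                # 2. then modify the tree

    def recover(self):
        # Rebuild the tree purely from the log, as after a crash.
        rebuilt = {}
        for op, key, value in self.wal:
            if op == "put":
                rebuilt[key] = value
        return rebuilt

db = ToyTreeWithWAL()
db.put("a", 1)
db.put("b", 2)
db.tree.clear()        # simulate losing the in-memory / partial state
print(db.recover())    # -> {'a': 1, 'b': 2}
```

The ordering is the whole point: because the log entry lands before the tree is touched, the log is always at least as up to date as the tree, never behind it.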

What is certain about B-trees, however, is that they are convenient for transactionality: each key occurs in exactly one place in the index, whereas log-structured storage subsystems may hold multiple copies of the same key in different segments (for example, until the next compaction runs).

However, the design of the index directly affects the performance of the database. With an LSM tree, writes to disk are sequential, while B-trees cause multiple random disk accesses, so write operations are faster with LSM than with B-trees. The difference is especially significant on magnetic hard disk drives (HDDs), where sequential writes are much faster than random ones. Reading is slower on LSM trees, because you may have to look through several different data structures and SSTables at different stages of compaction. In more detail it looks like this: on a simple query, we first look up the key in the MemTable. If it is not there, we look in the most recent SSTable; if not there, in the one before it, and so on. If the requested key does not exist at all, with LSM we learn this last. LSM trees are used, for example, in LevelDB, RocksDB, Cassandra and HBase.
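The read path just described, as a sketch (plain dicts stand in for the MemTable and the on-disk SSTables):

```python
# LSM read path: check the MemTable first, then SSTables from newest
# to oldest; a missing key is confirmed only after checking every level.
memtable = {"c": 30}
sstables = [              # ordered newest first
    {"b": 20},
    {"a": 1, "b": 2},
]

def lsm_get(key):
    if key in memtable:
        return memtable[key]
    for table in sstables:        # a newer SSTable shadows older ones
        if key in table:
            return table[key]
    return None                   # the slowest case: key does not exist

print(lsm_get("c"))    # -> 30   (found immediately in the MemTable)
print(lsm_get("b"))    # -> 20   (the newest SSTable wins over the old value 2)
print(lsm_get("zzz"))  # -> None (had to scan every level)
```

Real engines shortcut the "key does not exist" case with Bloom filters, but the shadowing order - MemTable, then SSTables newest to oldest - is exactly this.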

I describe all this in such detail so that you understand that when choosing a database you need to weigh many different things: for example, whether you expect to write data more or to read it more. And I have not even mentioned the differences in data models (do you need to traverse the data, as the graph model allows? are there relationships between the entities in your data at all - then relational databases come to the rescue?), or the two approaches to data schemas: schema-on-read (as in many NoSQL databases) and schema-on-write (as in relational ones).

If we return to the question of durability, the conclusion is this: any database that writes to disk, regardless of its indexing mechanism, can provide good durability guarantees for your data, but you need to examine each specific database to see what exactly it offers.

6.5 How in-memory DBs work

By the way, besides databases that write to disk, there are also so-called "in-memory" databases, which work primarily with RAM. In short, in-memory databases typically offer weaker durability in exchange for faster reads and writes, which can be perfectly appropriate for some applications.

The thing is, RAM has long been much more expensive than disk, but recently it has been getting rapidly cheaper, which gave rise to this new type of database - logical, given the speed of reading and writing data in RAM. But you will rightly ask: what about data safety in these databases? Here again, you need to look at the implementation details. In general, the developers of such databases offer the following mechanisms:

  • You can use battery-backed RAM;
  • You can write change logs to disk (something like the WALs mentioned above), but not the data itself;
  • You can periodically write snapshots of the database state to disk (which, on its own, gives no guarantee but improves durability);
  • You can replicate the state of RAM to other machines.
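The snapshot option from the list above, sketched with a JSON file standing in for the on-disk copy (file name and structure are invented for illustration): whatever was written after the last snapshot does not survive recovery.

```python
import json
import os

# An in-memory store that periodically dumps its state to disk.
# Between snapshots, writes live only in RAM and can be lost on a crash.
state = {}

def snapshot(path="snapshot.json"):
    with open(path, "w") as f:
        json.dump(state, f)     # a real DB would also fsync here

def recover(path="snapshot.json"):
    with open(path) as f:
        return json.load(f)

state["a"] = 1
snapshot()
state["b"] = 2           # written after the snapshot: not yet durable

recovered = recover()    # what would survive a crash right now
print(recovered)         # -> {'a': 1}: the post-snapshot write is lost
os.remove("snapshot.json")
```

This is why the list above calls snapshots an improvement rather than a guarantee: the window between two snapshots is exactly the data you can lose.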

For example, the in-memory database Redis, which is mainly used as a message queue or cache, lacks the durability from ACID: it does not guarantee that a successfully executed command is stored on disk, since in its typical configuration Redis flushes data to disk (if you have persistence enabled at all) only asynchronously, at regular intervals.

However, this is not critical for all applications: I found the example of the collaborative online editor EtherPad, which flushed every 1-2 seconds, so a user could at most lose a couple of letters or a word - hardly critical. Beyond that, in-memory databases are good in that they provide data models that would be difficult to implement with disk indexes: Redis, for example, can be used to build transaction-like processing, since its priority queue makes this possible.