How to Implement ACID in Your Application: Theory

7.1 Why it is necessary

We have discussed all the ACID properties in some detail, along with their purpose and use cases. As you have seen, not all databases offer ACID guarantees; some sacrifice them for better performance. So it may well happen that your project ends up with a database that does not offer ACID, and you may need to implement some of the necessary ACID functionality on the application side. And if your system is designed as microservices or as some other kind of distributed application, what would be an ordinary local transaction inside one service becomes a distributed transaction, and it loses its ACID nature even if each individual microservice's database is ACID-compliant.

I don't want to give you an exhaustive guide to building a transaction manager, simply because the topic is too big and complicated; I only want to cover a few basic techniques. If we are not talking about distributed applications and you need ACID guarantees, I see no reason to try to implement ACID fully on the application side: it will be easier and cheaper in every sense to take a ready-made solution (that is, a database with ACID support).

But I would like to show you some techniques that will help you implement transactions on the application side. Knowing these techniques can help you in a variety of scenarios, even ones that don't involve transactions at all, and make you a better developer (I hope so).

7.2 Basic tools for transaction lovers

Optimistic and pessimistic locking. These are two approaches to locking data that may be accessed by several transactions at the same time.

Optimistic locking assumes that concurrent access is fairly unlikely, so it does the following: it reads the desired row and remembers its version number (or a timestamp, or a checksum/hash if you cannot change the data schema to add a version or timestamp column), and before writing changes for this data to the database, it checks whether the version of the data has changed. If the version has changed, you need to resolve the conflict somehow and then update the data (“commit”), or roll back the transaction (“rollback”). The disadvantage of this method is that it creates favorable conditions for a bug with the long name “time-of-check to time-of-use”, abbreviated as TOCTOU: the state may change between the check and the write, unless the check and the write are performed as a single atomic operation. I have no experience with optimistic locking,
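A minimal sketch of optimistic locking in Python, assuming a SQLite table with a hypothetical version column (the accounts table and its columns are made up for illustration). The version check and the write are folded into one UPDATE ... WHERE version = ? statement, which is one common way to close the TOCTOU window: if another writer got there first, zero rows are updated and the caller has to resolve the conflict.

```python
import sqlite3

# In-memory table with a hypothetical "version" column used for optimistic locking
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")
conn.commit()


def read_account(conn, account_id):
    """Read the row and remember its version number."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def try_update_balance(conn, account_id, new_balance, expected_version):
    """Write only if nobody changed the row since we read it.

    The version check and the write happen in one UPDATE statement,
    so there is no gap between "check" and "use".
    """
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means a conflict


balance, version = read_account(conn, 1)
# Simulate a concurrent writer that updates the row after our read
other_ok = try_update_balance(conn, 1, 150, version)
# Our update still carries the old version and is rejected
our_ok = try_update_balance(conn, 1, balance - 30, version)
print(other_ok, our_ok)  # True False
```

On a conflict the application re-reads the row and decides whether to retry or roll back.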

As an example, I found one technology from a developer's daily life that uses something like optimistic locking: the HTTP protocol. The response to an initial HTTP GET request MAY include an ETag header, which the client MAY then send back in the If-Match header of subsequent PUT requests. For the GET and HEAD methods, the server will send back the requested resource only if its current ETag matches one of the ETags listed in If-Match. For PUT and other unsafe methods, it will likewise apply the update only in this case; otherwise it responds with 412 (Precondition Failed). If you don't know how ETags work, here's a good example using the "feedparser" library (which helps parse RSS and other feeds). Note that this example shows the read side: feedparser sends the stored ETag in an If-None-Match header, and the server replies that the feed has not changed instead of sending it again.


>>> import feedparser 
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml') 
>>> d.etag 
'"6c132-941-ad7e3080"' 
>>> d2 = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml', etag=d.etag) 
>>> d2.feed 
{} 
>>> d2.debug_message 
'The feed has not changed since you last checked, so the server sent no data.  This is a feature, not a bug!'
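The feedparser session covers the read side. The write side, If-Match, can be sketched as a tiny in-memory resource. Resource is a hypothetical class, not a real server framework; it derives the ETag from a hash of the content (one common choice) and, to stay short, accepts only a single ETag or "*" in If-Match.

```python
import hashlib


class Resource:
    """Hypothetical in-memory resource that honours If-Match on updates."""

    def __init__(self, body: bytes):
        self.body = body

    @property
    def etag(self) -> str:
        # Strong ETag derived from the content
        return '"' + hashlib.sha256(self.body).hexdigest()[:16] + '"'

    def get(self):
        return 200, self.etag, self.body

    def put(self, new_body: bytes, if_match: str) -> int:
        # Apply the update only if the client saw the current version
        if if_match != "*" and if_match != self.etag:
            return 412  # Precondition Failed: someone changed it in between
        self.body = new_body
        return 204


res = Resource(b"v1")
_, etag, _ = res.get()
first = res.put(b"v2", if_match=etag)   # 204, the ETag is current
second = res.put(b"v3", if_match=etag)  # 412, the ETag is stale now
print(first, second)  # 204 412
```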

The pessimist, on the other hand, assumes that transactions will often “meet” on the same data, so, to simplify his life and avoid unnecessary race conditions, he simply locks the data he needs. To implement the locking mechanism, you either need to keep a database connection for your session (rather than pulling connections from a pool, in which case you will most likely have to work with optimistic locking), or use a transaction ID that works regardless of the connection. The disadvantage of pessimistic locking is that it slows down transaction processing in general, but you can be confident about your data and get real isolation.
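Here is a minimal sketch of the second option, a lock keyed by a transaction ID rather than by a connection. LockManager is a hypothetical in-process class; a real system would keep this state in the database or in a distributed lock manager, and would usually add timeouts so that a crashed transaction cannot hold a lock forever.

```python
import threading
from typing import Dict


class LockManager:
    """Hypothetical lock manager: each resource is held by at most one transaction ID."""

    def __init__(self):
        self._owners: Dict[str, str] = {}
        self._mutex = threading.Lock()  # protects the _owners dict itself

    def acquire(self, resource: str, txn_id: str) -> bool:
        with self._mutex:
            owner = self._owners.get(resource)
            if owner is None or owner == txn_id:
                self._owners[resource] = txn_id
                return True
            return False  # locked by another transaction

    def release(self, resource: str, txn_id: str) -> None:
        with self._mutex:
            # Only the owner may release the lock
            if self._owners.get(resource) == txn_id:
                del self._owners[resource]


locks = LockManager()
a = locks.acquire("account:1", "txn-A")  # True
b = locks.acquire("account:1", "txn-B")  # False, txn-A holds it
locks.release("account:1", "txn-A")
c = locks.acquire("account:1", "txn-B")  # True
```

This is a non-blocking "try-lock": a caller that gets False either waits and retries or aborts its transaction.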

An additional danger, however, lurks in possible deadlocks, in which several processes wait for resources locked by each other. For example, a transaction requires resources A and B. Process 1 has taken resource A, and process 2 has taken resource B. Neither process can continue. There are various ways to solve this issue; I do not want to go into details now, so read Wikipedia first, but in short, one option is to create a lock hierarchy, so that all processes acquire locks in the same fixed order. If you want to get to know this concept in more detail, you are invited to rack your brains over the “Dining Philosophers Problem”.
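A lock hierarchy can be sketched with Python threads. Every transfer takes the two locks in the same global order (here simply sorted by resource name, an arbitrary choice for illustration), so the circular wait from the A/B example cannot arise; the balances and names are hypothetical.

```python
import threading

# Hypothetical shared resources, each protected by its own lock
balances = {"A": 100, "B": 100}
locks = {"A": threading.Lock(), "B": threading.Lock()}


def transfer(src: str, dst: str, amount: int) -> None:
    # Always acquire locks in one global order (by name), whatever the
    # direction of the transfer, so no two threads wait on each other in a cycle
    first, second = sorted((src, dst))
    with locks[first]:
        with locks[second]:
            balances[src] -= amount
            balances[dst] += amount


def worker(src: str, dst: str, times: int) -> None:
    for _ in range(times):
        transfer(src, dst, 1)


# Two threads transferring in opposite directions, the classic deadlock setup
t1 = threading.Thread(target=worker, args=("A", "B", 10_000))
t2 = threading.Thread(target=worker, args=("B", "A", 10_000))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()
print(balances)  # {'A': 100, 'B': 100}
```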

Here is a good example of how both locks will behave in the same scenario.

As for lock implementations, I do not want to go into details, but there are lock managers for distributed systems, for example ZooKeeper, Redis, etcd and Consul.

7.3 Idempotency of operations

Writing idempotent code is good practice in general, and it is a skill worth having whether or not you use transactions. Idempotency is the property of an operation to produce the same result when it is applied to the same object again; more precisely, applying it several times has the same effect as applying it once. You call the function and get a result; you call it again a second or five later and get the same result. Of course, if the data in the database has changed in the meantime, the result will be different. Data in external systems may change independently of your function, but everything that depends on the function itself must be predictable.
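To make the definition concrete, here is a minimal sketch contrasting an idempotent operation with a non-idempotent one on a hypothetical in-memory account record: setting a value can be repeated safely, while adding to it cannot.

```python
# Hypothetical account record used only for illustration
account = {"balance": 100}


def set_balance(acc: dict, value: int) -> None:
    # Idempotent: one call or many calls leave the same state
    acc["balance"] = value


def add_to_balance(acc: dict, delta: int) -> None:
    # Not idempotent: every repeated call changes the state again
    acc["balance"] += delta


set_balance(account, 150)
set_balance(account, 150)  # repeated call, same result
print(account["balance"])  # 150

add_to_balance(account, 50)
add_to_balance(account, 50)  # repeated call, different result
print(account["balance"])  # 250
```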

Idempotence can manifest itself in several ways. One of them is simply a recommendation on how to write your code. Do you remember that the best function is one that does one thing? And that such a function deserves unit tests? If you follow these two rules, you already increase the chance that your functions will be idempotent. To avoid confusion, let me clarify that idempotent functions are not necessarily “pure” (in the sense of “function purity”). Pure functions are functions that operate only on the data they receive as input, without modifying it or causing any other side effects, and return the processed result. These are the functions that allow you to scale your application using functional programming techniques. Since we are talking about shared data and a database, our functions are unlikely to be pure,

This is a pure function:


def square(num: int) -> int: 
	return num * num

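This function, on the other hand, is not pure, but it is idempotent, provided the query itself is idempotent, for example an insert keyed on a unique column that ignores duplicates (please do not draw conclusions about how I write code from these snippets):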


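def insert_data(insert_query: str, db_connection: DbConnectionType) -> bool:
    # Idempotent only if insert_query itself is idempotent,
    # e.g. an INSERT that ignores duplicates of a unique key
    db_connection.execute(insert_query)
    return True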
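Whether insert_data is really idempotent depends on the query it runs. Here is a minimal runnable sketch with SQLite, assuming a hypothetical payments table whose primary key identifies each payment; it uses SQLite's INSERT OR IGNORE so that retrying the same insert leaves exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (payment_id TEXT PRIMARY KEY, amount INTEGER)")


def insert_payment(conn: sqlite3.Connection, payment_id: str, amount: int) -> bool:
    # The primary key plus INSERT OR IGNORE makes repeated calls harmless:
    # a second insert with the same payment_id is silently skipped
    conn.execute(
        "INSERT OR IGNORE INTO payments (payment_id, amount) VALUES (?, ?)",
        (payment_id, amount),
    )
    conn.commit()
    return True


insert_payment(conn, "pay-1", 30)
insert_payment(conn, "pay-1", 30)  # retried call
count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(count)  # 1
```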

Rather than using a lot of words, let me just tell you how I was forced to learn to write idempotent programs. As you can see by now, I do a lot of work with AWS, and there is a service called AWS Lambda. Lambda lets you avoid managing servers: you simply upload code that runs in response to some events or on a schedule. An event can be a message delivered by a message broker. In AWS, such a broker is AWS SNS. I think this should be clear even to those who do not work with AWS: we have a broker that sends messages through channels (“topics”), and microservices subscribed to these channels receive the messages and react to them somehow.

The problem is that SNS delivers messages "at least once" ("at-least-once delivery"). What does that mean? That sooner or later your Lambda code will be called twice for the same message. And it really does happen. There are a number of scenarios where your function needs to be idempotent: for example, when money is withdrawn from an account, someone may legitimately withdraw the same amount twice, but we need to make sure that these are really two independent withdrawals, in other words, two different transactions and not a repetition of one.
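A minimal sketch of telling a redelivered message apart from a genuine second withdrawal, assuming each operation carries a unique ID (the hypothetical operation_id below). In a real system the record of processed IDs and the balance change would have to be written atomically, in the same transaction; this in-memory version only shows the idea.

```python
from typing import Set


class Account:
    """Hypothetical account that deduplicates withdrawals by an operation ID."""

    def __init__(self, balance: int):
        self.balance = balance
        self._processed: Set[str] = set()  # IDs of operations already applied

    def withdraw(self, amount: int, operation_id: str) -> int:
        if operation_id in self._processed:
            # Redelivered message: same operation, do not withdraw again
            return self.balance
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self._processed.add(operation_id)
        return self.balance


acc = Account(100)
acc.withdraw(30, "op-1")
acc.withdraw(30, "op-1")  # duplicate delivery of the same message
acc.withdraw(30, "op-2")  # a genuinely new withdrawal of the same amount
print(acc.balance)  # 40
```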

For a change, here is another example: limiting the frequency of requests to an API (“rate limiting”). Our Lambda receives an event with a certain user_id, for which it must check whether the user with that ID has used up their allowed number of requests to some of our APIs. We could store the number of calls made in Amazon DynamoDB and increment it by 1 on every call to our function.
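To see why a plain counter is a problem, here is a minimal in-memory sketch of the naive approach receiving the same event twice; a dict stands in for the DynamoDB table, and all names are hypothetical.

```python
from collections import defaultdict

# Stand-in for a DynamoDB table: user_id -> number of calls (illustration only)
call_counts = defaultdict(int)


def naive_handler(event: dict) -> None:
    # Increments unconditionally, so a redelivered event is counted twice
    call_counts[event["user_id"]] += 1


event = {"user_id": "user-42"}
naive_handler(event)
naive_handler(event)  # the broker delivers the same event again
print(call_counts["user-42"])  # 2, although the user made one request
```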

But what if this Lambda function is called twice with the same event? By the way, did you notice the arguments of the handler function (handler_name in the code below)? The second argument, context, is passed by AWS Lambda by default and contains various metadata, including aws_request_id, which is generated for each unique invocation. This means that instead of storing the number of calls made in the table, we can store a set of request IDs, and on each call our Lambda will check whether the given request has already been processed:

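import json
import os
from typing import Any, Dict

from aws_lambda_powertools.utilities.typing import LambdaContext  # needed only for argument type annotation
import boto3

# Maximum number of requests per user, read from the function's environment variables
LIMIT = int(os.environ["LIMIT"])

# Our DynamoDB table (created once per container and reused between invocations)
table = boto3.resource('dynamodb').Table('my_table')


def handler_name(event: Dict[str, Any], context: LambdaContext) -> Dict[str, Any]:
    # Unique ID that AWS Lambda assigns to this invocation
    request_id = context.aws_request_id

    # We find user_id in the incoming event
    user_id = event["user_id"]

    try:
        # Atomically add request_id to the user's string set, but only if the set
        # is still below the limit and does not contain this request_id yet
        table.update_item(
            Key={'pkey': user_id},
            UpdateExpression='ADD #requests :request_ids',
            ConditionExpression=(
                'attribute_not_exists(#requests) OR '
                '(size(#requests) < :limit AND NOT contains(#requests, :request_id))'
            ),
            ExpressionAttributeNames={'#requests': 'requests'},
            ExpressionAttributeValues={
                ':request_ids': {request_id},  # a Python set is stored as a string set (SS)
                ':request_id': request_id,     # a single string for contains()
                ':limit': LIMIT,
            },
        )
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        # Either the limit is exhausted or this request_id was already recorded
        return {
            "statusCode": 429,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"status": "rejected"}),
        }

    # TODO: write further logic

    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json"
        },
        "body": json.dumps({
            "status": "success"
        })
    }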

Since my example is actually taken from the Internet, I will leave a link to the original source, especially since it gives a little more information.
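The handler above needs AWS to run. The logic of its ConditionExpression can be sketched in plain Python, with a dict of sets standing in for the table; unlike DynamoDB's conditional update, this version is not atomic across processes, so it is only an illustration, and LIMIT and the IDs are hypothetical. One more caveat, as far as I understand Lambda: aws_request_id stays the same when Lambda itself retries an invocation, but a duplicate delivered by SNS arrives as a separate invocation with a new request ID, so a key taken from the message itself (for SNS, its MessageId) may be a more robust choice.

```python
from typing import Dict, Set

LIMIT = 3  # hypothetical per-user limit

# Stand-in for the DynamoDB table: user_id -> set of processed request IDs
requests_by_user: Dict[str, Set[str]] = {}


def record_request(user_id: str, request_id: str) -> bool:
    """Mimic the ConditionExpression: accept only new IDs while under the limit."""
    seen = requests_by_user.setdefault(user_id, set())
    if request_id in seen or len(seen) >= LIMIT:
        return False  # DynamoDB would raise ConditionalCheckFailedException here
    seen.add(request_id)
    return True


attempts = ["req-1", "req-1", "req-2", "req-3", "req-4"]
results = [record_request("user-42", r) for r in attempts]
print(results)  # [True, False, True, True, False]
```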

Remember how I mentioned earlier that something like a unique transaction ID can be used to lock shared data? We have now learned that it can also be used to make operations idempotent. Let's find out in what ways you can generate such IDs yourself.