Never write your own caching solution

Another way to speed up work with the database is to cache objects that we have already requested.

Important! Never write your own caching solution. The task has more pitfalls than you would ever expect.

Problem 1 - cache flush. Sometimes an event occurs that requires an object to be removed from the cache or updated in it. The only way to handle this reliably is to pass all database requests through the cache engine. Otherwise, you will have to tell the cache explicitly, every time, which objects should be deleted or updated.
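
As a rough illustration of routing every request through the cache, here is a minimal sketch. The EmployeeStore interface and the CachingEmployeeStore class are hypothetical names invented for this example; a real cache engine also has to cope with transactions, concurrency, and other applications writing to the same database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction standing in for the real database access code.
interface EmployeeStore {
    Optional<String> load(int id);      // read a row
    void save(int id, String name);     // insert or update a row
    void delete(int id);                // remove a row
}

// All reads and writes go through this class, so the cache never has to be
// told separately which entries have become stale.
class CachingEmployeeStore implements EmployeeStore {
    private final EmployeeStore database;
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();

    CachingEmployeeStore(EmployeeStore database) {
        this.database = database;
    }

    @Override
    public Optional<String> load(int id) {
        String cached = cache.get(id);
        if (cached != null) {
            return Optional.of(cached);              // cache hit: no database call
        }
        Optional<String> loaded = database.load(id); // cache miss: ask the database
        loaded.ifPresent(name -> cache.put(id, name));
        return loaded;
    }

    @Override
    public void save(int id, String name) {
        database.save(id, name);                     // write to the database first
        cache.put(id, name);                         // then refresh the cached copy
    }

    @Override
    public void delete(int id) {
        database.delete(id);
        cache.remove(id);                            // evict so later reads miss
    }
}
```

Because save() and delete() update the cache in the same place where they touch the database, callers never have to invalidate anything by hand. The moment some code bypasses this class and writes to the database directly, the cache can silently go stale.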

Problem 2 - lack of memory. Caching seems like a great idea until you find that objects in memory take up a lot of space. The server application may need tens of gigabytes of additional memory for the cache to work effectively.

And since memory is always in short supply, you need an effective strategy for evicting objects from the cache. This is somewhat similar to the garbage collector in Java. And as you remember, the best minds have spent decades inventing ways of tracking objects by generation, and so on.
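
One of the simplest eviction strategies is "least recently used" (LRU): when the cache is full, drop the entry that has gone untouched the longest. The sketch below builds it on the standard LinkedHashMap class; the LruCache name and the fixed entry limit are assumptions made for this illustration, and real caches usually limit memory rather than entry count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache: LinkedHashMap keeps entries in access order and asks
// removeEldestEntry() after every insertion whether the oldest one should go.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);   // true = order entries by last access, not by insertion
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;   // evict once the limit is exceeded
    }
}
```

With a limit of two entries, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was used more recently. This class is not thread-safe; it only shows the eviction idea.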

Problem 3 - different strategies. As practice shows, different strategies for storing and updating cached data are effective for different objects. An efficient caching system cannot apply a single strategy to all objects.

Problem 4 - efficient storage of objects. You can't just store the objects themselves in the cache. Objects too often contain references to other objects, which reference still more objects, and so on. At this rate you won't need a garbage collector: everything stays reachable from the cache, so it simply won't have anything to remove.

Therefore, instead of storing the objects themselves, it is sometimes much more efficient to store the values of their primitive fields, along with a mechanism for quickly constructing objects from them.

As a result, you end up with a whole virtual in-memory DBMS, which must work quickly and consume little memory.
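
To make the idea concrete, here is a minimal sketch of a cache that keeps only field values and rebuilds objects on demand. CachedEmployee and FlatEmployeeCache are hypothetical names for this example, and the reference to a manager is assumed to be stored as a plain id rather than as an object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this illustration.
class CachedEmployee {
    final int id;
    final String name;
    final int managerId;   // a reference to another entity, kept as an id

    CachedEmployee(int id, String name, int managerId) {
        this.id = id;
        this.name = name;
        this.managerId = managerId;
    }
}

class FlatEmployeeCache {
    // Each entry holds only field values in a fixed order: {name, managerId}.
    private final Map<Integer, Object[]> rows = new HashMap<>();

    void put(CachedEmployee employee) {
        rows.put(employee.id, new Object[] { employee.name, employee.managerId });
    }

    // Builds a fresh object from the stored values; returns null on a cache miss.
    CachedEmployee get(int id) {
        Object[] row = rows.get(id);
        if (row == null) {
            return null;
        }
        return new CachedEmployee(id, (String) row[0], (Integer) row[1]);
    }
}
```

Since the cache holds no references to CachedEmployee objects, it does not keep whole object graphs alive. The price is that every get() constructs a new object, so two lookups for the same id return equal data in different instances.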

Database caching

In addition to caching directly in a Java program, caching is often organized directly in the database.

There are four big approaches:

The first approach is to denormalize the database. The SQL server stores data in memory differently from how it is stored in tables.

When data is stored on disk in tables, developers usually try to avoid data duplication as much as possible; this process is called database normalization. To speed up work with data in memory, the reverse process is performed: database denormalization. A group of related tables can be stored in combined form, for example as one huge table.

The second approach is caching queries and their results.

The DBMS sees that the same or similar queries reach it very often, so it starts caching these queries and their responses. At the same time, it has to make sure that results affected by rows that have changed in the database are removed from the cache in a timely manner.

This approach can be very effective when a person analyzes the queries and helps the DBMS figure out how best to cache them.
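
The bookkeeping behind a query cache can be sketched in a few lines. This is only an illustration of the idea, not how any particular DBMS implements it: QueryResultCache is a hypothetical class, each query is assumed to read from exactly one table, and invalidation is done per table rather than per row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

class QueryResultCache {
    private final Map<String, List<String>> resultsByQuery = new HashMap<>();
    private final Map<String, Set<String>> queriesByTable = new HashMap<>();

    // Returns the cached result for the query text, running the query only on a miss.
    List<String> get(String query, String table, Supplier<List<String>> runQuery) {
        List<String> cached = resultsByQuery.get(query);
        if (cached != null) {
            return cached;
        }
        List<String> fresh = new ArrayList<>(runQuery.get());
        resultsByQuery.put(query, fresh);
        // Remember which table the query depends on, for later invalidation.
        queriesByTable.computeIfAbsent(table, t -> new HashSet<>()).add(query);
        return fresh;
    }

    // Called whenever rows in the table are inserted, updated, or deleted.
    void tableChanged(String table) {
        Set<String> stale = queriesByTable.remove(table);
        if (stale != null) {
            resultsByQuery.keySet().removeAll(stale);
        }
    }
}
```

Dropping every cached result for a table on any change is crude but safe. Finer-grained invalidation is exactly where the hard part of query caching lies.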

The third approach is an in-memory database.

This is another commonly used approach. Another database, which stores all its data only in memory, is placed between the server and the DBMS. It is also called an In-Memory-DB. If many different servers access the same database, an In-Memory-DB lets you organize caching according to the type of each server.


The fourth approach is a database cluster with several read-only databases.

Another solution is to use a cluster: several DBMSs of the same type contain identical data. You can read from all of the databases but write to only one, which is then synchronized with the rest.

This is a very good solution because it is easy to configure and works in practice. Usually, for every request that changes data, the database receives 10-100 requests that read data.
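
On the application side, the read/write split might look like the sketch below. ReplicaRouter is a hypothetical helper, the databases are represented by plain strings such as JDBC URLs, and round-robin is just one possible way to spread reads across the replicas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ReplicaRouter {
    private final String primary;            // the only database that accepts writes
    private final List<String> replicas;     // read-only copies of the same data
    private final AtomicInteger next = new AtomicInteger();

    ReplicaRouter(String primary, List<String> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("at least one replica is required");
        }
        this.primary = primary;
        this.replicas = new ArrayList<>(replicas);
    }

    // Every statement that changes data goes to the primary.
    String forWrite() {
        return primary;
    }

    // Reads are spread over the replicas in round-robin order.
    String forRead() {
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

Keep in mind that replicas are synchronized with some delay, so a read issued right after a write may still see the old data.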

Types of caching in Hibernate

Hibernate supports three levels of caching:

  • Caching at the session level (Session)
  • Caching at the SessionFactory level
  • Caching queries (and their results)

You can try to represent this system in the form of such a figure:

The simplest type of caching (also called the first level cache) is implemented at the Hibernate session level. Hibernate always uses this cache by default, and it cannot be disabled.

Let's immediately consider the following example:

Employee director1 = session.get(Employee.class, 4);
Employee director2 = session.get(Employee.class, 4);

assertTrue(director1 == director2);

It may seem that two queries to the database will be executed here, but this is not so. After the first query, the Employee object is cached. If you request the object again in the same session, Hibernate returns the same Java object.

The same object means that even object references will be identical. It's really the same object.

The save(), update(), saveOrUpdate(), load(), get(), list(), iterate(), and scroll() methods will always use the first level cache. Actually, there is nothing more to add.
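
Conceptually, the first level cache behaves like a map from identifier to object that lives exactly as long as the session. The sketch below is not Hibernate code: SketchSession and its loader function are hypothetical names used only to show why the second lookup returns the same reference without touching the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

class SketchSession<T> {
    private final IntFunction<T> loader;   // stands in for the SQL SELECT
    private final Map<Integer, T> firstLevelCache = new HashMap<>();

    SketchSession(IntFunction<T> loader) {
        this.loader = loader;
    }

    // Loads the object on the first call; later calls return the cached instance.
    T get(int id) {
        return firstLevelCache.computeIfAbsent(id, loader::apply);
    }
}
```

Each SketchSession has its own map, so two sessions asking for the same id each call the loader and end up with different instances, which mirrors the session-scoped behavior described above.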

Second level caching

While the first level cache is bound to the Session object, the second level cache is bound to the SessionFactory object. This means that objects in this cache are visible much more widely than in the first level cache.

Example:

Session session = factory.openSession();
Employee director1 = session.get(Employee.class, 4);
session.close();

session = factory.openSession();
Employee director2 = session.get(Employee.class, 4);
session.close();

assertTrue(director1 != director2);
assertTrue(director1.equals(director2));

In this example, two queries will be made to the database. Hibernate will return identical objects, but it won't be the same object - they will have different references.

Second level caching is disabled by default. Therefore, we have two queries to the database instead of one.

To enable it, you need to write the following lines in the hibernate.cfg.xml file:

<property name="hibernate.cache.provider_class">net.sf.ehcache.hibernate.SingletonEhCacheProvider</property>
<property name="hibernate.cache.use_second_level_cache">true</property>

After enabling second-level caching, Hibernate behavior will change a bit:

Session session = factory.openSession();
Employee director1 = session.get(Employee.class, 4);
session.close();

session = factory.openSession();
Employee director2 = session.get(Employee.class, 4);
session.close();

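assertTrue(director1 != director2);
assertTrue(director1.equals(director2));

Only after all these manipulations is the second-level cache enabled, and in the example above only one query to the database is executed. Note that the second-level cache stores the state of an entity rather than the Java object itself, so each session still builds its own Employee instance: the two references differ, but the second one is filled in from the cache instead of the database.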