6.1 Who invented HBase and why

In this lecture we will look at HBase, a tool that has recently gained considerable popularity: for example, Facebook uses it as the basis of its messaging system, and that already says a lot.

The lecture covers the Bigtable concept and its open-source implementation, the specifics of working with it, and how it differs both from classical relational databases (such as MySQL and Oracle) and from key-value stores such as Redis, Aerospike and memcached. As usual, let's start with the history of the question. Like many other BigData projects, HBase grew out of a concept developed at Google. The principles behind HBase were described in the paper "Bigtable: A Distributed Storage System for Structured Data".

As we discussed in previous lectures, plain files are quite well suited for batch data processing using the MapReduce paradigm. On the other hand, information stored in files is rather inconvenient to update, and files provide no random access. For fast and convenient random access there is a class of NoSQL systems called key-value stores, such as Aerospike, Redis, Couchbase and Memcached. However, batch processing is usually very inconvenient in these systems. HBase is an attempt to combine the convenience of batch processing with the convenience of updates and random access.

6.2 Data model

HBase is a distributed, column-oriented, multiversion key-value database.

  • The data is organized into tables indexed by a primary key, called the RowKey in HBase.
  • For each RowKey, an unlimited set of attributes (or columns) can be stored.
  • Columns are organized into groups of columns called Column Families. As a rule, columns that have the same usage and storage pattern are combined into one Column Family.
  • For each attribute, several different versions can be stored. Different versions have different timestamps.

Records are physically stored in RowKey-sorted order. The data corresponding to different Column Families is stored separately, which makes it possible, if necessary, to read data only from the desired column family.

When a certain attribute is deleted, it is not physically deleted immediately, but is only marked with a special tombstone flag. The physical deletion of the data will occur later, when the Major Compaction operation is performed.

Attributes belonging to the same column family and corresponding to the same key are physically stored as a sorted list. Any attribute may be present or absent for each key; an absent attribute does not incur the overhead of storing an empty value.

The set of column families and their names is fixed and forms the table's schema. At the Column Family level, parameters such as time to live (TTL) and the maximum number of stored versions are set. If the difference between the timestamp of a particular version and the current time is greater than the TTL, the entry is marked for deletion. If the number of versions of a certain attribute exceeds the maximum, the oldest versions are also marked for deletion.

The HBase data model can be remembered as a key-value mapping:

<table, RowKey, Column Family, Column, timestamp> -> Value
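
To make this concrete, here is a minimal sketch of reading one such cell through the Java client API (HBase 1.x-style calls; the users table and the user_profile:name column are the ones used in the shell example further below and are assumed to exist). Each stored version of the cell comes back as a separate Cell with its own timestamp.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CellAddressExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // <table, RowKey, Column Family, Column, timestamp> -> Value
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("users"))) {        // table
      Get get = new Get(Bytes.toBytes("id1"));                             // RowKey
      get.addColumn(Bytes.toBytes("user_profile"), Bytes.toBytes("name")); // Column Family + Column
      get.setMaxVersions(3);                                               // read up to 3 timestamps
      Result r = table.get(get);
      for (Cell cell : r.getColumnCells(Bytes.toBytes("user_profile"),
                                        Bytes.toBytes("name"))) {
        // each Cell is one <timestamp, value> pair at the same coordinate
        System.out.println(cell.getTimestamp() + " -> "
            + Bytes.toString(CellUtil.cloneValue(cell)));
      }
    }
  }
}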

6.3 Supported operations

The list of operations supported in HBase is quite simple. Four main operations are supported (a sketch of all four through the Java API follows the list):

  • Put: add a new entry to HBase. The timestamp of this entry can be set manually; otherwise it is set automatically to the current time.
  • Get: get the data for a specific RowKey. You can specify the Column Family from which to take the data and the number of versions to read.
  • Scan: read records one after another. You can specify the record from which to start reading, the record at which to stop, the number of records to read, the Column Family from which to read and the maximum number of versions for each record.
  • Delete: mark a specific version for deletion. There is no physical deletion at this point; it is postponed until the next Major Compaction (see below).
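
Below is a hedged sketch of all four operations through the Java client API (HBase 1.x-style calls; the users table, column family and row keys are illustrative and assumed to exist):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class BasicOperationsExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("users"))) {
      byte[] cf = Bytes.toBytes("user_profile");

      // Put: the timestamp can be set manually (explicit here); otherwise "now" is used
      Put put = new Put(Bytes.toBytes("id1"));
      put.addColumn(cf, Bytes.toBytes("name"), System.currentTimeMillis(),
          Bytes.toBytes("alexander"));
      table.put(put);

      // Get: restrict to one column family and ask for several versions
      Get get = new Get(Bytes.toBytes("id1"));
      get.addFamily(cf);
      get.setMaxVersions(5);
      Result result = table.get(get);
      System.out.println("GET: " + result);

      // Scan: read records one after another within a RowKey range
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("id0"));  // record to start from
      scan.setStopRow(Bytes.toBytes("id9"));   // record to stop at (exclusive)
      scan.addFamily(cf);
      scan.setMaxVersions(1);
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result row : scanner) {
          System.out.println("Found row: " + row);
        }
      }

      // Delete: marks the cell with a tombstone; physical removal happens at Major Compaction
      Delete delete = new Delete(Bytes.toBytes("id1"));
      delete.addColumn(cf, Bytes.toBytes("name"));
      table.delete(delete);
    }
  }
}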

6.4 Architecture

HBase is a distributed database that can run on dozens or hundreds of physical servers, ensuring uninterrupted operation even if some of them fail. Therefore, the architecture of HBase is quite complex compared to classic relational databases.

HBase uses two main processes for its work:

1. Region Server - serves one or more regions. A region is a range of records corresponding to a specific range of consecutive RowKeys. Each region contains:

  • Persistent Storage - the main data storage in HBase. The data is physically stored on HDFS in a special HFile format. Data in an HFile is stored in RowKey-sorted order. One (region, column family) pair corresponds to at least one HFile.
  • MemStore - a write buffer. Since data in an HFile is stored in sorted order, updating an HFile for every single record would be quite expensive. Instead, writes first go to a special in-memory area called the MemStore, where they accumulate for some time. When the MemStore fills up to a certain critical value, the data is flushed to a new HFile.
  • BlockCache - a read cache. It saves a significant amount of time on data that is read frequently.
  • Write Ahead Log (WAL). Since data is first written to the MemStore, there is some risk of losing it in a crash. To prevent this, all operations are written to a special log file before they are actually applied, which makes it possible to recover the data after any failure.

2. Master Server - the main server in the HBase cluster. The Master manages the distribution of regions among Region Servers, maintains a registry of regions, schedules regular background tasks, and does other useful work.

To coordinate actions between services, HBase uses Apache ZooKeeper, a special service designed to manage configurations and synchronize services.

When the amount of data in a region grows and reaches a certain size, HBase starts a split, an operation that splits the region in two. To avoid constant region splits, you can pre-set region boundaries (pre-splitting) and increase their maximum size.
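
For example, region boundaries can be pre-set when the table is created. A sketch using the HBase 1.x admin API; the split points and table schema here are illustrative assumptions:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("users"));
      desc.addFamily(new HColumnDescriptor("user_profile"));
      // Region boundaries set up front: rows below "4" go to the first region,
      // "4".."8" to the second, "8".."c" to the third, and so on.
      byte[][] splitKeys = {
          Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("c")
      };
      admin.createTable(desc, splitKeys);
    }
  }
}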

Since the data of one region can be stored in several HFiles, HBase periodically merges them together to speed up its work. This operation is called compaction. Compactions come in two types:

  • Minor compaction. Starts automatically and runs in the background. Has a low priority compared to other HBase operations.
  • Major compaction. It is launched manually or when certain triggers fire (for example, by a timer). It has a high priority and can significantly slow down the cluster; Major Compactions are best run when the load on the cluster is low. Major Compaction also physically deletes data previously marked with a tombstone (a sketch of triggering one by hand follows this list).
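
A small sketch of triggering a Major Compaction by hand through the Java admin API (HBase 1.x-style; the table name is illustrative). The same can be done from the hbase shell with the major_compact command.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MajorCompactionExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Ask the cluster to major-compact all regions of the table;
      // the request is asynchronous, the actual work runs on the Region Servers.
      admin.majorCompact(TableName.valueOf("users"));
    }
  }
}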

6.5 Ways to work with HBase

HBase Shell

The easiest way to get started with HBase is to use the hbase shell utility. It is available immediately after installing HBase, on any node of the HBase cluster.

The HBase shell is a JRuby console with built-in support for all basic HBase operations. Below is an example of creating a users table with two column families, performing some manipulations on it, and finally dropping the table, all in hbase shell:

create 'users', {NAME => 'user_profile', VERSIONS => 5}, {NAME => 'user_posts', VERSIONS => 1231231231} 
put 'users', 'id1', 'user_profile:name', 'alexander' 
put 'users', 'id1', 'user_profile:second_name', 'alexander' 
get 'users', 'id1' 
put 'users', 'id1', 'user_profile:second_name', 'kuznetsov' 
get 'users', 'id1' 
get 'users', 'id1', {COLUMN => 'user_profile:second_name', VERSIONS => 5} 
put 'users', 'id2', 'user_profile:name', 'vasiliy' 
put 'users', 'id2', 'user_profile:second_name', 'ivanov' 
scan 'users', {COLUMN => 'user_profile:second_name', VERSIONS => 5} 
delete 'users', 'id1', 'user_profile:second_name' 
get 'users', 'id1' 
disable 'users' 
drop 'users'

Native API

Like most other Hadoop-related projects, HBase is implemented in Java, so the native API is available for Java. The native API is fairly well documented on the official website. Here is an example of using the HBase API taken from there:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class MyLittleHBaseClient {
  public static void main(String[] args) throws IOException {
    // Reads hbase-site.xml / hbase-default.xml from the classpath
    Configuration config = HBaseConfiguration.create();
    Connection connection = ConnectionFactory.createConnection(config);
    try {
      Table table = connection.getTable(TableName.valueOf("myLittleHBaseTable"));
      try {
        // Put: write one cell into the row "myLittleRow"
        Put p = new Put(Bytes.toBytes("myLittleRow"));
        p.addColumn(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"),
            Bytes.toBytes("Some Value"));
        table.put(p);

        // Get: read the same cell back
        Get g = new Get(Bytes.toBytes("myLittleRow"));
        Result r = table.get(g);
        byte[] value = r.getValue(Bytes.toBytes("myLittleFamily"),
            Bytes.toBytes("someQualifier"));

        String valueStr = Bytes.toString(value);
        System.out.println("GET: " + valueStr);

        // Scan: iterate over all rows that have this column
        Scan s = new Scan();
        s.addColumn(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
        ResultScanner scanner = table.getScanner(s);
        try {
          for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
            System.out.println("Found row: " + rr);
          }
        } finally {
          scanner.close();
        }
      } finally {
        if (table != null) table.close();
      }
    } finally {
      connection.close();
    }
  }
}

Thrift, REST and support for other programming languages

To support work from other programming languages, HBase provides a Thrift API and a REST API. On top of them, clients are built for all major programming languages: Python, PHP, JavaScript, and so on.

6.6 Some features of working with HBase

  1. HBase integrates with MapReduce out of the box: HBase tables can serve as MapReduce input and output via the special TableInputFormat and TableOutputFormat classes (a sketch follows this list).

  2. It is very important to choose the RowKey correctly. The RowKey must provide a good, even distribution across regions; otherwise there is a risk of so-called "hot regions" - regions that are used much more often than others, which leads to inefficient use of system resources.

  3. If the data is loaded not one record at a time but in large batches at once, HBase supports a special BulkLoad mechanism that loads data much faster than single Puts. BulkLoad is essentially a two-step operation:

    • Forming HFiles, without going through Puts, using a special MapReduce job
    • Loading these files directly into HBase
  4. HBase can export its metrics to the Ganglia monitoring server. This can be very helpful when administering HBase in order to get to the bottom of problems.
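
As a sketch of the MapReduce integration mentioned in item 1, here is a map-only job that reads an HBase table through TableInputFormat (wired up by TableMapReduceUtil). HBase 1.x-style API; the users table, column family and counting logic are illustrative assumptions:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class UsersCountJob {

  // Mapper whose input is HBase rows: key = RowKey, value = Result with the row's cells
  public static class UsersMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      context.write(new Text("rows"), new LongWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "count-users");
    job.setJarByClass(UsersCountJob.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("user_profile")); // read only the needed column family
    scan.setCaching(500);                          // bigger batches per RPC for scanning

    // Wires up TableInputFormat under the hood
    TableMapReduceUtil.initTableMapperJob(
        "users", scan, UsersMapper.class, Text.class, LongWritable.class, job);

    job.setNumReduceTasks(0);                      // map-only job for this sketch
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}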

As an example of how these features are used in practice, here is how user data is organized in our storage.

RowKey

The RowKey is the user ID - a GUID, a string specially generated to be unique worldwide. GUIDs are distributed evenly, which gives a good distribution of data across servers.
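
For illustration only (the exact ID format used in production is not specified here), such a row key can be produced from a standard random UUID:

import java.util.UUID;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyExample {
  public static void main(String[] args) {
    // Random UUIDs are spread evenly over the key space, so rows land
    // evenly across regions (and hence across Region Servers).
    String userId = UUID.randomUUID().toString();  // e.g. "3f1c2e4a-..."
    Put put = new Put(Bytes.toBytes(userId));
    System.out.println("RowKey: " + userId + ", put: " + put);
  }
}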

Column Family

Our storage uses two column families (a hedged sketch of creating them follows the list):

  • data. This column family stores facts that lose their relevance for advertising purposes over time, such as the fact that a user has visited certain URLs. The TTL for this Column Family is set to 2 months, and the limit on the number of stored versions is 2000.
  • longdata. This column family stores data that does not lose its relevance over time, such as gender, date of birth and other "eternal" user characteristics.
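
A hedged sketch of how such a table could be created with the HBase 1.x admin API. The user_facts table name and the single-version limit for longdata are assumptions made for illustration; the TTL and version limit for data come from the description above:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class UserStorageSchemaExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Hypothetical table name for this sketch
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("user_facts"));

      // "data": facts that age out - TTL of roughly 2 months, up to 2000 versions per column
      HColumnDescriptor data = new HColumnDescriptor("data");
      data.setTimeToLive(60 * 24 * 60 * 60);  // TTL in seconds (~60 days)
      data.setMaxVersions(2000);

      // "longdata": "eternal" user attributes; version limit is an assumption,
      // the text does not specify one
      HColumnDescriptor longdata = new HColumnDescriptor("longdata");
      longdata.setMaxVersions(1);

      desc.addFamily(data);
      desc.addFamily(longdata);
      admin.createTable(desc);
    }
  }
}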

Columns

Each type of user fact is stored in a separate column. For example, the data:_v column stores the URLs visited by the user, and the longdata:gender column stores the user's gender.

The time of the fact is stored as the cell's timestamp. For example, for the data:_v column, the timestamp is the time when the user visited a specific URL.
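
A minimal sketch of writing such a fact, with the visit time passed explicitly as the cell timestamp (HBase 1.x-style calls; the user_facts table name, the GUID and the URL are made-up values):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VisitFactExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("user_facts"))) { // hypothetical table name
      long visitTime = System.currentTimeMillis();  // when the user actually visited the URL
      Put put = new Put(Bytes.toBytes("3f1c2e4a-0000-0000-0000-000000000000")); // user GUID
      // The cell timestamp is set to the visit time, so different visits
      // become different versions of the same data:_v cell.
      put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("_v"),
                    visitTime, Bytes.toBytes("http://example.com/page"));
      table.put(put);
    }
  }
}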

This user data storage structure fits our usage pattern very well: it lets us quickly update user data, quickly retrieve all the necessary information about a user, and, using MapReduce, quickly process data about all users at once.

6.7 Alternatives

HBase is quite complex to administer and use, so before using HBase it makes sense to look at the alternatives:

  • Relational databases. A very good alternative, especially when the data fits on one machine. Relational databases should also be the first thing to consider when transactions and indexes other than the primary key are important.

  • Key-value storage. Stores such as Redis and Aerospike are better suited when low latency is required and batch processing is less important.

  • Files processed with MapReduce. If the data is only appended and rarely updated or changed, then it is better not to use HBase and simply store the data in files. To simplify working with files, you can use tools such as Hive, Pig and Impala.

The use of HBase is justified when:

  • There is a lot of data, and it does not fit on one machine/server
  • The data is frequently updated and deleted
  • The data has an explicit "key" to which it is convenient to bind everything else
  • Batch processing is needed
  • Random access to data by specific keys is needed