Course Module 4. Working with databases - Lecture: Introduction to NoSQL databases

1.1 How a NoSQL database works

NoSQL databases use a variety of data models to access and manipulate data. They are optimized for data-intensive applications that need low latency and flexible data models. They achieve this by relaxing some of the strict data consistency requirements that are typical of relational databases.

Consider an example of schema modeling for a simple database of books.

In a relational database, a book entry is often split into multiple parts (or "normalized") and stored in separate tables whose relationships are defined by primary and foreign key constraints. In this example, the Books table has ISBN, Book Title, and Edition Number columns, the Authors table has Author ID and Author Name columns, and the Author-ISBN table has Author and ISBN columns. The relational model is designed to maintain referential integrity between tables in a database. The data is normalized to reduce redundancy and is generally optimized for storage.
In a NoSQL database, a book record is typically stored as a JSON document. For each book, or item, the ISBN, Book Title, Edition Number, Author Name, and Author ID values are stored as attributes in a single document. In this model, the data is optimized for intuitive development and horizontal scalability.
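
The two layouts described above can be sketched side by side. This is only an illustration, assuming Python's built-in sqlite3 module for the relational side and a plain JSON document for the NoSQL side; the table names, field names, and the sample book (including its placeholder ISBN) are made up for the example.

```python
import json
import sqlite3

# Relational layout: three normalized tables linked by keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, edition INTEGER);
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE author_isbn (
        author_id INTEGER REFERENCES authors(author_id),
        isbn TEXT REFERENCES books(isbn)
    );
""")
conn.execute("INSERT INTO books VALUES ('978-0-00-000000-2', 'Example Book', 1)")
conn.execute("INSERT INTO authors VALUES (1, 'Jane Doe')")
conn.execute("INSERT INTO author_isbn VALUES (1, '978-0-00-000000-2')")

# Reading the full record back requires joining the three tables.
row = conn.execute("""
    SELECT b.isbn, b.title, b.edition, a.author_id, a.name
    FROM books b
    JOIN author_isbn ai ON ai.isbn = b.isbn
    JOIN authors a ON a.author_id = ai.author_id
""").fetchone()

# Document layout: the same record stored as one self-contained JSON document.
book_document = {
    "isbn": "978-0-00-000000-2",
    "title": "Example Book",
    "edition": 1,
    "author": {"author_id": 1, "name": "Jane Doe"},
}
serialized = json.dumps(book_document)
restored = json.loads(serialized)
```

In the relational version the record is assembled at read time by a join; in the document version it is read back in one piece, at the cost of repeating author data in every book that shares the author.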

1.2 What can NoSQL databases be used for?

NoSQL databases are well suited to many modern applications, such as mobile, gaming, and web applications, that need flexible, scalable, high-performance databases with rich functionality in order to deliver a good user experience.

Flexibility. NoSQL databases typically offer flexible schemas, which allows faster and more iterative development. Because of their flexible data models, NoSQL databases are well suited to semi-structured and unstructured data.
Scalability. NoSQL databases are generally designed to scale out across distributed hardware clusters rather than scale up by adding expensive, robust servers. Some cloud service providers handle these operations behind the scenes as a fully managed service.
High performance. NoSQL databases are optimized for specific data models and access patterns, which can give higher performance than trying to achieve the same functionality with a relational database.
Rich functionality. NoSQL databases provide rich APIs and data types that are designed specifically for their respective data models.

1.3 Types of NoSQL databases

NoSQL databases are used where storing data in tables is inconvenient, so they store it in a variety of other formats. This lecture covers five main types of NoSQL databases.

Key-value databases

Key-value databases are highly partitionable and allow horizontal scaling at a level that other types of databases cannot easily reach. Good use cases for key-value databases are gaming, advertising, and IoT applications.

For example, Amazon DynamoDB is designed to deliver consistent single-digit-millisecond latency at any scale. This consistent performance was the main reason for migrating Snapchat Stories to DynamoDB, since Stories is Snapchat's largest storage write workload.
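
To make the idea of partitioning concrete, here is a minimal sketch of a key-value store that spreads its keys over several partitions by hashing the key. The class name PartitionedKeyValueStore and its methods are hypothetical and are not the API of DynamoDB or any other product; each in-memory dictionary simply stands in for a separate node.

```python
import hashlib

class PartitionedKeyValueStore:
    """Toy key-value store that spreads keys across several partitions."""

    def __init__(self, partition_count):
        # Each dict stands in for a separate node in a cluster.
        self.partitions = [{} for _ in range(partition_count)]

    def _partition_for(self, key):
        # A stable hash of the key decides which partition owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self._partition_for(key)].get(key, default)


store = PartitionedKeyValueStore(partition_count=4)
for player_id in range(1000):
    store.put(f"player:{player_id}", {"score": player_id * 10})

# Number of keys that landed on each partition.
sizes = [len(partition) for partition in store.partitions]
```

Because every read and write touches exactly one partition, adding partitions adds capacity. Note that the simple modulo scheme used here would move most keys whenever the partition count changes; real systems generally use more careful placement schemes to avoid that.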

Document databases

In application code, data is often represented as an object or document in a JSON-like format because it is an efficient and intuitive data model for developers. Document databases allow developers to store and query data in a database using the same document model that they use in their application code. The flexible, semi-structured, hierarchical nature of documents and document databases allows them to evolve with application needs.

The document model works well for catalogs, user profiles, and content management systems, where each document is unique and changes over time. Amazon DocumentDB (with MongoDB compatibility) and MongoDB are common document databases that provide powerful and intuitive APIs for agile development.
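
The following sketch illustrates what "flexible, semi-structured, hierarchical" means in practice. It uses plain Python dictionaries rather than a real document database, and the helper functions get_path and find are hypothetical names invented for this example.

```python
# Two catalog documents with different shapes; no shared schema is required.
catalog = [
    {
        "sku": "B-100",
        "type": "book",
        "title": "Example Book",
        "details": {"pages": 320, "language": "en"},
    },
    {
        "sku": "L-200",
        "type": "laptop",
        "title": "Example Laptop",
        "details": {"ram_gb": 16, "ports": ["usb-c", "hdmi"]},
    },
]

def get_path(document, dotted_path, default=None):
    """Follow a dotted path such as 'details.pages' through nested dicts."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

def find(documents, dotted_path, expected):
    """Return the documents whose value at dotted_path equals expected."""
    return [d for d in documents if get_path(d, dotted_path) == expected]

# One document can gain a new attribute without touching the others.
catalog[0]["details"]["edition"] = 2

matches = find(catalog, "details.ram_gb", 16)
```

Documents that lack the queried attribute are simply skipped, which is how a collection can hold items of different shapes and still be queried uniformly.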

Graph databases

Graph databases make it easier to develop and run applications that work with highly connected datasets. Typical use cases for graph databases are social networks, recommendation services, fraud detection systems, and knowledge graphs. Amazon Neptune is a fully managed graph database service. Neptune supports both the Property Graph and Resource Description Framework (RDF) models, providing two graph APIs to choose from: TinkerPop and RDF/SPARQL. Neo4j is a widely used graph database, and Apache Giraph is a framework for large-scale graph processing.
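
A recommendation query of the kind mentioned above can be sketched as a short graph traversal. This example assumes a tiny made-up social graph held in a Python dictionary; a real graph database would express the same traversal in its own query language, and the function name recommend_friends is hypothetical.

```python
from collections import Counter

# Undirected friendship edges, stored as an adjacency map.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def recommend_friends(graph, user):
    """Rank friends-of-friends by the number of mutual friends."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for stable output.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))

suggestions = recommend_friends(friends, "alice")
# suggestions == [("dave", 2), ("erin", 1)]
```

The query follows relationships two hops out from the starting user. In a relational schema the same question would usually need a self-join on a friendship table for every additional hop.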

In-memory databases

Gaming and advertising applications often use leaderboards, session stores, and real-time analytics. These features require microsecond response times, and traffic can spike sharply at any time.

Amazon MemoryDB for Redis is a Redis-compatible, durable, in-memory database service that delivers microsecond read latency and single-digit-millisecond write latency, with durability across multiple Availability Zones. MemoryDB is purpose-built for ultra-high performance and durability, so it can be used as the primary database for modern microservice-based applications.

Amazon ElastiCache is a fully managed, Redis- and Memcached-compatible in-memory caching service for low-latency, high-throughput workloads. Customers such as Tinder, whose apps must respond in real time, use in-memory rather than disk-based storage systems. Another example of a purpose-built data store is Amazon DynamoDB Accelerator (DAX), an in-memory cache that makes DynamoDB reads several times faster.
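
The way a cache such as ElastiCache or DAX sits in front of a slower store can be illustrated with the cache-aside pattern. The sketch below is a simplified, single-process illustration; the CacheAside class, its parameters, and the sample session data are hypothetical and do not correspond to any AWS API.

```python
import time

class CacheAside:
    """Toy cache-aside wrapper: check memory first, fall back to the slow store."""

    def __init__(self, load_from_store, ttl_seconds, clock=time.monotonic):
        self._load = load_from_store      # function that reads the backing store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}                # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read the backing store and remember the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (self._clock() + self._ttl, value)
        return value


slow_store = {"session:1": {"user": "alice"}}
cache = CacheAside(load_from_store=slow_store.get, ttl_seconds=60)

first = cache.get("session:1")   # miss: loaded from the backing store
second = cache.get("session:1")  # hit: served from memory
```

The time-to-live bounds how stale a cached value can become, which is the usual trade-off when reads are served from memory instead of from the primary store.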

Search databases

Many applications generate logs to make it easier for developers to troubleshoot and fix problems. Amazon OpenSearch Service is purpose-built for near-real-time visualization and analytics of machine-generated data: it indexes, aggregates, and searches semi-structured logs and metrics.

In addition, Amazon OpenSearch Service is a powerful, high-performance full-text search engine. Expedia uses more than 150 Amazon OpenSearch Service domains, holding 30 TB of data and 30 billion documents, for a variety of mission-critical use cases, from operational monitoring and troubleshooting to distributed application stack tracing and cost optimization.
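
Full-text search over logs is commonly built on an inverted index, which maps each term to the records that contain it. The sketch below shows the idea on three made-up log lines; it is a teaching simplification with hypothetical function names, not a description of how OpenSearch is implemented internally.

```python
import re
from collections import defaultdict

# Sample log lines invented for this example.
log_lines = [
    "2024-01-01 ERROR payment service timeout",
    "2024-01-01 INFO user login succeeded",
    "2024-01-02 ERROR database connection timeout",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(lines):
    """Map each term to the set of line numbers that contain it."""
    index = defaultdict(set)
    for line_number, line in enumerate(lines):
        for term in tokenize(line):
            index[term].add(line_number)
    return index

def search(index, query):
    """Return line numbers containing every term of the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(result)

index = build_index(log_lines)
hits = search(index, "error timeout")
# hits == [0, 2]
```

The index is built once when the data is written, so a query only intersects a few small sets instead of scanning every log line.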

1.4 Comparison of SQL (relational) and NoSQL (non-relational) databases

NoSQL databases have many advantages, and it is worth knowing, at least in outline, which tools already exist before you build one yourself. Below I compare NoSQL and SQL databases:

Suitable workloads

Relational databases are designed for transactional, strongly consistent online transaction processing (OLTP) applications and are also well suited to online analytical processing (OLAP).

NoSQL databases are designed to work with a range of data access patterns, including low latency applications. NoSQL search databases are designed for analytics of semi-structured data.

Data model

The relational model normalizes data into tables made up of rows and columns. A schema strictly defines the tables, rows, columns, indexes, relationships between tables, and other database elements. The database enforces referential integrity in the relationships between tables.

NoSQL databases provide a variety of data models such as key-value pairs, documents, and graphs that are optimized for high performance and scalability.

ACID Properties

Relational databases provide the ACID properties: atomicity, consistency, isolation, and durability.

Atomicity requires that a transaction be executed in its entirety or not at all.
Consistency means that as soon as a transaction completes, the data must conform to the database schema.
Isolation requires that concurrent transactions execute independently of one another.
Durability is the ability to recover to the last known state after an unexpected system failure or power outage.
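
Atomicity and consistency can be seen in a few lines with Python's built-in sqlite3 module. In this sketch the accounts table, the account names, and the transfer function are made up for illustration; the CHECK constraint plays the role of a schema rule that the data must satisfy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(connection, sender, receiver, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        # 'with connection' commits on success and rolls back on an exception.
        with connection:
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(connection):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(connection.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK constraint
```

In the failed transfer the credit to the receiver has already been applied when the debit violates the constraint, yet the rollback undoes it, so the balances remain exactly as they were after the first transfer.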

NoSQL databases often make a trade-off, relaxing some of the strict ACID requirements in favor of a more flexible data model that allows horizontal scaling. This makes NoSQL a good choice for high-throughput, low-latency use cases that need to scale horizontally beyond a single instance.

Performance

In relational databases, performance depends mainly on the disk subsystem. Queries, indexes, and table structure often need to be optimized to reach peak performance.

In NoSQL databases, performance typically depends on the size of the underlying hardware cluster, network latency, and the calling application.

Scaling

Relational databases typically scale up by increasing the computing power of the hardware, or scale out by adding read replicas for read-only workloads.

NoSQL databases are typically highly partitionable, because their access patterns can scale out across a distributed architecture. This improves throughput and delivers consistent performance at nearly unlimited scale.

API

In relational databases, requests to store and retrieve data are written in SQL. These queries are parsed and executed by the relational database.

In NoSQL databases, object-based APIs let application developers easily store and retrieve data structures. Using partition keys, applications can look up key-value pairs, column sets, or semi-structured documents that contain serialized application objects and attributes.
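
As a rough illustration of lookups by partition key, the sketch below groups items by a partition key and keeps them ordered by a second key so that a range of items can be read from one partition. The Table class, its put_item and query methods, and the sample order data are hypothetical names chosen for this example; they only loosely resemble the style of API that key-value services expose.

```python
from bisect import bisect_left, bisect_right

class Table:
    """Toy table: items grouped by partition key and ordered by sort key."""

    def __init__(self):
        self._partitions = {}  # partition key -> sorted list of (sort key, item)

    def put_item(self, partition_key, sort_key, item):
        rows = self._partitions.setdefault(partition_key, [])
        keys = [key for key, _ in rows]
        position = bisect_left(keys, sort_key)
        if position < len(rows) and rows[position][0] == sort_key:
            rows[position] = (sort_key, item)    # overwrite an existing item
        else:
            rows.insert(position, (sort_key, item))

    def query(self, partition_key, sort_from, sort_to):
        """Return items in one partition with sort_from <= sort key <= sort_to."""
        rows = self._partitions.get(partition_key, [])
        keys = [key for key, _ in rows]
        start = bisect_left(keys, sort_from)
        end = bisect_right(keys, sort_to)
        return [item for _, item in rows[start:end]]


orders = Table()
orders.put_item("customer#1", "2024-01-15", {"total": 20})
orders.put_item("customer#1", "2024-03-02", {"total": 35})
orders.put_item("customer#1", "2024-02-10", {"total": 15})
orders.put_item("customer#2", "2024-02-11", {"total": 99})

february_onward = orders.query("customer#1", "2024-02-01", "2024-12-31")
```

The application works with ordinary objects and method calls, and every query names the partition it reads from, so no query text has to be parsed and no other partition is touched.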