Balancing Normalization and Performance

When we perfectly normalize data, every table gets as compact as possible, and each one holds facts about exactly one thing. But to run real-life queries (like "Which students are enrolled in the SQL course?"), you might need to join a bunch of tables. The more tables, the messier the queries, and the more the system has to "shovel data around".

You probably already know about JOINs from earlier lectures. Here’s an example of a query you might need with a well-designed database:

SELECT students.name, courses.title
FROM students
JOIN enrollments ON students.id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.id
WHERE courses.title = 'SQL';

Sounds simple, but under the hood the server is doing a ton of work: reading every table, joining data, filtering... And what if the tables are huge? Makes sense that performance will take a hit.
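
If you want to see that work for yourself, most databases will show you the query plan. A quick sketch (the exact output depends on the engine, e.g. PostgreSQL or MySQL):

EXPLAIN
SELECT students.name, courses.title
FROM students
JOIN enrollments ON students.id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.id
WHERE courses.title = 'SQL';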

Battle: Normalization vs. Speed

Luckily (or maybe not?), in real life databases are all about compromise. Full normalization keeps your data consistent, but slows down complex queries. If your database is used for analytics and reports, sometimes it’s better to denormalize it. It’s like swapping 10 tiny boxes for one big chest: grabbing data is faster, but sorting it all out again gets trickier.

When should you "chill out" with normalization?

There are scenarios where denormalization is the way to go:

Frequently used aggregates

For example, imagine your system runs queries every day to count how many students are in each course. In a normalized structure, you’d have to run JOIN and COUNT() all the time. Instead, you could add a student_count column to the "Courses" table, updating it automatically when records are added or deleted.

-- Backfill the denormalized student_count column from enrollments
UPDATE courses
SET student_count = (
    SELECT COUNT(*)
    FROM enrollments
    WHERE enrollments.course_id = courses.id
);
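
To keep that column up to date "automatically", you'd typically hang a trigger on the enrollments table. Here's a minimal sketch in PostgreSQL syntax (version 11+); the function and trigger names are made up for the example, and other databases spell triggers differently:

-- Recalculate student_count for the affected course on every insert/delete
CREATE OR REPLACE FUNCTION refresh_student_count() RETURNS trigger AS $$
DECLARE
    affected_course INT;  -- assumes course_id is an integer
BEGIN
    IF TG_OP = 'DELETE' THEN
        affected_course := OLD.course_id;
    ELSE
        affected_course := NEW.course_id;
    END IF;

    UPDATE courses
    SET student_count = (
        SELECT COUNT(*)
        FROM enrollments
        WHERE enrollments.course_id = affected_course
    )
    WHERE courses.id = affected_course;

    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER enrollments_refresh_count
AFTER INSERT OR DELETE ON enrollments
FOR EACH ROW EXECUTE FUNCTION refresh_student_count();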

Frequently run reports

If your client wants a daily report like "Who bought what, where, and when?", it’s easier to keep a denormalized table with ready-made rows like "Client name, product, date". The main table will be bigger, but you’ll get data way faster.
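
For instance, such a wide report table could be rebuilt from the normalized data once a day. A rough sketch, assuming hypothetical clients, orders, order_items, and products tables:

-- One read-optimized table the daily report can scan without joins
CREATE TABLE sales_report (
    client_name  TEXT,
    product_name TEXT,
    sale_date    DATE
);

-- Rebuild it from the normalized tables, e.g. once per night
INSERT INTO sales_report (client_name, product_name, sale_date)
SELECT clients.name, products.name, orders.date
FROM orders
JOIN clients ON orders.client_id = clients.id
JOIN order_items ON order_items.order_id = orders.id
JOIN products ON order_items.product_id = products.id;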

Lots of reads, few writes

If your database is mostly for reading (like analytics), it’s worth sacrificing normalization for speed.

Minimizing joins in complex relationships

If your tables have deep (nested) relationships and JOINs become a nightmare, drop some normalization layers.

Example: How does denormalization speed things up?

Let’s say we have normalized tables for an online store:

Table products: id, name, price
Table orders: id, date, customer_id
Table order_items: id, order_id, product_id, quantity

Each order (orders) has order lines (order_items). Let’s count how much money the store made:

SELECT SUM(order_items.quantity * products.price) AS total_revenue
FROM order_items
JOIN products ON order_items.product_id = products.id;

Joining order_items and products will slow down the query with big data volumes.

Denormalized structure

Now imagine the order_items table has an "extra" column total_price (denormalization):

Table order_items: id, order_id, product_id, quantity, total_price

Now the query is a breeze:

SELECT SUM(total_price) AS total_revenue
FROM order_items;

This way we skip the JOIN, so it runs faster.

Practical Task: Optimizing the "Sales" Database

Given: normalized tables

Table products: id, name, price
Table sales: id, product_id, date, quantity

Task: speed up frequent queries like "How much did we earn on each product?".
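
In the normalized schema, that report needs a JOIN on every run, roughly like this:

SELECT sales.product_id, SUM(sales.quantity * products.price) AS total_revenue
FROM sales
JOIN products ON products.id = sales.product_id
GROUP BY sales.product_id;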

Step 1: add a total_price column to the sales table:

ALTER TABLE sales ADD COLUMN total_price NUMERIC;

Step 2: fill this column for existing data:

UPDATE sales
SET total_price = quantity * (
    SELECT price
    FROM products
    WHERE products.id = sales.product_id
);

Step 3: run faster queries:

SELECT product_id, SUM(total_price) AS total_revenue
FROM sales
GROUP BY product_id;

But! Denormalization has a price

You get that "faster" doesn’t always mean "better", right? Denormalization comes with its own issues:

Extra storage

The total_price column is a copy of data, so it needs more space.

Update headaches

If a product’s price changes in the products table, you have to update the total_price column by hand. This can cause mismatches.
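
For example, after a price change you'd have to re-run something like this yourself (the product id here is made up):

-- Re-sync total_price for one product after its price changed
UPDATE sales
SET total_price = quantity * (
    SELECT price
    FROM products
    WHERE products.id = sales.product_id
)
WHERE sales.product_id = 42;  -- hypothetical id of the re-priced product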

Insert, update, and delete anomalies

Info can easily get "out of sync" if you forget to update denormalized data. For example, if the product price changes, it doesn’t update automatically.

Balance: How to find the golden mean?

Pick what matters more: performance or structure? If your DB is read-heavy, tune it for queries.

Add denormalization only where it counts. Like, just for key numbers and reports.

Automate updates of denormalized data. Use triggers or jobs to avoid mismatches.
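
For the sales example above, a trigger could compute total_price on every insert so nobody has to remember it. A minimal sketch in PostgreSQL syntax (names are made up; other engines have their own trigger syntax):

-- Fill in the denormalized total_price before each new sale is stored
CREATE OR REPLACE FUNCTION fill_total_price() RETURNS trigger AS $$
BEGIN
    NEW.total_price := NEW.quantity * (
        SELECT price
        FROM products
        WHERE products.id = NEW.product_id
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_fill_total_price
BEFORE INSERT ON sales
FOR EACH ROW EXECUTE FUNCTION fill_total_price();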
