CodeGym /Courses /SQL SELF /Typical Mistakes When Working with CTEs and How to Avoid ...

Typical Mistakes When Working with CTEs and How to Avoid Them

SQL SELF
Level 28 , Lesson 4
Available

Alright, now it’s time to talk about the dark side of working with CTEs — the typical mistakes. Even the coolest query can break if you use these powerful tools the wrong way. But don’t worry, we’ve got a whole guide for you on how to spot and prevent them!

1. Mistake: CTE Materialization and Its Consequences

One of the key things about PostgreSQL when working with CTEs is their default materialization. This means the result of a CTE is processed and temporarily stored in memory (or on disk if there’s too much data). If you have a lot of queries or a huge amount of data, this can seriously slow things down.

Example:

WITH heavy_data AS (
    SELECT * FROM large_table
)
SELECT * FROM heavy_data WHERE column_a > 100;

At first glance, it looks like the CTE is just filtering data. But actually, heavy_data is fully loaded and materialized first, and only then the filtering is applied. This can take forever.

How to avoid it?

Since PostgreSQL version 12, you can use CTEs as inline expressions (just like a subquery), which solves the materialization problem. Just make sure you use CTEs that are only referenced once and don’t need to save intermediate results.

Example of an optimized approach:

WITH inline_data AS MATERIALIZED (
    SELECT * FROM large_table
)
SELECT * FROM inline_data WHERE column_a > 100;

Tip: If you want materialization to happen, specify MATERIALIZED. If not — use NOT MATERIALIZED.

2. Mistake: Recursive CTEs Get Stuck in a Loop

Recursive CTEs are super powerful, but using them without a limit on iteration depth can lead to infinite loops. This not only slows things down, but also eats up all available resources.

Example:

WITH RECURSIVE endless_loop AS (
    SELECT 1 AS value

    UNION ALL

    SELECT value + 1
    FROM endless_loop
)
SELECT * FROM endless_loop;

This thing will generate an endless number of rows, since there’s no condition to stop the recursion.

How to avoid it?

Add a clear stop condition using WHERE. For example:

WITH RECURSIVE limited_loop AS (
    SELECT 1 AS value

    UNION ALL

    SELECT value + 1
    FROM limited_loop
    WHERE value < 10
)
SELECT * FROM limited_loop;

Tip: If you’re using recursive CTEs for big hierarchies, limit recursion depth with the max_recursion_depth option in PostgreSQL.

3. Mistake: Wrong Use of UNION and UNION ALL

When you combine base and recursive queries in a CTE, picking the wrong one between UNION and UNION ALL can give you unexpected results. For example, UNION removes duplicate rows, which adds extra computation.

Example:

WITH RECURSIVE employee_hierarchy AS (
    SELECT employee_id, manager_id
    FROM employees
    WHERE manager_id IS NULL

    UNION  -- Here it’s better to use UNION ALL

    SELECT e.employee_id, e.manager_id
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy;

In this example, UNION might remove important rows from the hierarchy if they accidentally repeat. Plus, it’ll slow down your query!

How to fix it?

Use UNION ALL unless you really need to remove duplicates:

UNION ALL

4. Mistake: Too Many CTEs in One Query

Trying to make a query as structured as possible, some folks start adding dozens of CTEs. This not only makes the code confusing, but also overloads the PostgreSQL query planner.

Example:

WITH cte1 AS (...),
     cte2 AS (...),
     cte3 AS (...),
     ...
     cte20 AS (...)
SELECT ...
FROM cte20;

Looks like a nightmare for any developer.

How to fix it?

— Split the query into a few simpler ones. Instead of one mega-query with dozens of CTEs — make a few independent queries.

— Another option: for intermediate results you need to use several times, save them in temp tables.

5. Mistake: Complex CTEs Without Indexes

If your CTE works with a ton of data but you forgot to add indexes to your tables, queries will run super slow. Indexes are like steroids for your database.

Example:

WITH filtered_data AS (
    SELECT * FROM large_table WHERE unindexed_column = 'value'
)
SELECT * FROM filtered_data;

How to fix it?

Before using CTEs, make sure your tables are optimized:

CREATE INDEX idx_large_table ON large_table(unindexed_column);

6. Mistake: Trying to Use a CTE for Multiple Data Calls

A CTE is created, executed, and then “frozen”. If you need to use its result in several places, the data won’t be recalculated — and sometimes that leads to bugs.

Example:

WITH data AS (
    SELECT x, y FROM some_table
)
SELECT x FROM data
WHERE y > 10;
-- If you need to recalculate data again, it won’t happen.

How to fix it?

If you need dynamic or recalculated data, maybe a CTE isn’t the best choice. Use subqueries instead.

7. Mistake: No Comments

CTEs can be an awesome tool, but who needs a gnarly SQL query that nobody — including you — can read after two weeks?

Example:

WITH data_filtered AS (
    SELECT *
    FROM large_table
    WHERE some_column > 100
)
SELECT * FROM data_filtered;

A month later, nobody will remember why you filtered this data!

So comment your queries, especially if you’re using complex or recursive CTEs:

WITH data_filtered AS (
    -- Filter data by condition: column some_column > 100
    SELECT *
    FROM large_table
    WHERE some_column > 100
)
SELECT * FROM data_filtered;

8. Mistake: Overusing CTEs Instead of Temp Tables

Sometimes temp tables are just a better fit. For example, if you need to use the result multiple times in different queries or you’re working with a huge dataset.

Example:

WITH temp_data AS (
    SELECT * FROM large_table
)
SELECT * FROM temp_data WHERE column_a > 100;
SELECT * FROM temp_data WHERE column_b < 50;

This kind of query with a CTE will run twice, even though the data doesn’t change!

How to fix it?

Create a temp table if you need to use the data multiple times:

CREATE TEMP TABLE temp_table AS
SELECT * FROM large_table;

SELECT * FROM temp_table WHERE column_a > 100;
SELECT * FROM temp_table WHERE column_b < 50;

Final Tip

Like with any powerful feature, when working with CTEs it’s important to realize they’re not always the best tool for the job. Think about why and how you’re using them. The “more CTEs is better” approach can seriously hurt performance and code readability. And of course, don’t forget to run performance tests and optimize your queries.

2
Task
SQL SELF, level 28, lesson 4
Locked
Using MATERIALIZED to Optimize a CTE
Using MATERIALIZED to Optimize a CTE
1
Survey/quiz
Query Optimization, level 28, lesson 4
Unavailable
Query Optimization
Query Optimization
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION