Today we're taking another step forward and diving into the magic of recursion. If you've ever coded in a language that supports recursion (like Python), you kinda get the idea. But don't worry if this sounds mysterious — we'll break it all down in detail.
Recursive CTEs are a powerful tool for working with hierarchical, tree-like data structures, like company org charts, family trees, or file directories.
Simply put, these are expressions that can "call themselves" to gradually walk through and process all levels of your data.
Key features of recursive CTEs:
- They use the
WITH RECURSIVEkeyword. - Recursive CTEs have two parts:
- Base query: sets the starting point (or "root") of the recursion.
- Recursive query: processes the remaining data, using the result from the previous step.
The way a recursive CTE works is kinda like climbing a staircase:
- First, you step onto the first stair (that's the base query).
- Then you move up to the next stair, using the result from the previous step (the recursive query).
- This process repeats until there are no more stairs left (you hit the stop condition).
Recursive CTE Syntax
Let's check out a template example right away:
WITH RECURSIVE cte_name AS (
-- Base query
SELECT column1, column2
FROM table_name
WHERE condition_for_base_case
UNION ALL
-- Recursive query
SELECT column1, column2
FROM table_name
JOIN cte_name ON some_condition
WHERE stop_condition
)
SELECT * FROM cte_name;
The Role of UNION and UNION ALL in Recursive CTEs
Every recursive CTE must use either UNION or UNION ALL between the base and recursive parts.
| Operator | What it does |
|---|---|
UNION |
Combines the results of two queries and removes duplicate rows |
UNION ALL |
Combines and keeps all rows, including duplicates |
Which Operator to Choose: UNION or UNION ALL?
If you're not sure which to use — almost always go with UNION ALL. Why? Because it's faster: it just merges results without checking for duplicates. That means less computation, less resources, and faster results.
This is especially important in recursive CTEs. When you're building hierarchies — like a comment tree or a company org chart — you almost always want UNION ALL. If you use just UNION, the database might think some steps already happened and "cut off" part of your result. That'll break your whole traversal logic.
Use UNION only if you know for sure that duplicates are bad and need to be removed. But remember: it's always a trade-off between clean results and speed.
Example of Different Approaches
-- UNION: duplicates are excluded
SELECT 'A'
UNION
SELECT 'A'; -- Result: one row 'A'
-- UNION ALL: duplicates are kept
SELECT 'A'
UNION ALL
SELECT 'A'; -- Result: two rows 'A'
In recursive queries, it's safer to always use UNION ALL so you don't lose important steps when traversing a structure.
Let's look at a typical task: we have an employees table with columns employee_id, manager_id, and name. We need to build a hierarchy starting from the director — the person with no boss (manager_id = NULL).
Let's say we have an employees table: employees
| employee_id | name | manager_id |
|---|---|---|
| 1 | Eva Lang | NULL |
| 2 | Alex Lin | 1 |
| 3 | Maria Chi | 1 |
| 4 | Otto Mart | 2 |
| 5 | Anna Song | 2 |
| 6 | Eva Lang | 3 |
We need to figure out who reports to whom, and find out each employee's level in the structure. This is handy if you want to, say, display an employee tree in a UI or prep a report on team structure.
WITH RECURSIVE employee_hierarchy AS (
-- Start with those who have no manager
SELECT
employee_id,
name,
manager_id,
1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Add subordinates and increase the level
SELECT
e.employee_id,
e.name,
e.manager_id,
eh.level + 1
FROM employees e
INNER JOIN employee_hierarchy eh
ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy;
The result will look like this:
| employee_id | name | manager_id | level |
|---|---|---|---|
| 1 | Eva Lang | NULL | 1 |
| 2 | Alex Lin | 1 | 2 |
| 3 | Maria Chi | 1 | 2 |
| 4 | Otto Mart | 2 | 3 |
| 5 | Anna Song | 2 | 3 |
| 6 | Eva Lang | 3 | 3 |
This query clearly shows how you can "walk" through the employee hierarchy — from the director down to the most junior folks. The level column is super handy for formatting or visualizing the tree.
Example: Product Categories
Now imagine we're working with a product categories table, where each category can have subcategories, and those can have their own subcategories. How do we build a category tree?
Table categories
| category_id | name | parent_id |
|---|---|---|
| 1 | Electronics | NULL |
| 2 | Computers | 1 |
| 3 | Smartphones | 1 |
| 4 | Laptops | 2 |
| 5 | Peripherals | 2 |
Recursive query:
WITH RECURSIVE category_tree AS (
-- Base case: find root categories
SELECT
category_id,
name,
parent_id,
1 AS depth
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- Recursive part: find subcategories of current categories
SELECT
c.category_id,
c.name,
c.parent_id,
ct.depth + 1
FROM categories c
INNER JOIN category_tree ct
ON c.parent_id = ct.category_id
)
SELECT * FROM category_tree;
Result:
| category_id | name | parent_id | depth |
|---|---|---|---|
| 1 | Electronics | NULL | 1 |
| 2 | Computers | 1 | 2 |
| 3 | Smartphones | 1 | 2 |
| 4 | Laptops | 2 | 3 |
| 5 | Peripherals | 2 | 3 |
Now we see the category tree with nesting levels.
Why Are Recursive CTEs Awesome?
Recursive CTEs are one of the most expressive and powerful tools in SQL. Instead of gnarly nested logic, you just describe where to start (the base case) and how to move forward (the recursive part) — PostgreSQL does the rest.
Most often, these queries are used to traverse hierarchies: employees, product categories, disk directories, social network graphs. They're easy to extend: if new data gets added to the table, the query picks it up automatically. It's convenient and scalable.
But there are some gotchas. Always watch your stop conditions — without them, your query can go into an infinite loop. Don't forget about indexes: in big tables, recursive queries without them can be slow. And UNION ALL is almost always the best choice, especially for hierarchical tasks, otherwise you risk losing recursion steps because of duplicate removal.
A well-tuned recursive CTE lets you express complex business logic in just a few lines — no procedures, loops, or extra code. This is one of those times when SQL is not just correct, but beautiful.
Common Mistakes When Working with Recursive CTEs
- Infinite recursion: if you don't set a proper stop condition (
WHERE), your query can loop forever. - Too much data: using
UNION ALLincorrectly adds duplicates. - Performance: recursive queries can be heavy on large datasets. Indexes on key columns (like
manager_id) help speed things up.
When You Can't Do Without Recursive Queries
Sometimes it seems like recursive queries are just theory, but in reality, they pop up all the time in day-to-day development. For example:
- to build reports on company structure or product classification;
- to walk through a folder tree and get a list of all nested directories;
- to analyze graphs — social connections, routes, dependencies between tasks;
- to just show complex relationships between objects in a readable way.
If you need to walk through a structure where one thing depends on another — chances are you'll need WITH RECURSIVE.
GO TO FULL VERSION