Imagine you need to write a big SQL query that does a bunch of related things at once. You could just nest a ton of subqueries inside each other, but the result will look like spaghetti code. A real SQL maze where even the author can get lost.
CTEs are your lifeline! CTEs let you break a complex query into logical chunks, each one set up as its own named section. This makes your query clear and maintainable.
Comparison: Subquery vs CTE
At first glance, both approaches do the same thing — filter grades by course and calculate the average grade for each student. But look closer: in the subquery version, the logic is "hidden" inside parentheses, while with a CTE it's pulled out and given a clear name, filtered_grades. Now imagine there are not two, but ten intermediate steps!
Subquery:
SELECT student_id, AVG(grade) AS avg_grade
FROM (
SELECT student_id, grade
FROM grades
WHERE course_id = 101
) subquery
GROUP BY student_id;
CTE:
WITH filtered_grades AS (
SELECT student_id, grade
FROM grades
WHERE course_id = 101
)
SELECT student_id, AVG(grade) AS avg_grade
FROM filtered_grades
GROUP BY student_id;
Spot the 10 differences. Of course, CTEs win hands down for readability!
Breaking Down Complex Queries into Steps with CTEs
CTEs let you build a query step by step so that at each stage, the result is as clear as possible. For example, if you want to get a list of students with their average grades per course and add info about their teachers, break the task into several parts.
Example:
WITH avg_grades AS (
SELECT student_id, course_id, AVG(grade) AS avg_grade
FROM grades
GROUP BY student_id, course_id
),
course_teachers AS (
SELECT course_id, teacher_id
FROM courses
)
SELECT ag.student_id, ag.avg_grade, ct.teacher_id
FROM avg_grades ag
JOIN course_teachers ct ON ag.course_id = ct.course_id;
Isn't that readable? Even if you come back to this query a month later, its structure will still be obvious.
Using Multiple CTEs for a Big Report
Let's look at an example of a more complex report. Say we have a university database and want to create a report on the most successful students, their courses, and their teachers. The plan is:
- First, find students with a high average grade (above 90).
- Then match them with their courses.
- Finally, add info about their teachers.
Query with multiple CTEs:
WITH high_achievers AS (
SELECT student_id, AVG(grade) AS avg_grade
FROM grades
GROUP BY student_id
HAVING AVG(grade) > 90
),
student_courses AS (
SELECT e.student_id, c.course_name, c.teacher_id
FROM enrollments e
JOIN courses c ON e.course_id = c.course_id
),
teachers AS (
SELECT teacher_id, name AS teacher_name
FROM teachers
)
SELECT ha.student_id, ha.avg_grade, sc.course_name, t.teacher_name
FROM high_achievers ha
JOIN student_courses sc ON ha.student_id = sc.student_id
JOIN teachers t ON sc.teacher_id = t.teacher_id;
This query is great because each part of the task is set up as its own logically complete block. Want to know which students have achieved impressive results? Check out the high_achievers CTE. Interested in their connection to courses? That's student_courses. Need the teachers? It's all in teachers. This approach makes maintaining and tweaking your code way easier.
Breaking Down Complex Calculations into Steps
Sometimes your queries include complicated calculations or filtering. Instead of trying to cram everything into one long query, break it up into several CTEs.
Example:
WITH course_stats AS (
SELECT course_id, COUNT(student_id) AS student_count, AVG(grade) AS avg_grade
FROM grades
GROUP BY course_id
),
popular_courses AS (
SELECT course_id
FROM course_stats
WHERE student_count > 50
)
SELECT c.course_name, cs.student_count, cs.avg_grade
FROM popular_courses pc
JOIN course_stats cs ON pc.course_id = cs.course_id
JOIN courses c ON c.course_id = pc.course_id;
Here, we first gather course stats in course_stats, then filter for popular courses in popular_courses, and only after that join it all with the courses table. This method lets you highlight the intermediate steps, making the query way easier to understand.
When Are CTEs a Must-Have?
Here are a few scenarios where CTEs are especially handy:
- Analytics and reporting. For example, calculating complex metrics with group filtering.
- Working with hierarchical structures. Recursive CTEs for building category trees or org charts.
- Reusing data. For example, if you need the same selection at different stages of your query.
Common Mistakes When Using CTEs
Of course, like any powerful tool, CTEs have their hidden pitfalls.
Excessive data materialization. In PostgreSQL, CTEs are "materialized" by default, meaning their result is calculated and temporarily stored. This can slow things down if the data is huge. To avoid this, use indexes and try to select only the columns you really need.
Wrong joins. Sometimes, complex queries with multiple CTEs get hard to optimize. Always check your queries with EXPLAIN or EXPLAIN ANALYZE.
Overusing CTEs. If your CTEs are getting too long and tangled, it might mean you should split your queries into several separate operations.
GO TO FULL VERSION