CodeGym /Courses /SQL SELF /Execution Plan Interpretation: Reading and Analyzing Node...

Execution Plan Interpretation: Reading and Analyzing Nodes (`Seq Scan`, `Index Scan`, `Hash Join`)

SQL SELF
Level 41 , Lesson 4
Available

Today we're gonna figure out what PostgreSQL execution plan nodes are, how to read them, and most importantly, how to tell when something's off. You'll see why your database sometimes goes for the resource-heavy Seq Scan even when you think you already have an index—or maybe even more than one.

When PostgreSQL builds a query execution plan, it breaks it down into steps called nodes. Each node is basically a "step" the database server takes to process your query. The main node types include:

Sequential Scan (Seq Scan)

Seq Scan or sequential scan is the simplest way to pull data from a table. PostgreSQL literally grabs the table and reads its rows one by one, checking if they match your query conditions.

When does Seq Scan get used?
Seq Scan kicks in if:

  • There's no suitable index in the table to speed up the query.
  • The filter condition is too broad for the index to be useful (like, you're pulling more than 50% of the data).
  • PostgreSQL thinks reading the table sequentially is faster than using an index (sometimes true for really tiny tables).
EXPLAIN SELECT * FROM students WHERE age > 18;

Sample result:

Seq Scan on students  (cost=0.00..35.50 rows=10 width=50)
  Filter: (age > 18)

Check out Seq Scan on students — that's PostgreSQL telling you it's gonna read the whole "students" table.

Seq Scan issues: If the table is huge, sequential scan can take a really long time.

Index Scan

Index Scan is scanning data using an index. When you create an index in PostgreSQL, it's kinda like making a "table of contents" for your table. If the query can use the index, PostgreSQL doesn't have to read the whole table—just the parts it needs.

When does Index Scan get used?

  • The query has filter conditions on a column with an index (like WHERE).
  • You're using comparison operators like =, <, >, BETWEEN, etc.
CREATE INDEX idx_students_age ON students(age);

EXPLAIN SELECT * FROM students WHERE age = 18;

Sample result:

Index Scan using idx_students_age on students  (cost=0.15..8.27 rows=1 width=50)
  Index Cond: (age = 18)

Here, Index Scan using idx_students_age shows PostgreSQL is using the idx_students_age index. Instead of reading the table row by row, it gets much faster access through the index.

Index Scan advantages:

  • Major speedup for queries on big tables.
  • Less data read from disk.

Index Scan issues:
If your query returns too much data (like, more than half the table), using the index can actually be slower than a Seq Scan.

Hash Join

Hash Join is used to join two tables based on a join condition (like ON students.course_id = courses.id). PostgreSQL builds a hash table for one of the tables (the smaller one) and uses it to look up matches in the other table.

When does Hash Join get used?

  • When joining tables with INNER JOIN, LEFT JOIN, etc.
  • When PostgreSQL thinks Hash Join is more efficient than other join methods.
EXPLAIN
SELECT * 
FROM students 
JOIN courses ON students.course_id = courses.id;

Sample result:

Hash Join  (cost=25.00..50.00 rows=10 width=100)
  Hash Cond: (students.course_id = courses.id)
  -> Seq Scan on students  (cost=0.00..20.00 rows=10 width=50)
  -> Hash  (cost=15.00..15.00 rows=10 width=50)
       -> Seq Scan on courses  (cost=0.00..15.00 rows=10 width=50)

Here, Hash Join is joining two tables. Notice PostgreSQL first does a Seq Scan for both tables, then builds a hash table (Hash).

Hash Join advantages:

  • Fast for medium-sized tables.
  • Efficient for joining tables with lots of rows.

Hash Join issues:
If the hash table is bigger than available memory, PostgreSQL will use disk to store it, which slows the join way down.

Sample Execution Plan Analysis

Let's break down a real example.

Query:

EXPLAIN ANALYZE
SELECT *
FROM students
JOIN courses ON students.course_id = courses.id
WHERE students.age > 18;

Result:

Hash Join  (cost=35.00..75.00 rows=5 width=100) (actual time=1.00..2.50 rows=5 loops=1)
  Hash Cond: (students.course_id = courses.id)
  -> Seq Scan on students  (cost=0.00..40.00 rows=10 width=50) (actual time=0.50..1.00 rows=7 loops=1)
        Filter: (age > 18)
        Rows Removed by Filter: 3
  -> Hash  (cost=25.00..25.00 rows=5 width=50) (actual time=0.30..0.30 rows=5 loops=1)
       -> Seq Scan on courses  (cost=0.00..20.00 rows=5 width=50) (actual time=0.20..0.25 rows=5 loops=1)
Planning Time: 0.50 ms
Execution Time: 3.00 ms

Interpretation:

  1. Hash Join: The main node. PostgreSQL is joining students and courses.
    • actual time: from 1.00 to 2.50 ms.
    • rows=5: the query returned 5 rows.
  2. Nested nodes:
    • Seq Scan on students: reads the students table sequentially and applies the filter (age > 18).
    • Rows Removed by Filter = 3: 3 rows didn't match the condition.
    • Hash: PostgreSQL builds a hash table for the courses table.

Comparing and Choosing Nodes

When you're analyzing an execution plan, the key is to understand why PostgreSQL picked a certain data processing method. Sometimes you gotta step in and fix things, like adding an index or rewriting the query. Here are a few tips:

  • If you see a Seq Scan on a big table, think about adding indexes.
  • If Hash Join is too slow, check how much memory PostgreSQL has available.
  • Use EXPLAIN ANALYZE to compare estimated and actual metrics (rows, time).

At this point, you already have a basic understanding of how to read query execution plans and interpret their nodes. In the next lectures, we'll talk about common optimization problems and how to solve them.

2
Task
SQL SELF, level 41, lesson 4
Locked
Using an Index and the `Index Scan` Node
Using an Index and the `Index Scan` Node
1
Survey/quiz
Query Execution Plan, level 41, lesson 4
Unavailable
Query Execution Plan
Query Execution Plan
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION