When you write an SQL query, PostgreSQL doesn't just jump in and run it right away. First, it fires up its "brain" — the query optimizer, which builds an execution plan. This plan is like a route on a map: PostgreSQL figures out what steps to take and in what order to get your data as efficiently as possible.
The query optimizer checks out all possible ways to run your query: sequential table scans, using indexes, doing filtering and sorting, and so on. It tries to find the cheapest way (in terms of resources) to run your query. Basically, it's looking for a sweet spot between execution time and server resources.
Key execution plan parameters
Alright, let's get to the real "meat" — breaking down the parameters PostgreSQL shows you when you run EXPLAIN. Let's start with a simple example:
EXPLAIN
SELECT * FROM students WHERE age > 20;
You’ll get something like this:
Seq Scan on students (cost=0.00..35.00 rows=7 width=72)
Filter: (age > 20)
Let’s break down these mysterious words and numbers.
1. cost (execution cost)
cost — this is an estimate of how many resources it’ll take to run the query. This parameter has two parts:
- Startup Cost: the cost to get the operation started (like prepping an index).
- Total Cost: the total cost to finish the whole operation.
Example:
cost=0.00..35.00
0.00— that’s the Startup Cost.35.00— that’s the Total Cost.
The lower the cost values, the more PostgreSQL likes this plan. But keep in mind, cost is a relative thing. It’s not measured in seconds or milliseconds, it’s more like an internal PostgreSQL estimate.
2. rows (estimated number of rows)
rows shows how many rows PostgreSQL expects to return or process at this step of the query. In our example:
rows=7
This means PostgreSQL thinks the filter age > 20 will return 7 rows. This data comes from stats PostgreSQL collects about the table. If the stats are outdated, the guess might be off. That can lead to a less-than-ideal plan.
3. width (row width in bytes)
width — this is the average size of each row returned at this step, measured in bytes. In our example:
width=72
This means each returned row is about 72 bytes on average. width takes into account the size of the columns and any extra overhead, like row IDs or service info.
It’s kinda like loading an app. If its "weight" (like width) is big, it’ll take longer to load, even if you have fast internet (like cost).
Example: breaking down an execution plan
Let’s check out a real example. Say we have a table called students:
CREATE TABLE students (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
age INTEGER,
major VARCHAR(50)
);
And we run this query:
EXPLAIN
SELECT * FROM students WHERE age > 20 AND major = 'CS';
The result might look like this:
Seq Scan on students (cost=0.00..42.50 rows=3 width=164)
Filter: ((age > 20) AND (major = 'CS'))
- Seq Scan: PostgreSQL is doing a sequential scan of the
studentstable. That means it’s going through every row. - cost=0.00..42.50: The cost of the operation.
Startup Costis0.00, and total cost is42.50. - rows=3: PostgreSQL expects the filter
age > 20 AND major = 'CS'to return 3 rows. - width=164: Each row is about 164 bytes on average.
Now you get how PostgreSQL makes decisions, and you can spot weak spots in your queries. For example, if you see a high cost, that might mean your query is kinda heavy. Or, if you see a big number in rows, maybe you should rethink your filter.
How does cost work in practice?
Let’s add an index on the age column:
CREATE INDEX idx_age ON students(age);
Now let’s run our query again:
EXPLAIN
SELECT * FROM students WHERE age > 20 AND major = 'CS';
The result might change:
Bitmap Heap Scan on students (cost=4.37..20.50 rows=3 width=164)
Recheck Cond: (age > 20)
Filter: (major = 'CS')
-> Bitmap Index Scan on idx_age (cost=0.00..4.37 rows=20 width=0)
Index Cond: (age > 20)
What changed?
- Instead of
Seq Scanwe now haveBitmap Heap Scan: PostgreSQL first finds matching rows in theidx_ageindex, then fetches them from the table. costdropped a lot: nowStartup Costis4.37, andTotal Costis20.50.- The operation got way more efficient thanks to the index.
Visualization: Seq Scan vs Index Scan
Here’s a little comparison table to make it clearer:
| Operation | Description | Example |
|---|---|---|
| Seq Scan | Reads the whole table | Full scan of all rows |
| Index Scan | Uses an index | Fast row lookup via index |
Gotchas and common mistakes
When you use execution plan parameters, be ready for some surprises. For example, a low cost doesn’t always mean better performance. If your database stats are stale (like after a big table update), the plan might not be accurate. Refresh your stats with the ANALYZE command. More on that in the next lecture.
Make sure you use indexes where you need them. But don’t go overboard: indexes take up space and slow down writes.
GO TO FULL VERSION