CodeGym /Courses /SQL SELF /Query Optimization with Window Functions

Query Optimization with Window Functions

SQL SELF
Level 30 , Lesson 3
Available

Query Optimization with Window Functions

There’s one more important detail we haven’t talked about yet—performance of queries with window functions. Even the most elegant query can turn into a slowpoke turtle if you don’t think about optimization. That’s exactly what we’re gonna tackle today!

Window functions are insanely flexible and powerful. But that flexibility is both a gift and a potential performance threat. PostgreSQL, sadly, doesn’t work by “magic”—it needs resources to process your data. And if you throw window functions at a huge table, your query might end up running a marathon in place.

Optimization will let you:

  • Speed up queries working with large amounts of data.
  • Minimize the load on your database.
  • Make your queries more server-friendly (and colleague-friendly, if they’re working with the DB too!).

Let’s dive in and figure out what you can do to make your queries fly like a racecar at a car show.

Window Function Basics

Before we start optimizing, it’s good to understand what exactly slows your query down. PostgreSQL works with window functions like this:

  1. Sorts the data if there’s an ORDER BY inside OVER() in your query.
  2. Processes each row within the specified window frame or group.
  3. Returns a result for every row.

Now imagine you’ve got a sales table with 10 million rows. If your query doesn’t have filters, PostgreSQL will process every single one of those rows. That’s not just a marathon—it’s a never-ending treadmill.

How to Speed Up Window Functions?

  1. Using Indexes to Speed Up Sorting

Most window functions use ORDER BY inside OVER() to control row order. That means PostgreSQL has to sort your data before running the window function.

If you have an index on the column(s) used in ORDER BY, PostgreSQL can speed up that sort a lot.

Example

CREATE INDEX idx_sales_date ON sales (sale_date);

Now, if you write a query that sorts by sale_date, the index comes into play:

SELECT
    sale_date,
    product_id,
    SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS running_total
FROM sales;

Without an index on sale_date, every query run will include an expensive sort—PostgreSQL will be scrambling to figure out how to order those rows faster.

  1. Applying Filters with WHERE

Narrowing down your data is a key optimization trick. If you don’t need to process all 10 million rows, but just the last year—cut down the data range with WHERE!

Example

SELECT
    sale_date,
    product_id,
    SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS running_total
FROM sales
WHERE sale_date >= '2023-01-01';

It’s like filtering dirty water through a sieve to keep only the good stuff.

  1. Choosing the Right Window Frame

When you’re working with window functions that aggregate, like SUM(), it’s important to pick the right window frame. If you use the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), PostgreSQL will include all rows up to the current one. That can be super inefficient for big tables.

Example: using ROWS

If you only need a couple rows before the current one, it’s better to spell that out with ROWS:

SELECT
    sale_date,
    product_id,
    SUM(amount) OVER (
        PARTITION BY product_id 
        ORDER BY sale_date 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_sum
FROM sales;

Here, PostgreSQL only processes three rows (two previous + current) for each row. That’s way more efficient than processing hundreds by default.

  1. Minimizing the Number of Window Functions

Each window function is processed by PostgreSQL individually. If you use several window functions, PostgreSQL might sort for each one separately, which slows things down. But if the window parameters (like PARTITION BY and ORDER BY) match, PostgreSQL can handle it more efficiently.

Example: optimizing with the same window

SELECT
    product_id,
    sale_date,
    SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS running_total,
    ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY sale_date) AS row_num
FROM sales;

Both functions (SUM() and ROW_NUMBER()) use the same frame. PostgreSQL will only sort once—and that’s awesome.

  1. Table Partitioning

If your table is too big, it’s worth thinking about physically splitting it into smaller pieces. PostgreSQL lets you create partitioned tables so your data ends up in different segments. That can really speed things up.

Example of creating a partitioned table

CREATE TABLE sales_partitioned (
    sale_date DATE NOT NULL,
    product_id INT NOT NULL,
    amount NUMERIC NOT NULL
) PARTITION BY RANGE (sale_date);

After that, you create a set of partitions, like by year:

CREATE TABLE sales_2022 PARTITION OF sales_partitioned
FOR VALUES FROM ('2022-01-01') TO ('2022-12-31');

CREATE TABLE sales_2023 PARTITION OF sales_partitioned
FOR VALUES FROM ('2023-01-01') TO ('2023-12-31');

Now, if you use WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31', PostgreSQL will automatically hit just the right partition.

You’ll learn more about table partitioning closer to the end of the course :P

  1. Avoid Extra Data (SELECT only what you need)

Pick only the columns you need for your function and your result. If your window function only needs product_id, sale_date, and amount, don’t drag along the whole zoo of client bio-data columns.

Example of a “lean” query

SELECT
    product_id,
    sale_date,
    SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS running_total
FROM sales;

Less data—less work for PostgreSQL.

  1. Using Materialization (MATERIALIZED VIEW)

If you often run the same calculations with window functions, you can save the results in a materialized view. A Materialized View stores the data on disk so you don’t have to rerun heavy queries.

Example of creating a materialized view

CREATE MATERIALIZED VIEW sales_running_total AS
SELECT 
    product_id,
    sale_date,
    SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS running_total
FROM sales;

Now you can just query the data:

SELECT * FROM sales_running_total WHERE product_id = 10;
  1. Query Planning with EXPLAIN and EXPLAIN ANALYZE

Like with other parts of SQL, you can use EXPLAIN or EXPLAIN ANALYZE to see how PostgreSQL runs your query—and where the bottlenecks are hiding.

Example of query analysis

EXPLAIN ANALYZE
SELECT 
    product_id,
    sale_date,
    SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS running_total
FROM sales;

This tool will show you where PostgreSQL spends the most time, so you can optimize the slow spots.

Window functions are a powerful data analysis tool, but you gotta be careful with them. Want speed? Pick indexes, add filters, don’t forget about partitions, and don’t be shy about using materialized views. PostgreSQL loves it when you use it thoughtfully!

2
Task
SQL SELF, level 30, lesson 3
Locked
Optimization Using Filters
Optimization Using Filters
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION