CodeGym /Courses /SQL SELF /Setting up a window frame with ROWS and

Setting up a window frame with ROWS and RANGE

SQL SELF
Level 30 , Lesson 2
Available

When you use window functions, you might wonder: "How many rows inside the window are used to calculate the value for the current row?" The answer depends on the window frame.

Window frame is the range of rows that's used to calculate the result of a window function. This range is built based on the current row and some extra conditions set with ROWS or RANGE.

Simple example: when calculating a running total, you can specify:

  • Only consider the current row.
  • Consider the current row and all rows above it.
  • Consider the current row and a fixed number of rows above/below it.

It's ROWS and RANGE that control which rows get into the window frame.

Using ROWS

ROWS defines the window frame at the level of physical row positions. This means it counts rows from top to bottom in their order, no matter what values are in those rows.

Syntax

window_function OVER (
    ORDER BY column
    ROWS BETWEEN start AND end
)

Key expressions:

  • CURRENT ROW — the current row.
  • number PRECEDING — a certain number of rows above the current one.
  • number FOLLOWING — a certain number of rows below the current one.
  • UNBOUNDED PRECEDING — from the start of the window.
  • UNBOUNDED FOLLOWING — to the end of the window.

Example: running total for the current row and 2 previous rows

SELECT
    employee_id,
    salary,
    SUM(salary) OVER (
        ORDER BY employee_id
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_sum
FROM employees;

Explanation:

  • ROWS BETWEEN 2 PRECEDING AND CURRENT ROW means: take the current row and two rows above it.
  • The running total will be calculated only for these three rows.

Result:

employee_id salary rolling_sum
1 5000 5000
2 7000 12000
3 6000 18000
4 4000 17000

Example: "sliding window" analysis with a fixed number of rows

Task: calculate the average salary for the current row and the next two rows.

SELECT 
    employee_id,
    salary,
    AVG(salary) OVER (
        ORDER BY employee_id
        ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
    ) AS rolling_avg
FROM employees;

Result:

employee_id salary rolling_avg
1 5000 6000
2 7000 5666.67
3 6000 5000
4 4000 4000

Using RANGE

RANGE builds the window frame based on values, not row positions. This means rows are included in the frame if their values in the ORDER BY column fall within the specified range.

Syntax

window_function OVER (
    ORDER BY column
    RANGE BETWEEN start AND end
)

Example: running total by value range

Task: calculate the running total for rows where the salary differs from the current one by no more than 2000.

SELECT 
    employee_id,
    salary,
    SUM(salary) OVER (
        ORDER BY salary
        RANGE BETWEEN 2000 PRECEDING AND 2000 FOLLOWING
    ) AS range_sum
FROM employees;

Explanation:

  • RANGE BETWEEN 2000 PRECEDING AND 2000 FOLLOWING means: take rows where the salary value is within ±2000 of the current row.

Result:

employee_id salary range_sum
4 4000 10000
3 6000 17000
2 7000 17000
1 5000 17000

Comparing ROWS and RANGE

  • ROWS works with actual rows and their count. It doesn't care about values.
  • RANGE works with a logical range of values set for the ORDER BY column.

For comparison, let's look at an example. Suppose we have a sales table with data:

id amount
1 100
2 100
3 300
4 400

Let's compare the queries:

ROWS:

SELECT
    id,
    SUM(amount) OVER (
        ORDER BY amount
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS sum_rows
FROM sales;

Result:

id sum_rows
1 100
2 200
3 500
4 900

Here, each row gets added to the sum as it actually appears.

RANGE:

SELECT 
    id,
    SUM(amount) OVER (
        ORDER BY amount
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS sum_range
FROM sales;

Result:

id sum_range
1 200
2 200
3 500
4 900

Here, rows 1 and 2 are grouped together since their amount = 100. RANGE takes into account duplicate values in the amount column.

Real-world task examples

  1. Calculating revenue growth

Task: calculate the change in revenue compared to the previous row.

SELECT 
    month,
    revenue,
    revenue - LAG(revenue) OVER (
        ORDER BY month
    ) AS revenue_change
FROM sales_data;
  1. Comparing the current row to the group average

Task: for each department, calculate the difference between an employee's salary and the department's average salary.

SELECT 
    department_id,
    employee_id,
    salary,
    salary - AVG(salary) OVER (
        PARTITION BY department_id
    ) AS salary_diff
FROM employees;

Mistakes when using ROWS and RANGE

Incorrect row order (ORDER BY): If you don't specify a sort order, PostgreSQL will throw an error because it can't figure out the current row.

Mixing ROWS and RANGE approaches in one task: Pick your approach based on your data. ROWS is good for tasks with a fixed number of rows, while RANGE is for value ranges.

Skipping duplicate values in RANGE: Remember, RANGE includes all duplicate values, which can seriously change your result.

2
Task
SQL SELF, level 30, lesson 2
Locked
Cumulative sum for the current row and the two previous rows
Cumulative sum for the current row and the two previous rows
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION