When you use window functions, you might wonder: "How many rows inside the window are used to calculate the value for the current row?" The answer depends on the window frame.
Window frame is the range of rows that's used to calculate the result of a window function. This range is built based on the current row and some extra conditions set with ROWS or RANGE.
Simple example: when calculating a running total, you can specify:
- Only consider the current row.
- Consider the current row and all rows above it.
- Consider the current row and a fixed number of rows above/below it.
It's ROWS and RANGE that control which rows get into the window frame.
Using ROWS
ROWS defines the window frame at the level of physical row positions. This means it counts rows from top to bottom in their order, no matter what values are in those rows.
Syntax
window_function OVER (
ORDER BY column
ROWS BETWEEN start AND end
)
Key expressions:
CURRENT ROW— the current row.number PRECEDING— a certain number of rows above the current one.number FOLLOWING— a certain number of rows below the current one.UNBOUNDED PRECEDING— from the start of the window.UNBOUNDED FOLLOWING— to the end of the window.
Example: running total for the current row and 2 previous rows
SELECT
employee_id,
salary,
SUM(salary) OVER (
ORDER BY employee_id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS rolling_sum
FROM employees;
Explanation:
-
ROWS BETWEEN 2 PRECEDING AND CURRENT ROWmeans: take the current row and two rows above it. - The running total will be calculated only for these three rows.
Result:
| employee_id | salary | rolling_sum |
|---|---|---|
| 1 | 5000 | 5000 |
| 2 | 7000 | 12000 |
| 3 | 6000 | 18000 |
| 4 | 4000 | 17000 |
Example: "sliding window" analysis with a fixed number of rows
Task: calculate the average salary for the current row and the next two rows.
SELECT
employee_id,
salary,
AVG(salary) OVER (
ORDER BY employee_id
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) AS rolling_avg
FROM employees;
Result:
| employee_id | salary | rolling_avg |
|---|---|---|
| 1 | 5000 | 6000 |
| 2 | 7000 | 5666.67 |
| 3 | 6000 | 5000 |
| 4 | 4000 | 4000 |
Using RANGE
RANGE builds the window frame based on values, not row positions. This means rows are included in the frame if their values in the ORDER BY column fall within the specified range.
Syntax
window_function OVER (
ORDER BY column
RANGE BETWEEN start AND end
)
Example: running total by value range
Task: calculate the running total for rows where the salary differs from the current one by no more than 2000.
SELECT
employee_id,
salary,
SUM(salary) OVER (
ORDER BY salary
RANGE BETWEEN 2000 PRECEDING AND 2000 FOLLOWING
) AS range_sum
FROM employees;
Explanation:
RANGE BETWEEN 2000 PRECEDING AND 2000 FOLLOWINGmeans: take rows where thesalaryvalue is within ±2000 of the current row.
Result:
| employee_id | salary | range_sum |
|---|---|---|
| 4 | 4000 | 10000 |
| 3 | 6000 | 17000 |
| 2 | 7000 | 17000 |
| 1 | 5000 | 17000 |
Comparing ROWS and RANGE
ROWSworks with actual rows and their count. It doesn't care about values.RANGEworks with a logical range of values set for theORDER BYcolumn.
For comparison, let's look at an example. Suppose we have a sales table with data:
| id | amount |
|---|---|
| 1 | 100 |
| 2 | 100 |
| 3 | 300 |
| 4 | 400 |
Let's compare the queries:
ROWS:
SELECT
id,
SUM(amount) OVER (
ORDER BY amount
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS sum_rows
FROM sales;
Result:
| id | sum_rows |
|---|---|
| 1 | 100 |
| 2 | 200 |
| 3 | 500 |
| 4 | 900 |
Here, each row gets added to the sum as it actually appears.
RANGE:
SELECT
id,
SUM(amount) OVER (
ORDER BY amount
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS sum_range
FROM sales;
Result:
| id | sum_range |
|---|---|
| 1 | 200 |
| 2 | 200 |
| 3 | 500 |
| 4 | 900 |
Here, rows 1 and 2 are grouped together since their amount = 100. RANGE takes into account duplicate values in the amount column.
Real-world task examples
- Calculating revenue growth
Task: calculate the change in revenue compared to the previous row.
SELECT
month,
revenue,
revenue - LAG(revenue) OVER (
ORDER BY month
) AS revenue_change
FROM sales_data;
- Comparing the current row to the group average
Task: for each department, calculate the difference between an employee's salary and the department's average salary.
SELECT
department_id,
employee_id,
salary,
salary - AVG(salary) OVER (
PARTITION BY department_id
) AS salary_diff
FROM employees;
Mistakes when using ROWS and RANGE
Incorrect row order (ORDER BY): If you don't specify a sort order, PostgreSQL will throw an error because it can't figure out the current row.
Mixing ROWS and RANGE approaches in one task: Pick your approach based on your data. ROWS is good for tasks with a fixed number of rows, while RANGE is for value ranges.
Skipping duplicate values in RANGE: Remember, RANGE includes all duplicate values, which can seriously change your result.
GO TO FULL VERSION