Now you’re ready to dive into the world of practical examples and see how all this stuff works in real-life tasks!
Example: Calculating Sales Rank by Region
Let’s say we have a sales table with sales data from different regions. We need to figure out the sales ranks for each region.
| id | region | sales_amount |
|---|---|---|
| 1 | North | 5000 |
| 2 | North | 3000 |
| 3 | North | 7000 |
| 4 | South | 2000 |
| 5 | South | 4000 |
| 6 | East | 8000 |
| 7 | East | 6000 |
Task: find the RANK of sales for each region
SELECT
region,
sales_amount,
RANK() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS sales_rank
FROM
sales;
Result:
| region | sales_amount | sales_rank |
|---|---|---|
| North | 7000 | 1 |
| North | 5000 | 2 |
| North | 3000 | 3 |
| South | 4000 | 1 |
| South | 2000 | 2 |
| East | 8000 | 1 |
| East | 6000 | 2 |
Notice that we used PARTITION BY region to calculate the ranks separately for each region. If we didn’t use PARTITION BY, the rank would be calculated globally for the whole table.
Example: Calculating Cumulative Revenue Sum
Now let’s work with the transactions table to calculate the cumulative (running) revenue sum for each customer.
| id | customer_id | purchase_date | amount |
|---|---|---|---|
| 1 | 101 | 2023-01-01 | 100 |
| 2 | 101 | 2023-01-03 | 50 |
| 3 | 102 | 2023-01-02 | 200 |
| 4 | 101 | 2023-01-05 | 150 |
| 5 | 102 | 2023-01-04 | 100 |
Task: calculate the cumulative sum for each customer
SELECT
customer_id,
purchase_date,
amount,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY purchase_date) AS cumulative_sum
FROM
transactions;
Result:
| customer_id | purchase_date | amount | cumulative_sum |
|---|---|---|---|
| 101 | 2023-01-01 | 100 | 100 |
| 101 | 2023-01-03 | 50 | 150 |
| 101 | 2023-01-05 | 150 | 300 |
| 102 | 2023-01-02 | 200 | 200 |
| 102 | 2023-01-04 | 100 | 300 |
The key thing here is using ORDER BY purchase_date inside OVER() so the cumulative sum is calculated in chronological order.
Example: Splitting Data into Quantiles
Imagine we have a students table with names and test scores. We want to split the students into 4 groups based on their scores.
| id | name | test_score |
|---|---|---|
| 1 | Alice | 85 |
| 2 | Bob | 95 |
| 3 | Charlie | 75 |
| 4 | Diana | 88 |
| 5 | Edward | 65 |
| 6 | Fiona | 70 |
Task: split students into 4 groups using NTILE()
SELECT
name,
test_score,
NTILE(4) OVER (ORDER BY test_score DESC) AS quartile
FROM
students;
The query result will look like this:
| name | test_score | quartile |
|---|---|---|
| Bob | 95 | 1 |
| Diana | 88 | 1 |
| Alice | 85 | 2 |
| Charlie | 75 | 3 |
| Fiona | 70 | 3 |
| Edward | 65 | 4 |
NTILE(4) splits the data into 4 groups. Students with the highest scores end up in the first group, and those with the lowest — in the last one.
Example: Time Series Analysis
The site_visits table stores data about the number of site visits per day. We need to calculate the difference in visits between days for each site.
| site_id | visit_date | visits |
|---|---|---|
| 1 | 2023-01-01 | 100 |
| 1 | 2023-01-02 | 120 |
| 1 | 2023-01-03 | 110 |
| 2 | 2023-01-01 | 50 |
| 2 | 2023-01-02 | 60 |
| 2 | 2023-01-03 | 70 |
Our task — calculate the difference in visits between days
SELECT
site_id,
visit_date,
visits,
visits - LAG(visits) OVER (PARTITION BY site_id ORDER BY visit_date) AS visit_diff
FROM
site_visits;
Result:
| site_id | visit_date | visits | visit_diff |
|---|---|---|---|
| 1 | 2023-01-01 | 100 | NULL |
| 1 | 2023-01-02 | 120 | 20 |
| 1 | 2023-01-03 | 110 | -10 |
| 2 | 2023-01-01 | 50 | NULL |
| 2 | 2023-01-02 | 60 | 10 |
| 2 | 2023-01-03 | 70 | 10 |
The LAG() function lets you grab the value from the previous row. If there’s no previous row, the result will be NULL. You’ll learn more about it in the next lectures :P
GO TO FULL VERSION