CodeGym /Courses /SQL SELF /Examples of Using Window Functions for Data Analysis

Examples of Using Window Functions for Data Analysis

SQL SELF
Level 30 , Lesson 1
Available

Now you’re ready to dive into the world of practical examples and see how all this stuff works in real-life tasks!

Example: Calculating Sales Rank by Region

Let’s say we have a sales table with sales data from different regions. We need to figure out the sales ranks for each region.

id region sales_amount
1 North 5000
2 North 3000
3 North 7000
4 South 2000
5 South 4000
6 East 8000
7 East 6000

Task: find the RANK of sales for each region

SELECT
    region,
    sales_amount,
    RANK() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS sales_rank
FROM 
    sales;

Result:

region sales_amount sales_rank
North 7000 1
North 5000 2
North 3000 3
South 4000 1
South 2000 2
East 8000 1
East 6000 2

Notice that we used PARTITION BY region to calculate the ranks separately for each region. If we didn’t use PARTITION BY, the rank would be calculated globally for the whole table.

Example: Calculating Cumulative Revenue Sum

Now let’s work with the transactions table to calculate the cumulative (running) revenue sum for each customer.

id customer_id purchase_date amount
1 101 2023-01-01 100
2 101 2023-01-03 50
3 102 2023-01-02 200
4 101 2023-01-05 150
5 102 2023-01-04 100

Task: calculate the cumulative sum for each customer

SELECT
    customer_id,
    purchase_date,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id ORDER BY purchase_date) AS cumulative_sum
FROM 
    transactions;

Result:

customer_id purchase_date amount cumulative_sum
101 2023-01-01 100 100
101 2023-01-03 50 150
101 2023-01-05 150 300
102 2023-01-02 200 200
102 2023-01-04 100 300

The key thing here is using ORDER BY purchase_date inside OVER() so the cumulative sum is calculated in chronological order.

Example: Splitting Data into Quantiles

Imagine we have a students table with names and test scores. We want to split the students into 4 groups based on their scores.

id name test_score
1 Alice 85
2 Bob 95
3 Charlie 75
4 Diana 88
5 Edward 65
6 Fiona 70

Task: split students into 4 groups using NTILE()

SELECT
    name,
    test_score,
    NTILE(4) OVER (ORDER BY test_score DESC) AS quartile
FROM 
    students;

The query result will look like this:

name test_score quartile
Bob 95 1
Diana 88 1
Alice 85 2
Charlie 75 3
Fiona 70 3
Edward 65 4

NTILE(4) splits the data into 4 groups. Students with the highest scores end up in the first group, and those with the lowest — in the last one.

Example: Time Series Analysis

The site_visits table stores data about the number of site visits per day. We need to calculate the difference in visits between days for each site.

site_id visit_date visits
1 2023-01-01 100
1 2023-01-02 120
1 2023-01-03 110
2 2023-01-01 50
2 2023-01-02 60
2 2023-01-03 70

Our task — calculate the difference in visits between days

SELECT
    site_id,
    visit_date,
    visits,
    visits - LAG(visits) OVER (PARTITION BY site_id ORDER BY visit_date) AS visit_diff
FROM 
    site_visits;

Result:

site_id visit_date visits visit_diff
1 2023-01-01 100 NULL
1 2023-01-02 120 20
1 2023-01-03 110 -10
2 2023-01-01 50 NULL
2 2023-01-02 60 10
2 2023-01-03 70 10

The LAG() function lets you grab the value from the previous row. If there’s no previous row, the result will be NULL. You’ll learn more about it in the next lectures :P

2
Task
SQL SELF, level 30, lesson 1
Locked
Sales Rank Calculation
Sales Rank Calculation
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION