现在你已经准备好 dive in 实战例子,看看这些东西在真实任务里是怎么用的啦!
例子:按地区计算销售排名
假设我们有个 sales 表,里面存着不同地区的销售数据。我们要为每个地区算出销售排名。
| id | region | sales_amount |
|---|---|---|
| 1 | North | 5000 |
| 2 | North | 3000 |
| 3 | North | 7000 |
| 4 | South | 2000 |
| 5 | South | 4000 |
| 6 | East | 8000 |
| 7 | East | 6000 |
任务:找出每个地区的销售排名(RANK)
SELECT
region,
sales_amount,
RANK() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS sales_rank
FROM
sales;
结果:
| region | sales_amount | sales_rank |
|---|---|---|
| North | 7000 | 1 |
| North | 5000 | 2 |
| North | 3000 | 3 |
| South | 4000 | 1 |
| South | 2000 | 2 |
| East | 8000 | 1 |
| East | 6000 | 2 |
注意我们用了 PARTITION BY region,这样排名是每个地区单独算的。如果不加 PARTITION BY,那排名就是全表一起算啦。
例子:计算累计收入
现在我们来处理 transactions 表,给每个客户算累计(累加)收入。
| id | customer_id | purchase_date | amount |
|---|---|---|---|
| 1 | 101 | 2023-01-01 | 100 |
| 2 | 101 | 2023-01-03 | 50 |
| 3 | 102 | 2023-01-02 | 200 |
| 4 | 101 | 2023-01-05 | 150 |
| 5 | 102 | 2023-01-04 | 100 |
任务:给每个客户算累计和
SELECT
customer_id,
purchase_date,
amount,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY purchase_date) AS cumulative_sum
FROM
transactions;
结果:
| customer_id | purchase_date | amount | cumulative_sum |
|---|---|---|---|
| 101 | 2023-01-01 | 100 | 100 |
| 101 | 2023-01-03 | 50 | 150 |
| 101 | 2023-01-05 | 150 | 300 |
| 102 | 2023-01-02 | 200 | 200 |
| 102 | 2023-01-04 | 100 | 300 |
这里的关键点是在 OVER() 里用了 ORDER BY purchase_date,这样累计和就是按时间顺序算的。
例子:分位数分组
假设我们有个 students 表,里面有名字和测试分数。我们想按分数把学生分成 4 组。
| id | name | test_score |
|---|---|---|
| 1 | Alice | 85 |
| 2 | Bob | 95 |
| 3 | Charlie | 75 |
| 4 | Diana | 88 |
| 5 | Edward | 65 |
| 6 | Fiona | 70 |
任务:用 NTILE() 把学生分成 4 组
SELECT
name,
test_score,
NTILE(4) OVER (ORDER BY test_score DESC) AS quartile
FROM
students;
查询结果如下:
| name | test_score | quartile |
|---|---|---|
| Bob | 95 | 1 |
| Diana | 88 | 1 |
| Alice | 85 | 2 |
| Charlie | 75 | 3 |
| Fiona | 70 | 3 |
| Edward | 65 | 4 |
NTILE(4) 把数据分成 4 组。分数最高的学生在第一组,最低的在最后一组。
例子:时间序列数据分析
site_visits 表里存着每天网站访问量。我们要算出每个网站每天访问量的变化。
| site_id | visit_date | visits |
|---|---|---|
| 1 | 2023-01-01 | 100 |
| 1 | 2023-01-02 | 120 |
| 1 | 2023-01-03 | 110 |
| 2 | 2023-01-01 | 50 |
| 2 | 2023-01-02 | 60 |
| 2 | 2023-01-03 | 70 |
我们的任务——算出每天访问量的变化
SELECT
site_id,
visit_date,
visits,
visits - LAG(visits) OVER (PARTITION BY site_id ORDER BY visit_date) AS visit_diff
FROM
site_visits;
结果:
| site_id | visit_date | visits | visit_diff |
|---|---|---|---|
| 1 | 2023-01-01 | 100 | NULL |
| 1 | 2023-01-02 | 120 | 20 |
| 1 | 2023-01-03 | 110 | -10 |
| 2 | 2023-01-01 | 50 | NULL |
| 2 | 2023-01-02 | 60 | 10 |
| 2 | 2023-01-03 | 70 | 10 |
LAG() 函数能拿到上一行的值。如果没有上一行,结果就是 NULL。后面几节课你会学到更多关于它的内容 :P
GO TO FULL VERSION