现在是时候深入聊聊用这些函数时经常踩到的坑了。就算是很有经验的 SQL 大佬,有时候也会被这些坑绊倒。我们的目标就是学会发现这些坑,然后优雅地绕过去。
你有没有写过查询,结果报了个很迷的错,比如 "column must appear in the GROUP BY clause or be used in an aggregate function"?或者查询结果看着怪怪的,完全不知道为啥?这其实只是用聚合函数时常见错误的冰山一角。这节课就是你的 SQL 生存指南,帮你在错误和误解的海洋里顺利航行。
错误 1:在 GROUP BY 外用非聚合列
问题
你写了个返回聚合数据的查询,但顺手加了个既没在分组里、也没包在聚合函数里的列。比如:
SELECT department, salary, SUM(salary)
FROM employees
GROUP BY department;
PostgreSQL 马上就会告诉你:
ERROR: column "employees.salary" must appear in the GROUP BY clause or be used in an aggregate function
为啥会这样?
当你用 GROUP BY 时,PostgreSQL 会按指定的列把行分组。但你又加了个新列(这里是 salary),PostgreSQL 就懵了:你到底是想要一条工资,还是平均值,还是别的啥?
怎么解决?有两种方式:
- 确保所有非聚合列都在
GROUP BY里:
SELECT department, salary
FROM employees
GROUP BY department, salary;
- 或者,如果有意义,就把列包进聚合函数:
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
小建议:如果 PostgreSQL 因为 GROUP BY 报错,问问自己:“这个列真的需要吗?它在查询里到底起啥作用?”
错误 2:COUNT() 和 NULL 用错了
问题:你想统计有多少员工填了 bonus,就写了:
SELECT COUNT(bonus) AS bonus_count
FROM employees;
但发现结果比你预期的少。为啥?因为 COUNT(column) 会忽略 column 是 NULL 的行。
解决办法:如果你想统计所有行,就用 COUNT(*):
SELECT COUNT(*) AS total_count
FROM employees;
或者明确只要 bonus 不为 NULL 的行:
SELECT COUNT(bonus) AS bonus_count
FROM employees
WHERE bonus IS NOT NULL;
小提示:如果你想区分包含 NULL 的记录和表里完全没有记录,记得选对 COUNT(*) 还是 COUNT(column)。
错误 3:忘了用 HAVING 过滤聚合结果,反而用了 WHERE
问题:你想找平均工资大于 5000 的部门。新手可能会写成这样:
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE AVG(salary) > 5000
GROUP BY department;
PostgreSQL 会报错:
ERROR: aggregate functions are not allowed in WHERE clause
为啥?因为 WHERE 过滤是在分组 之前,而聚合函数是在分组之后才算的。所以 AVG(salary) 在 WHERE 阶段还没出来。
正确做法是用 HAVING 过滤聚合后的数据:
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 5000;
错误 4:WHERE 过滤和执行顺序搞混了
问题:你想知道每个部门里年龄大于 30 岁的员工数量。查询可能写成这样:
SELECT department, COUNT(*)
FROM employees
GROUP BY department
WHERE age > 30;
PostgreSQL 又会报错:
ERROR: syntax error at or near "WHERE"
为啥?WHERE 总是在 GROUP BY 之前处理。你只是把 WHERE 放错地方了。
避免这种情况的方法是:先过滤,再分组。
SELECT department, COUNT(*)
FROM employees
WHERE age > 30
GROUP BY department;
错误 5:SUM()、AVG() 等函数遇到 NULL 的坑
问题:你想统计发给员工的总 bonus,就写了:
SELECT SUM(bonus) AS total_bonus
FROM employees;
但结果看着特别低。因为有一半员工没填 bonus,这些 NULL 都被忽略了。
解决办法:提前处理 NULL。比如可以把 NULL 换成 0:
SELECT SUM(COALESCE(bonus, 0)) AS total_bonus
FROM employees;
这样所有 NULL 都会变成 0,结果就对了。
我们会在后面的课里详细讲 COALESCE 的用法。
错误 6:多个聚合函数一起用却没搞清它们的关系
问题:你想统计员工总数和工资总额。但写的查询结果很奇怪:
SELECT COUNT(salary) AS count_salary, SUM(salary) AS total_salary
FROM employees;
为啥会这样?如果有人工资是 NULL,COUNT(salary) 和 SUM(salary) 的结果就不一样了,很容易搞混。
记住:聚合函数是各算各的。如果有 NULL,结果就会不一样。用 COALESCE 或 COUNT(*),保证一致性:
SELECT COUNT(*) AS total_employees, SUM(COALESCE(salary, 0)) AS total_salary
FROM employees;
错误 7:分组太多导致查询巨慢
问题:你写了个分组特别多的查询,结果跑了五小时还没跑完:
SELECT department, job_title, location, COUNT(*)
FROM employees
GROUP BY department, job_title, location;
在分组前先想想,真的每个字段都要放进 GROUP BY 吗?分组唯一值越多,查询越慢。如果可以,尽量减少分组字段:
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
这些坑很常见,就算是老 SQL 程序员也会遇到。希望现在你能更轻松地避开这些暗礁,写出又快又准又优雅的查询!
GO TO FULL VERSION