Imagine you’re working with thousands of rows of data — how do you make sense of it all? That’s where the GROUP BY operator steps in — the real conductor in the SQL orchestra. It takes scattered data and turns it into a clear melody: counts, groups, summarizes. Want to know how many orders each client made, how many students are in each course, or how salaries are distributed by department? That’s all GROUP BY’s job. Today we’ll figure out how to befriend it and get the most out of your tables.
Grouping is the process of combining rows that have the same values in one or more columns into logical groups. It lets you apply aggregate functions to each group separately.
Imagine you have an employees table, and you want to know the average salary for each department. One department — one group. SQL uses GROUP BY to split the employees table into groups by department, then applies AVG() to each group.
Syntax of GROUP BY
The main rule for using grouping in SQL: if you use GROUP BY, every column that’s not part of an aggregate function must be listed in GROUP BY.
Syntax:
SELECT column1,
aggregate_function(column2)
FROM table
GROUP BY column1;
Steps:
- Specify the column you want to group by in
GROUP BY. - Use aggregate functions to calculate values within groups.
- All columns in
SELECTthat aren’t inside aggregate functions must be listed inGROUP BY. Yep, SQL is strict about this, and if you forget, it’ll remind you with an error.
Example: Grouping Students by Faculty
Let’s say we have a students table that stores info about students:
| id | name | faculty | gpa |
|---|---|---|---|
| 1 | Alex Lin | ComputerSci | 3.8 |
| 2 | Maria Chi | Math | 3.5 |
| 3 | Anna Song | ComputerSci | 4.0 |
| 4 | Otto Art | Math | 3.9 |
| 5 | Liam Park | Physics | 3.7 |
Now we want to know the average GPA for each faculty. Here’s a query with GROUP BY:
SELECT faculty, AVG(gpa) AS avg_gpa
FROM students
GROUP BY faculty;
Result:
| faculty | avg_gpa |
|---|---|
| ComputerSci | 3.9 |
| Math | 3.7 |
| Physics | 3.7 |
SQL first split the data into groups by the faculty column, then applied the AVG() function to each group.
Quirks of Working with GROUP BY
- Requirements for columns in
SELECT
SQL requires that every column you put in SELECT but don’t use in aggregate functions (like SUM(), COUNT()) must be mentioned in GROUP BY. That’s because without grouping, SQL can’t figure out what values to show.
Try running this query and you’ll get an error:
SELECT name, AVG(gpa)
FROM students
GROUP BY faculty;
Error: column name isn’t listed in GROUP BY. To fix it, add name to GROUP BY:
SELECT name, AVG(gpa)
FROM students
GROUP BY faculty, name;
But now you’re grouping by individual students — not quite what we wanted at first.
- Grouping by multiple columns
You can group data not just by one column, but by several. For example, maybe you want to group students by both faculty and their names. Just add a second column to GROUP BY:
SELECT faculty, name, AVG(gpa) AS avg_gpa
FROM students
GROUP BY faculty, name;
Original table:
| id | name | faculty | gpa |
|---|---|---|---|
| 1 | Alex Lin | ComputerSci | 3.8 |
| 2 | Maria Chi | Math | 3.5 |
| 3 | Anna Song | ComputerSci | 4.0 |
| 4 | Otto Art | Math | 3.9 |
| 5 | Liam Park | Physics | 3.7 |
Result:
| faculty | name | avg_gpa |
|---|---|---|
| ComputerSci | Alex Lin | 3.8 |
| ComputerSci | Anna Song | 4.0 |
| Math | Maria Chi | 3.5 |
| Math | Otto Art | 3.9 |
| Physics | Liam Park | 3.7 |
- Grouping with multiple aggregate functions
Don’t limit yourself to just one function! For example, let’s count the number of students in each faculty and calculate the average GPA:
SELECT faculty,
COUNT(*) AS student_count,
AVG(gpa) AS avg_gpa
FROM students
GROUP BY faculty;
Source table:
| id | name | faculty | gpa |
|---|---|---|---|
| 1 | Alex Lin | ComputerSci | 3.8 |
| 2 | Maria Chi | Math | 3.5 |
| 3 | Anna Song | ComputerSci | 4.0 |
| 4 | Otto Art | Math | 3.9 |
| 5 | Liam Park | Physics | 3.7 |
Result:
| faculty | student_count | avg_gpa |
|---|---|---|
| ComputerSci | 2 | 3.9 |
| Math | 2 | 3.7 |
| Physics | 1 | 3.7 |
Grouping in SQL: What You Can and Can’t Select
Writing queries with grouping is easy, but honestly, half your queries probably won’t work at first. Grouping works a bit differently than how we usually think about it in our heads.
If your SQL query has GROUP BY, think of all result columns as calculated expressions. Columns in SELECT can only be two types:
- calculated by aggregate functions based on group columns.
- taken from GROUP BY — you must group by them.
If you run a GROUP-BY query on the students table, you can’t have individual students in the result! You can have average height, average weight, average GPA. This code won’t work:
SELECT faculty, name
FROM students
GROUP BY faculty;
Let’s try to figure out why.
The GROUP BY faculty operator will split students in the students table into groups with the same faculty for each group. Since all students in a group have the same faculty, you can say the group has a faculty attribute. But name is different for every student. So the group doesn’t have a single name attribute.
The GROUP BY faculty, gender operator will split students in the students table into groups with the same faculty and gender for each group. So all students in a group will have the same faculty and gender. You can say the group has faculty and gender attributes. But there’s still no single name for the group.
This is allowed:
SELECT faculty, gender
FROM students
GROUP BY faculty, gender;
You can even do this:
SELECT
faculty,
gender,
AVG(age) as group_avg_age, -- Value is calculated based on age values in the student group
MAX(high) as group_high -- Value is calculated based on high values in the student group
FROM students
GROUP BY faculty, gender;
But you can’t just use age and high in SELECT in our case.
Common Mistakes When Using GROUP BY
When you start writing queries with GROUP BY, here are a few gotchas you might run into:
Not all columns are listed in
SELECT. Remember, every column that’s not an aggregate must be listed inGROUP BY. Otherwise SQL won’t know how to display it.Grouping by
NULL.NULLvalues are treated as a separate group. If your column hasNULL, SQL will create a group forNULL.Too many groups. If you accidentally add too many columns to
GROUP BY, you might get results that are too detailed and harder to analyze.
Now you know how to effectively group data using GROUP BY. It’s one of the most powerful tools in SQL, letting you easily work with aggregated data and build structured reports. Next, we’ll keep diving into the magic of grouping and learn how to set up extra filters using HAVING.
GO TO FULL VERSION