Now it's time to talk about real life. About that side of it you just can't avoid: mistakes. Catching, fixing, and understanding mistakes is a must when working with data. Let's break down what kind of rakes you can step on when working with JOIN in SQL, and how to dodge them.
Mistake 1: Skipping the Join Condition — Creating a Cartesian Product
The most common mistake is forgetting to specify the join condition using ON. In this case, you get a Cartesian product, where every row from the first table is joined with every row from the second. You end up with a massive number of rows that make no sense and just confuse you.
Here's an example. Let's say we have these tables:
Students (students):
| student_id | name |
|---|---|
| 1 | Otto |
| 2 | Anna |
Courses (courses):
| course_id | course_name |
|---|---|
| 101 | Mathematics |
| 102 | History |
Now let's write a query and forget about ON:
SELECT *
FROM students
JOIN courses;
Result:
| student_id | name | course_id | course_name |
|---|---|---|---|
| 1 | Otto | 101 | Mathematics |
| 1 | Otto | 102 | History |
| 2 | Anna | 101 | Mathematics |
| 2 | Anna | 102 | History |
Doesn't look right, does it? This nightmare is called a Cartesian product.
How to fix it: use ON to specify how the data in the tables are related.
SELECT *
FROM students
JOIN courses
ON students.student_id = courses.course_id;
And now a new chapter of mistakes begins...
Foolproofing
This is such a common problem that PostgreSQL actually forbids using JOIN without specifying ON and a condition.
If you really want to join every row with every row, you can use the syntax without JOIN:
SELECT *
FROM students, courses;
Another option, option 3 — when JOIN without ON actually works:
- With
NATURAL JOIN— it automatically matches columns with the same name. - With
USING— you specify the list of columns to join on. CROSS JOIN— always without a condition, it's just a Cartesian product.
Mistake 2: Wrong Join Condition
Sometimes you do specify a join condition, but you do it wrong. For example, you join tables not by keys, but by unrelated data.
Let's say we want to get a list of students and the courses they're enrolled in, but we mess up and join the tables on unrelated fields:
SELECT *
FROM students
JOIN courses
ON students.student_id = courses.course_id;
This query will give you a bogus result, because student_id and course_id are totally different things.
How to fix it: make sure you're using the right columns for the join. The correct join might look like this (if you have an enrollments table linking students and courses):
SELECT students.name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;
Mistake 3: Duplicate Rows in the Result
When you add multiple JOINs to a query, sometimes you end up with duplicate rows. This happens if the JOIN tables have duplicate records, or you set up the join conditions wrong.
For example, student Otto is enrolled twice in the same course in the enrollments table.
Records in enrollments:
| student_id | course_id |
|---|---|
| 1 | 101 |
| 1 | 101 |
Now a query with JOIN will give you this result:
SELECT students.name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;
Result:
| name | course_name |
|---|---|
| Otto | Mathematics |
| Otto | Mathematics |
How to fix it: first, make sure your tables don't have duplicate data. Second, if this is expected behavior, remove duplicates using DISTINCT:
SELECT DISTINCT students.name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;
Mistake 4: Losing Rows When Using INNER JOIN
INNER JOIN only returns rows that match in both tables. If one table doesn't have a matching value, the row gets dropped. You can lose data if you pick the wrong type of join.
Let's say we have a student who isn't enrolled in any course yet:
Students (students):
| student_id | name |
|---|---|
| 1 | Otto |
| 2 | Anna |
| 3 | Dhany |
Enrollments (enrollments):
| student_id | course_id |
|---|---|
| 1 | 101 |
| 2 | 102 |
Now a query with INNER JOIN:
SELECT students.name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;
Result:
| name | course_name |
|---|---|
| Otto | Mathematics |
| Anna | History |
But where's Dhany? If you want to include students without courses, you need to use LEFT JOIN:
SELECT students.name, courses.course_name
FROM students
LEFT JOIN enrollments ON students.student_id = enrollments.student_id
LEFT JOIN courses ON enrollments.course_id = courses.course_id;
Mistake 5: Mishandling NULL Values
If one of your tables has rows with empty (NULL) values, they might get left out of the results (for example, when using filter conditions).
Example: you use LEFT JOIN, but then add a WHERE to filter.
SELECT students.name, courses.course_name
FROM students
LEFT JOIN enrollments ON students.student_id = enrollments.student_id
LEFT JOIN courses ON enrollments.course_id = courses.course_id
WHERE courses.course_name = 'Mathematics';
Now students without courses won't be included in the result, even though you used LEFT JOIN.
How to fix it: if you want to include rows with missing courses, replace WHERE with ON or add an extra condition:
SELECT students.name, courses.course_name
FROM students
LEFT JOIN enrollments ON students.student_id = enrollments.student_id
LEFT JOIN courses ON enrollments.course_id = courses.course_id
WHERE courses.course_name IS NULL OR courses.course_name = 'Mathematics';
Mistake 6: Mixing Up Join Types
You get confused about which type of join to use. For example, you use RIGHT JOIN when you could just use LEFT JOIN and swap the table order.
How to avoid confusion:
- Use
LEFT JOINif you can. It's more intuitive. - Switch the table order to avoid needing
RIGHT JOIN.
GO TO FULL VERSION