CodeGym /Courses /SQL SELF /Common Mistakes When Using JOIN

Common Mistakes When Using JOIN

SQL SELF
Level 12 , Lesson 4
Available

Now it's time to talk about real life. About that side of it you just can't avoid: mistakes. Catching, fixing, and understanding mistakes is a must when working with data. Let's break down what kind of rakes you can step on when working with JOIN in SQL, and how to dodge them.

Mistake 1: Skipping the Join Condition — Creating a Cartesian Product

The most common mistake is forgetting to specify the join condition using ON. In this case, you get a Cartesian product, where every row from the first table is joined with every row from the second. You end up with a massive number of rows that make no sense and just confuse you.

Here's an example. Let's say we have these tables:

Students (students):

student_id name
1 Otto
2 Anna

Courses (courses):

course_id course_name
101 Mathematics
102 History

Now let's write a query and forget about ON:

SELECT *
FROM students
JOIN courses;

Result:

student_id name course_id course_name
1 Otto 101 Mathematics
1 Otto 102 History
2 Anna 101 Mathematics
2 Anna 102 History

Doesn't look right, does it? This nightmare is called a Cartesian product.

How to fix it: use ON to specify how the data in the tables are related.

SELECT *
FROM students
JOIN courses
ON students.student_id = courses.course_id;

And now a new chapter of mistakes begins...

Foolproofing

This is such a common problem that PostgreSQL actually forbids using JOIN without specifying ON and a condition.

If you really want to join every row with every row, you can use the syntax without JOIN:

SELECT *
FROM students, courses;

Another option, option 3 — when JOIN without ON actually works:

  • With NATURAL JOIN — it automatically matches columns with the same name.
  • With USING — you specify the list of columns to join on.
  • CROSS JOIN — always without a condition, it's just a Cartesian product.

Mistake 2: Wrong Join Condition

Sometimes you do specify a join condition, but you do it wrong. For example, you join tables not by keys, but by unrelated data.

Let's say we want to get a list of students and the courses they're enrolled in, but we mess up and join the tables on unrelated fields:

SELECT *
FROM students
JOIN courses
ON students.student_id = courses.course_id;

This query will give you a bogus result, because student_id and course_id are totally different things.

How to fix it: make sure you're using the right columns for the join. The correct join might look like this (if you have an enrollments table linking students and courses):

SELECT students.name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;

Mistake 3: Duplicate Rows in the Result

When you add multiple JOINs to a query, sometimes you end up with duplicate rows. This happens if the JOIN tables have duplicate records, or you set up the join conditions wrong.

For example, student Otto is enrolled twice in the same course in the enrollments table.

Records in enrollments:

student_id course_id
1 101
1 101

Now a query with JOIN will give you this result:

SELECT students.name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;

Result:

name course_name
Otto Mathematics
Otto Mathematics

How to fix it: first, make sure your tables don't have duplicate data. Second, if this is expected behavior, remove duplicates using DISTINCT:

SELECT DISTINCT students.name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;

Mistake 4: Losing Rows When Using INNER JOIN

INNER JOIN only returns rows that match in both tables. If one table doesn't have a matching value, the row gets dropped. You can lose data if you pick the wrong type of join.

Let's say we have a student who isn't enrolled in any course yet:

Students (students):

student_id name
1 Otto
2 Anna
3 Dhany

Enrollments (enrollments):

student_id course_id
1 101
2 102

Now a query with INNER JOIN:

SELECT students.name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;

Result:

name course_name
Otto Mathematics
Anna History

But where's Dhany? If you want to include students without courses, you need to use LEFT JOIN:

SELECT students.name, courses.course_name
FROM students
LEFT JOIN enrollments ON students.student_id = enrollments.student_id
LEFT JOIN courses ON enrollments.course_id = courses.course_id;

Mistake 5: Mishandling NULL Values

If one of your tables has rows with empty (NULL) values, they might get left out of the results (for example, when using filter conditions).

Example: you use LEFT JOIN, but then add a WHERE to filter.

SELECT students.name, courses.course_name
FROM students
LEFT JOIN enrollments ON students.student_id = enrollments.student_id
LEFT JOIN courses ON enrollments.course_id = courses.course_id
WHERE courses.course_name = 'Mathematics';

Now students without courses won't be included in the result, even though you used LEFT JOIN.

How to fix it: if you want to include rows with missing courses, replace WHERE with ON or add an extra condition:

SELECT students.name, courses.course_name
FROM students
LEFT JOIN enrollments ON students.student_id = enrollments.student_id
LEFT JOIN courses ON enrollments.course_id = courses.course_id
WHERE courses.course_name IS NULL OR courses.course_name = 'Mathematics';

Mistake 6: Mixing Up Join Types

You get confused about which type of join to use. For example, you use RIGHT JOIN when you could just use LEFT JOIN and swap the table order.

How to avoid confusion:

  • Use LEFT JOIN if you can. It's more intuitive.
  • Switch the table order to avoid needing RIGHT JOIN.
2
Task
SQL SELF, level 12, lesson 4
Locked
Fixing the Cartesian Product
Fixing the Cartesian Product
2
Task
SQL SELF, level 12, lesson 4
Locked
Avoiding Data Loss with `LEFT JOIN`
Avoiding Data Loss with `LEFT JOIN`
1
Survey/quiz
Multiple JOINs, level 12, lesson 4
Unavailable
Multiple JOINs
Multiple JOINs
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION