CodeGym /Courses /SQL SELF /Handling Errors When Loading Data ( ON CONFLICT

Handling Errors When Loading Data ( ON CONFLICT)

SQL SELF
Level 23 , Lesson 3
Available

Welcome to the heart of those dramatic bulk data loading scenarios! Today, we're gonna learn how to handle errors that pop up during data loads using the ON CONFLICT clause. It's like flipping on autopilot in a plane: even if something goes sideways, you'll know what to do to avoid a crash. Let's dive into the PostgreSQL tricks!

No one likes surprises, especially when your data just refuses to load! When you're bulk loading, you might run into a few classic problems:

  • Duplicate data. For example, if your table has a UNIQUE constraint and your data file is full of repeats.
  • Constraint conflicts. Like, try loading a blank value into a column with a NOT NULL constraint. Result? Error. PostgreSQL is always strict about this stuff.
  • Duplicate primary info. The table might already have rows with the same IDs as your CSV file.

Let's break down how to dodge these "gotchas" with ON CONFLICT.

Using ON CONFLICT to Handle Errors

The ON CONFLICT clause lets you tell PostgreSQL what to do if there's a constraint conflict (like UNIQUE or PRIMARY KEY). You can either update the existing row or just ignore the conflicting one.

Here's what the basic ON CONFLICT syntax looks like:

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...)
ON CONFLICT (conflict_target)
DO UPDATE SET column1 = new_value1, column2 = new_value2;

You can swap DO UPDATE for DO NOTHING if you just want to ignore the conflict.

Example: Updating Data on Conflict

Let's say we have a students table:

CREATE TABLE students (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    age INT
);

Now we want to load new data, but some of it is already in the database:

INSERT INTO students (id, name, age)
VALUES 
    (1, 'Peter', 22),  -- This student already exists
    (2, 'Anna', 20),  -- New student
    (3, 'Mal', 25) -- New student
ON CONFLICT (id) DO UPDATE SET 
    name = EXCLUDED.name, 
    age = EXCLUDED.age;

In this example, if there's already a student with the ID we're trying to add, their data will be updated:

ON CONFLICT (id) DO UPDATE SET
    name = EXCLUDED.name, 
    age = EXCLUDED.age;

Check out the magic word EXCLUDED. It means "the values you tried to insert, but they got excluded because of the conflict."

Result:

  • The student with id = 1 will have their info (name and age) updated.
  • The students with id = 2 and id = 3 will be added to the table.

Example: Ignoring Conflicts

If you don't want to update data and just want to skip rows that cause conflicts, use DO NOTHING:

INSERT INTO students (id, name, age)
VALUES 
    (1, 'Peter', 22),  -- This student already exists
    (2, 'Anna', 20),  -- New student
    (3, 'Mal', 25) -- New student
ON CONFLICT (id) DO NOTHING;

Now the conflicting rows just won't be inserted, and the rest will happily settle into your database.

Error Logging

Sometimes ignoring or updating isn't enough. Maybe you need to log conflicts for later analysis. We can make a special table for logging errors:

CREATE TABLE conflict_log (
    conflict_time TIMESTAMP DEFAULT NOW(),
    id INT,
    name TEXT,
    age INT,
    conflict_reason TEXT
);

Next, let's add error handling with logging:

INSERT INTO students (id, name, age)
VALUES 
    (1, 'Peter', 22), 
    (2, 'Anna', 20), 
    (3, 'Mal', 25)
ON CONFLICT (id) DO UPDATE SET 
    name = EXCLUDED.name, 
    age = EXCLUDED.age
RETURNING EXCLUDED.id, EXCLUDED.name, EXCLUDED.age
INTO conflict_log;

This last example only works inside stored procedures. You'll see exactly how this works when we get to PL-SQL. I'm jumping ahead a bit, just wanted to show another way to handle data load conflicts: logging all the problematic rows.

Now you can analyze why conflicts happened. This technique is super useful in complex systems where you want to keep a "trail" during bulk data loads.

Practical Example

Let's put all our knowledge together in one simple task. Imagine you have a CSV file with student updates you want to load into the table:

File students_update.csv

id name age
1 Otto 23
2 Anna 21
4 Wally 30

Loading Data and Handling Conflicts

  1. First, create a temp table tmp_students:
CREATE TEMP TABLE tmp_students (
  id   INTEGER,
  name TEXT,
  age  INTEGER
);
  1. Load data from the file using \COPY:
\COPY tmp_students FROM 'students_update.csv' DELIMITER ',' CSV HEADER
  1. Insert data from the temp table into the main one using INSERT ON CONFLICT:
INSERT INTO students (id, name, age)
SELECT id, name, age FROM tmp_students
ON CONFLICT (id) DO UPDATE
  SET name = EXCLUDED.name,
      age = EXCLUDED.age;

Now all the data, including updates (the row with id = 1), is loaded successfully.

Common Mistakes and How to Avoid Them

Mistakes happen even to the most experienced devs, but knowing how to dodge them can save you hours (maybe even days) of stress.

  • Conflict with UNIQUE constraint. Make sure you specify the right field in ON CONFLICT. For example, if you use the wrong key (id instead of email), PostgreSQL will just say "bye-bye" to your query.
  • Misusing EXCLUDED. This alias only refers to values passed in the current query. Don't try to use it in other contexts.
  • Missing columns. Make sure all columns you list in SET actually exist in the table. For example, if you add SET non_existing_column = 'value', you'll get an error.

Using ON CONFLICT makes bulk data loading in PostgreSQL flexible and safe. You can not only avoid failed queries due to conflicts, but also control exactly how your data is handled. Your users (and your servers!) will thank you.

2
Task
SQL SELF, level 23, lesson 3
Locked
Updating Data on Conflict
Updating Data on Conflict
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION