Welcome to the heart of those dramatic bulk data loading scenarios! Today, we're gonna learn how to handle errors that pop up during data loads using the ON CONFLICT clause. It's like flipping on autopilot in a plane: even if something goes sideways, you'll know what to do to avoid a crash. Let's dive into the PostgreSQL tricks!
No one likes surprises, especially when your data just refuses to load! When you're bulk loading, you might run into a few classic problems:
- Duplicate data. For example, if your table has a
UNIQUEconstraint and your data file is full of repeats. - Constraint conflicts. Like, try loading a blank value into a column with a
NOT NULLconstraint. Result? Error. PostgreSQL is always strict about this stuff. - Duplicate primary info. The table might already have rows with the same IDs as your CSV file.
Let's break down how to dodge these "gotchas" with ON CONFLICT.
Using ON CONFLICT to Handle Errors
The ON CONFLICT clause lets you tell PostgreSQL what to do if there's a constraint conflict (like UNIQUE or PRIMARY KEY). You can either update the existing row or just ignore the conflicting one.
Here's what the basic ON CONFLICT syntax looks like:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...)
ON CONFLICT (conflict_target)
DO UPDATE SET column1 = new_value1, column2 = new_value2;
You can swap DO UPDATE for DO NOTHING if you just want to ignore the conflict.
Example: Updating Data on Conflict
Let's say we have a students table:
CREATE TABLE students (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
age INT
);
Now we want to load new data, but some of it is already in the database:
INSERT INTO students (id, name, age)
VALUES
(1, 'Peter', 22), -- This student already exists
(2, 'Anna', 20), -- New student
(3, 'Mal', 25) -- New student
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name,
age = EXCLUDED.age;
In this example, if there's already a student with the ID we're trying to add, their data will be updated:
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name,
age = EXCLUDED.age;
Check out the magic word EXCLUDED. It means "the values you tried to insert, but they got excluded because of the conflict."
Result:
- The student with
id = 1will have their info (name and age) updated. - The students with
id = 2andid = 3will be added to the table.
Example: Ignoring Conflicts
If you don't want to update data and just want to skip rows that cause conflicts, use DO NOTHING:
INSERT INTO students (id, name, age)
VALUES
(1, 'Peter', 22), -- This student already exists
(2, 'Anna', 20), -- New student
(3, 'Mal', 25) -- New student
ON CONFLICT (id) DO NOTHING;
Now the conflicting rows just won't be inserted, and the rest will happily settle into your database.
Error Logging
Sometimes ignoring or updating isn't enough. Maybe you need to log conflicts for later analysis. We can make a special table for logging errors:
CREATE TABLE conflict_log (
conflict_time TIMESTAMP DEFAULT NOW(),
id INT,
name TEXT,
age INT,
conflict_reason TEXT
);
Next, let's add error handling with logging:
INSERT INTO students (id, name, age)
VALUES
(1, 'Peter', 22),
(2, 'Anna', 20),
(3, 'Mal', 25)
ON CONFLICT (id) DO UPDATE SET
name = EXCLUDED.name,
age = EXCLUDED.age
RETURNING EXCLUDED.id, EXCLUDED.name, EXCLUDED.age
INTO conflict_log;
This last example only works inside stored procedures. You'll see exactly how this works when we get to PL-SQL. I'm jumping ahead a bit, just wanted to show another way to handle data load conflicts: logging all the problematic rows.
Now you can analyze why conflicts happened. This technique is super useful in complex systems where you want to keep a "trail" during bulk data loads.
Practical Example
Let's put all our knowledge together in one simple task. Imagine you have a CSV file with student updates you want to load into the table:
File students_update.csv
| id | name | age |
|---|---|---|
| 1 | Otto | 23 |
| 2 | Anna | 21 |
| 4 | Wally | 30 |
Loading Data and Handling Conflicts
- First, create a temp table
tmp_students:
CREATE TEMP TABLE tmp_students (
id INTEGER,
name TEXT,
age INTEGER
);
- Load data from the file using
\COPY:
\COPY tmp_students FROM 'students_update.csv' DELIMITER ',' CSV HEADER
- Insert data from the temp table into the main one using
INSERT ON CONFLICT:
INSERT INTO students (id, name, age)
SELECT id, name, age FROM tmp_students
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
age = EXCLUDED.age;
Now all the data, including updates (the row with id = 1), is loaded successfully.
Common Mistakes and How to Avoid Them
Mistakes happen even to the most experienced devs, but knowing how to dodge them can save you hours (maybe even days) of stress.
- Conflict with
UNIQUEconstraint. Make sure you specify the right field inON CONFLICT. For example, if you use the wrong key (idinstead ofemail), PostgreSQL will just say "bye-bye" to your query. - Misusing
EXCLUDED. This alias only refers to values passed in the current query. Don't try to use it in other contexts. - Missing columns. Make sure all columns you list in
SETactually exist in the table. For example, if you addSET non_existing_column = 'value', you'll get an error.
Using ON CONFLICT makes bulk data loading in PostgreSQL flexible and safe. You can not only avoid failed queries due to conflicts, but also control exactly how your data is handled. Your users (and your servers!) will thank you.
GO TO FULL VERSION