CodeGym /Courses /SQL SELF /Setting Delimiters and Data Formats When Importing

Setting Delimiters and Data Formats When Importing

SQL SELF
Level 24 , Lesson 1
Available

When data comes from different systems, its format can vary. One file might use commas as delimiters, another might use semicolons, and a third could use tabs as the main way to separate columns. If you set up delimiters wrong when importing data, you can run into errors or misinterpret the data.

On top of that, in real life you might have to import data in a text format without standard CSV headers, and also handle empty values and cases where blank lines should be treated as NULL. So, setting up delimiters and formats is a must-have skill when working with bulk data imports.

Main Delimiters: How Do You Work With Them?

In PostgreSQL, the COPY command gives you a lot of flexibility for setting up delimiters. Let’s break down how it works.

Setting Delimiters

By default, the COPY command expects columns in a CSV file to be separated by commas. But that’s not always convenient or possible: someone might use semicolons (;), pipes (|), or even tabs (\t).

Here’s how you can set the delimiter using the DELIMITER parameter:

COPY students FROM '/path/to/students.csv'
DELIMITER ',' 
CSV HEADER;

If you have semicolons instead of commas:

COPY students FROM '/path/to/students.csv'
DELIMITER ';'
CSV HEADER;

You can even import a file with tabs:

COPY students FROM '/path/to/students.tsv'
DELIMITER E'\t' 
CSV HEADER;

Here, E'\t' means the delimiter is a tab character.

Importing Files With a Non-Standard Delimiter

Practical example: you have a file with course data, where pipes (|) are used as delimiters. The data in the file looks like this:

course_id|course_name|credits
1|SQL Basics|3
2|Advanced SQL|4
3|PostgreSQL Masterclass|5

Here’s how you can import this data into the courses table:

COPY courses(course_id, course_name, credits)
FROM '/path/to/courses.csv' 
DELIMITER '|' 
CSV HEADER;

Here we’re telling PostgreSQL directly that the delimiter is a pipe.

Setting Data Formats When Importing

Delimiters are just part of the job. The data format in the file also matters a lot. Let’s look at the main ways to set up data formats.

Ignoring Blank Lines and Setting NULL

Big data files often have blank lines or columns with no data. PostgreSQL treats them as empty strings unless you say otherwise. To treat those values as NULL, you can use the NULL AS parameter:

Example. Let’s say your file has records with empty values:

id,first_name,last_name,email
1,John,Doe,
2,Jane,Smith,jane.smith@example.com
3,,Brown,james.brown@example.com

Importing data and treating empty values in the email column as NULL:

COPY students(id, first_name, last_name, email)
FROM '/path/to/students.csv'
DELIMITER ',' 
CSV HEADER 
NULL AS '';

As a result, empty values in the file will be saved as NULL in the table.

Ignoring Blank Lines

Sometimes a file might have blank lines you don’t want to import. PostgreSQL can ignore those lines automatically.

Example. A file with a blank line:

id,first_name,last_name,email
1,John,Doe,john.doe@example.com

2,Jane,Smith,jane.smith@example.com

Use the IGNORE_BLANK_LINES parameter:

COPY students(id, first_name, last_name, email)
FROM '/path/to/students.csv'
DELIMITER ',' 
CSV HEADER 
NULL AS '' 
IGNORE_BLANK_LINES;

Now blank lines will be ignored during import.

Working With a Non-Standard Data Format

Sometimes you might need to import data in a text format instead of CSV. For example, lines in the file are separated by pipes |, and there’s no header row.

Example file:

1|John|Doe|john.doe@example.com
2|Jane|Smith|jane.smith@example.com

In this case, you can use the following query:

COPY students(id, first_name, last_name, email)
FROM '/path/to/students.txt' 
DELIMITER '|' 
NULL AS '' 
CSV;

If there are no headers in the file, just remove the HEADER parameter.

Practical Example of Setting Data Formats

Scenario: you have a file called grades.tsv with student grades. The data looks like this:

student_id  course_id   grade
1   101     85
1   102     90
2   101     78
2   102     88
3   101     95
3   102 

You need to:

  1. Import the file and make sure empty values are treated as NULL.
  2. Make sure the data is loaded correctly.

Solution:

  1. Create the grades table:
CREATE TABLE grades (
    student_id INTEGER NOT NULL,
    course_id INTEGER NOT NULL,
    grade INTEGER
);
  1. Import the data from the file:
COPY grades(student_id, course_id, grade)
FROM '/path/to/grades.tsv' 
DELIMITER E'\t' 
NULL AS '' 
CSV HEADER;
  1. Check the imported data:
SELECT * FROM grades;

Expected result:

student_id course_id grade
1 101 85
1 102 90
2 101 78
2 102 88
3 101 95
3 102 NULL

Tips and Common Mistakes

Mistake: wrong delimiter. If you don’t set the right delimiter, PostgreSQL will throw an error or import the data incorrectly. For example, if your file uses semicolons but you forget to set DELIMITER ';', the COPY command will treat the whole line as a single column.

Mistake: wrong NULL interpretation. If you don’t set the NULL AS '' parameter, empty values in the file might be treated as empty strings, which often causes errors in calculations or filtering.

Mistake: wrong data format. Setting the wrong delimiter or having mistakes in the file (like using tabs instead of spaces) can lead to errors like: ERROR: null value in column violates not-null constraint.

2
Task
SQL SELF, level 24, lesson 1
Locked
Setting a custom delimiter when loading data
Setting a custom delimiter when loading data
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION