
Preparing Tables for Loading Data from CSV

Alright, let’s finally get down to prepping tables for bulk data import from CSV files. If you’re thinking, “Why do I even need to prep a table? What’s so hard about it?”, you’ve got a lot to learn about the real world. No file is ever “perfect.” There’s always a catch—duplicates, extra spaces, data type errors, or just mismatched structure.

Let’s break down how to set up your database so your CSV file loads without drama.

Before you load a CSV file, figure out exactly how the data will be stored in your database. That means you need to create a table with the right structure first.

Example: Loading Student Data

Let’s say we have a CSV file called students.csv with info about students. Here’s what it looks like:

id,name,age,email,major
1,Alex,20,alex@example.com,Computer Science
2,Maria,21,maria@example.com,Mathematics
3,Otto,19,otto@example.com,Physics

Based on this, let’s create a table:

CREATE TABLE students (
    id SERIAL PRIMARY KEY,       -- Unique student identifier
    name VARCHAR(100) NOT NULL,  -- Student name (string up to 100 chars)
    age INT CHECK (age > 0),     -- Student age (must be greater than 0)
    email VARCHAR(100) UNIQUE,   -- Unique email
    major VARCHAR(100)           -- Main major
);
  • id SERIAL PRIMARY KEY: We added a primary key to uniquely identify rows. If your CSV already has unique IDs, use the id column from the file.
  • name VARCHAR(100) NOT NULL: Student name is required, and it’s limited to 100 characters.
  • age INT CHECK (age > 0): Age must be a number, and we add a check to make sure it’s always greater than 0.
  • email VARCHAR(100) UNIQUE: Email must be unique to avoid duplicate records.
  • major VARCHAR(100): Indicates the student’s main major. No restrictions here.

Your tables should be thoughtfully structured to match your data and protect against bad data. This will cut down on errors during import.
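
To see how these constraints protect you in practice, here's a minimal sketch that tries to insert a row with an invalid age and lets PostgreSQL reject it. It assumes a local database and the psycopg2 driver; the connection string is a placeholder.

import psycopg2

# Placeholder connection details; adjust for your own database
conn = psycopg2.connect("dbname=school user=postgres password=secret host=localhost")
cur = conn.cursor()

try:
    # age = -5 violates CHECK (age > 0), so the database rejects the row
    cur.execute(
        "INSERT INTO students (name, age, email, major) VALUES (%s, %s, %s, %s)",
        ("Broken Row", -5, "broken@example.com", "Physics"),
    )
    conn.commit()
except psycopg2.IntegrityError as e:  # CHECK and UNIQUE violations raise IntegrityError
    conn.rollback()
    print("Rejected by the database:", e)
finally:
    cur.close()
    conn.close()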

Checking Data Before Import

CSV files are full of surprises. Before you load anything, make sure your data matches the table structure.

How do you check your data?

  1. Column Match
    Make sure the number of columns in your CSV matches the number in your table. For example, if your table has 5 columns but your CSV has 6, the import will fail.

  2. Data Types
    Check that the data in each column matches the expected type. For example, the age column should only have integers.

Validation Tools

Excel or Google Sheets. Open the file in a spreadsheet editor and make sure there are no empty rows or cells with bad data.

Python. Use the pandas library to check data types:

import pandas as pd

# Read CSV
df = pd.read_csv('students.csv')

# Check data
print(df.dtypes)  # Print data types for each column
print(df.isnull().sum())  # Check for missing values
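
The dtype check above covers point 2. To cover point 1 (the column match), you can compare the CSV header against the columns you expect in the table. A minimal sketch, assuming the students table defined earlier:

import pandas as pd

# Columns the students table expects, in order
expected_columns = ['id', 'name', 'age', 'email', 'major']

df = pd.read_csv('students.csv')

if list(df.columns) != expected_columns:
    print("Column mismatch!")
    print("Expected:", expected_columns)
    print("Found:   ", list(df.columns))
else:
    print("Column names and order match the table.")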

Cleaning Data Before Import

Most of the time, data from outside sources needs cleaning. Otherwise, you’ll run into errors during import.

Common CSV File Issues

Empty Rows or Columns
If a required column (NOT NULL) is missing a value, you’ll get an error.

Example of bad data:

id,name,age,email,major
1,Alex,20,alex@example.com,Computer Science
2,Maria,,maria@example.com,Mathematics

Solution: Fill required columns with valid values (or drop those rows), and let optional columns like age stay empty so they load as NULL.
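
A rough sketch of that idea: drop rows that are missing required (NOT NULL) fields and leave optional fields empty, since an unquoted empty field is treated as NULL when you load in CSV format:

import pandas as pd

df = pd.read_csv('students.csv')

# name is NOT NULL in the table, so rows without it can't be imported
df = df.dropna(subset=['name'])

# age allows NULL, so missing ages can simply stay empty
df.to_csv('students_checked.csv', index=False)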

Extra Spaces
Spaces at the start or end of strings can cause problems. For example, "Alex" and "Alex " (note the trailing space) will be treated as different values.

Python solution: remove extra spaces.

df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)  # Strip every string (object) column

Bad Characters or Encoding
If your file has special characters that don’t play nice with your database, the import might fail.

Example: Use the iconv command-line utility to convert the encoding:

iconv -f WINDOWS-1251 -t UTF-8 students.csv > students_utf8.csv
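
If you'd rather stay in Python, you can re-encode the file there as well. A minimal sketch, assuming the source file really is Windows-1251:

# Re-encode students.csv from Windows-1251 to UTF-8
with open('students.csv', encoding='cp1251') as src:
    content = src.read()

with open('students_utf8.csv', 'w', encoding='utf-8') as dst:
    dst.write(content)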

Cleaning Data: Python Example

import pandas as pd

# Read file
df = pd.read_csv('students.csv')

# Clean data
df['name'] = df['name'].str.strip()  # Remove leading/trailing spaces
df['email'] = df['email'].str.lower()  # Make emails lowercase
df['age'] = df['age'].astype('Int64')  # Nullable integer: missing ages stay empty and load as NULL

# Save changes to a new file
df.to_csv('cleaned_students.csv', index=False)

Always check your data before loading it. Remember: a good programmer saves their own time by catching problems early!

Handy Checklist for Table and Data Prep

Before you start working with CSVs, check the points below (a small script that ties them together follows the list):

  • The table structure matches your data (columns, data types, constraints).
  • Your CSV file doesn’t have empty rows, extra spaces, or bad characters.
  • The file encoding is compatible with PostgreSQL (UTF-8 is best).
  • You’re using tools to analyze and clean your data (like Python or Excel).
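
A minimal sketch of that script; the column list and file name mirror the students example above, so adjust both for your own data:

import pandas as pd

EXPECTED_COLUMNS = ['id', 'name', 'age', 'email', 'major']

def check_csv(path):
    # Reading as UTF-8 fails loudly if the encoding is wrong
    df = pd.read_csv(path, encoding='utf-8')

    # 1. Column match
    if list(df.columns) != EXPECTED_COLUMNS:
        print("Column mismatch:", list(df.columns))

    # 2. Missing values
    print("Missing values per column:")
    print(df.isnull().sum())

    # 3. Extra spaces in string columns
    for col in df.select_dtypes(include='object'):
        values = df[col].dropna()
        if (values != values.str.strip()).any():
            print(f"Column '{col}' has leading/trailing spaces")

    return df

df = check_csv('cleaned_students.csv')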

Now you’re ready to load data from CSV! But before we do that, make sure your table is set up right and your data is clean. In the next lecture, we’ll keep digging into the data loading process in PostgreSQL, including error handling and dealing with conflicts.
