1. Preparing Data for Saving
In previous lectures, we already learned how to fetch data from web pages using the BeautifulSoup library. Now it's time to turn our scraping achievements into something useful and convenient for analysis. After all, if the data stays only in your script, it's not very helpful. Today we'll learn how to save data in CSV and Excel formats because, as they say, in a programmer's life, just like in a cat's life, the main thing is to find something to eat... I mean, save the data.
Before we start turning our data into fun tables, let's take a moment to consider how to prepare our data for export. Usually, data that we extract from web pages is presented as lists or dictionaries. It's a good idea to format them so they're ready for export into a table format.
Sample Data
Let's say you've collected the following data about books:
books = [
{"title": "Harry Potter and the Philosopher's Stone", "author": "J. K. Rowling", "price": "350.00"},
{"title": "The Lord of the Rings", "author": "J. R. R. Tolkien", "price": "500.00"},
{"title": "The Master and Margarita", "author": "M. A. Bulgakov", "price": "450.00"}
]
2. Saving to CSV Format
CSV (Comma-Separated Values) is a simple text format that allows storing table-like data. Each row in a CSV file represents a record, and each column is separated by commas. Let's learn how to save our data in this format.
Using the csv Library
Python provides us with the handy csv module, which makes working with CSV files a lot easier. Here's how you can export our book data to a CSV file:
import csv
with open('books.csv', mode='w', newline='', encoding='utf-8') as file:
writer = csv.DictWriter(file, fieldnames=["title", "author", "price"])
writer.writeheader()
writer.writerows(books)
print("Data successfully saved to books.csv")
Note: Always specify newline='' when opening a file in w mode to avoid extra blank lines in Windows.
Reading Data from CSV
For completeness, let's also cover how to read data back from a CSV file:
with open('books.csv', mode='r', encoding='utf-8') as file:
reader = csv.DictReader(file)
for row in reader:
print(row)
3. Saving to Excel Format
To work with Excel in Python, we'll use the pandas library. Pandas lets you easily convert your data into a DataFrame and export it to various formats, including Excel.
Installing pandas
If you haven't installed pandas yet, you can do it using pip:
pip install pandas openpyxl
Using pandas to Export to Excel
Now let's save our book data to an Excel file:
import pandas as pd
df = pd.DataFrame(books)
df.to_excel('books.xlsx', index=False, engine='openpyxl')
print("Data successfully saved to books.xlsx")
Reading Data from Excel
Isn't it nice not only to create but also to reuse what's been created! Let's learn how to read data from Excel:
df = pd.read_excel('books.xlsx')
print(df)
4. Errors and Their Handling
Just like any other aspect of programming, exporting data can come with its own set of errors. The most common ones are related to incorrect file paths or missing necessary libraries. For instance, if you forget to install openpyxl, exporting to Excel will fail.
Checking for File Existence
Before trying to write or read a file, it's helpful to check if it exists. This can be done using the os module:
import os
if os.path.exists('books.csv'):
print("File books.csv exists")
else:
print("File books.csv not found")
You won't believe it, but the most common mistake beginners make is accidentally overwriting old data with new data. Automating file history saving is an art in itself.
Also, don't forget to include error handling using try-except constructs so that your script doesn't crash at the first problem.
GO TO FULL VERSION