Splitting PDF Pages to Save Separate Parts

Python SELF EN, Level 43, Lesson 4

1. Splitting PDF into Individual Pages

Basics of Splitting PDF Files with PyPDF2

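PyPDF2 provides the PdfReader class for opening and reading PDF files and the PdfWriter class for writing pages to a new PDF document. To split a document, add the pages you want to a PdfWriter with add_page(), then call write() to save them to a new file. The examples in this lesson use the PyPDF2 3.x names (PdfReader, PdfWriter, pages, add_page); older releases used PdfFileReader, getPage(), and addPage() instead.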

Saving Each PDF Page as a Separate Document

This code splits a document, creating a separate PDF file for each page of the original file.

Python

import PyPDF2

# Open the PDF file
with open("sample.pdf", "rb") as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    
    # Iterate through each page
    for page_num in range(len(pdf_reader.pages)):
        pdf_writer = PyPDF2.PdfWriter()
        
        # Extract the page and add it to a new PDF
        page = pdf_reader.pages[page_num]
        pdf_writer.add_page(page)
        
        # Save the current page as a separate PDF file
        output_filename = f"page_{page_num + 1}.pdf"
        with open(output_filename, "wb") as output_file:
            pdf_writer.write(output_file)

print("Each page is saved as a separate file.")

In this example, each page of the original document is saved to its own file: page_1.pdf, page_2.pdf, and so on.
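One practical detail with names like page_1.pdf and page_10.pdf is that they do not sort in page order in most file listings. Here is a minimal sketch of a zero-padded naming helper. The function page_filename and its prefix parameter are hypothetical names introduced for illustration; they are not part of PyPDF2.

```python
def page_filename(page_number, total_pages, prefix="page"):
    """Build a file name whose number is zero-padded to the width of total_pages.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    width = len(str(total_pages))
    return f"{prefix}_{page_number:0{width}d}.pdf"


# For a 12-page document the numbers are padded to two digits
names = [page_filename(n, 12) for n in (1, 2, 10)]
print(names)                   # ['page_01.pdf', 'page_02.pdf', 'page_10.pdf']
print(sorted(names) == names)  # True
```

In the loop above, you could build output_filename with page_filename(page_num + 1, len(pdf_reader.pages)) instead of the plain f-string.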

2. Extracting a Range of Pages

Sometimes you need only a specific range of pages, for example pages 1 to 5, rather than every page. To do this, loop over just that range and add each page to a single PdfWriter.

Python

import PyPDF2

with open("sample.pdf", "rb") as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    pdf_writer = PyPDF2.PdfWriter()
    
    # Define the range of pages, e.g., pages 1 to 5
    start_page = 1
    end_page = 5
    
    for page_num in range(start_page - 1, end_page):  # Page numbering starts at 0
        page = pdf_reader.pages[page_num]
        pdf_writer.add_page(page)

    # Save the range of pages as a new PDF
    with open("pages_1_to_5.pdf", "wb") as output_file:
        pdf_writer.write(output_file)

print("Page range successfully saved.")

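This code extracts pages 1 through 5 and saves them to the file pages_1_to_5.pdf. If the document has fewer than five pages, pdf_reader.pages[page_num] raises an IndexError, so check len(pdf_reader.pages) first when the range comes from user input.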
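The arithmetic in range(start_page - 1, end_page) is easy to get wrong by one, so it can help to isolate it in a small function. Below is a minimal sketch that converts a 1-based, inclusive page range into zero-based indices and rejects ranges that do not fit the document. The name page_indices is an assumption made for this example.

```python
def page_indices(start_page, end_page, total_pages):
    """Convert a 1-based, inclusive page range into zero-based indices.

    Hypothetical helper for illustration; raises ValueError if the range
    is empty, reversed, or extends past the last page.
    """
    if not 1 <= start_page <= end_page <= total_pages:
        raise ValueError(
            f"Pages {start_page}-{end_page} are outside 1-{total_pages}"
        )
    return range(start_page - 1, end_page)


# Pages 1 to 5 of a 10-page document are indices 0 to 4
print(list(page_indices(1, 5, 10)))  # [0, 1, 2, 3, 4]
```

In the example above, the loop could then read: for page_num in page_indices(start_page, end_page, len(pdf_reader.pages)).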

3. Saving Multiple Ranges

If you need to extract multiple ranges of pages and save them into separate files, you can use a function that takes the start and end of the range.

Python

import PyPDF2

def save_page_range(input_pdf, output_pdf, start_page, end_page):
    with open(input_pdf, "rb") as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        pdf_writer = PyPDF2.PdfWriter()
        
        for page_num in range(start_page - 1, end_page):
            page = pdf_reader.pages[page_num]
            pdf_writer.add_page(page)

        with open(output_pdf, "wb") as output_file:
            pdf_writer.write(output_file)
    
    print(f"Pages {start_page}–{end_page} saved to {output_pdf}")

# Using the function to extract different ranges
save_page_range("sample.pdf", "section_1.pdf", 1, 3)
save_page_range("sample.pdf", "section_2.pdf", 4, 6)

This code creates two files: section_1.pdf with pages 1 to 3 and section_2.pdf with pages 4 to 6.
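When there are many ranges, writing one function call per range gets repetitive. One possible approach, sketched below, is to describe the ranges in a short string such as "1-3, 4-6" and turn it into (start, end) pairs. The function parse_page_ranges and the string format are assumptions made for this example, not something PyPDF2 provides.

```python
def parse_page_ranges(spec):
    """Parse a string such as "1-3, 4-6, 9" into [(1, 3), (4, 6), (9, 9)].

    Hypothetical helper for illustration. Page numbers are 1-based and
    each range is inclusive; a single number means a one-page range.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty pieces such as a trailing comma
        start_text, separator, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if separator else start
        if start < 1 or end < start:
            raise ValueError(f"Invalid page range: {part!r}")
        ranges.append((start, end))
    return ranges


# Show which output file each range would go to
for number, (start, end) in enumerate(parse_page_ranges("1-3, 4-6"), start=1):
    print(f"section_{number}.pdf <- pages {start}-{end}")
```

Each pair could then be passed to save_page_range() from the example above, together with an output name such as f"section_{number}.pdf".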

4. Bulk Processing of PDF Documents

Automating PDF Splitting for All Files in a Folder

If you need to split every PDF document in a folder into individual pages, you can create a script that processes each file automatically.

Python

import PyPDF2
import os

# Path to the folder containing PDF files
folder_path = "pdf_folder"

# Automatically split all PDF files in the folder
for filename in os.listdir(folder_path):
    if filename.lower().endswith(".pdf"):  # also matches .PDF
        file_path = os.path.join(folder_path, filename)
        with open(file_path, "rb") as pdf_file:
            pdf_reader = PyPDF2.PdfReader(pdf_file)
            
            # Split the file into individual pages
            for page_num in range(len(pdf_reader.pages)):
                pdf_writer = PyPDF2.PdfWriter()
                page = pdf_reader.pages[page_num]
                pdf_writer.add_page(page)
                
                # Generate the output file name
                output_filename = f"{os.path.splitext(filename)[0]}_page_{page_num + 1}.pdf"
                output_path = os.path.join(folder_path, output_filename)
                
                # Save the page
                with open(output_path, "wb") as output_file:
                    pdf_writer.write(output_file)

print("All pages of every PDF in the folder have been successfully saved.")

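This script automatically processes every PDF file in the folder pdf_folder and saves each page as a separate PDF named <filename>_page_<number>.pdf. Because the output is written back into pdf_folder, running the script a second time would also split the single-page files it created earlier; write to a separate folder if you plan to run it more than once.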
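The file selection and output naming in the script above can also be expressed with pathlib. The sketch below keeps the outputs in a separate folder, so a second run does not pick up pages that were already split. The helper names find_pdfs and page_output_path, and the folder name pdf_folder/pages, are assumptions made for this example.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside folder (extension matched case-insensitively)."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


def page_output_path(pdf_path, page_number, output_dir):
    """Build <output_dir>/<original name>_page_<number>.pdf for one page."""
    return Path(output_dir) / f"{Path(pdf_path).stem}_page_{page_number}.pdf"


# Where page 3 of pdf_folder/report.pdf would be written
print(page_output_path("pdf_folder/report.pdf", 3, "pdf_folder/pages"))
```

The output folder has to exist before writing; Path(output_dir).mkdir(parents=True, exist_ok=True) creates it if needed.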

5. When to Split PDF Files?

As one great developer (who, like us, often spends hours coding) once asked: "Why?" Splitting PDF documents is useful in several scenarios:

  • Highlight key pages: Share only the needed pages without sending the entire "War and Peace" to your colleagues.
  • Create collections: Save important chapters from different documents into one file for study or work purposes.
  • Archiving: Store an archive of important data page by page, so you don't lose valuable information among unnecessary pages.

That's where our hero PyPDF2 comes in to make your life easier!

6. Common Challenges and How to Avoid Them

While working with PDF documents, you might encounter some challenges. For instance, page indices in PyPDF2 start at zero, so page 1 of the document is pdf_reader.pages[0]. Don't forget to account for this when specifying the pages you need. There's nothing worse than spending time extracting the wrong pages!

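Also, make sure every file you open gets closed. The examples in this lesson use the with statement, which closes the file automatically, even if an error occurs. Keep the source file open until pdf_writer.write() has finished, because PdfReader reads page data from it on demand. An unclosed book can become a headache, especially if it's a textbook on Oracle.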
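The point about closing files can be shown without PyPDF2 at all. The sketch below writes a few placeholder bytes (not a valid PDF) to a temporary file and compares the with statement to the manual try/finally form it replaces. The file name demo.bin is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.bin")

    # The with statement closes the file when the block ends
    with open(path, "wb") as output_file:
        output_file.write(b"placeholder bytes")
    print(output_file.closed)  # True

    # The manual equivalent needs try/finally to be just as safe
    manual_file = open(path, "rb")
    try:
        data = manual_file.read()
    finally:
        manual_file.close()
    print(manual_file.closed)  # True
```

Both forms close the file, but the with version is shorter and harder to get wrong, which is why the lesson's examples use it.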

Real-Life Applications

This technique is useful for more than personal documents. For example, if you work at a law firm and need to quickly prepare specific sections of a large case for court, splitting the PDF into parts can save a lot of time and effort.

Such techniques can also be used to prepare educational materials, highlight important sections of technical reports, and much more. The real magic lies in how easily you can automate these routine tasks!
