
Merging Multiple PDFs into a Single Document


1. Merging PDFs with PyPDF2

Why merge PDF files

First, let's look at why you would want to merge PDF files. As the saying goes, "one PDF is better than ten!" At work, reports, research results, technical documents, or presentation materials often arrive as separate files. Constantly switching between them is not only inconvenient, it also raises the risk of missing something. Combining everything into one document gives you a much more organized way to process, analyze, or distribute the data.

Merging PDF files is also useful for archiving, for putting together a final report, or for combining several versions of a document to track changes. The possibilities are endless!

PyPDF2 merging basics

Let's start with the basic steps of using PyPDF2. Here we'll write a script that merges several PDF files into one. The code is commented so you can follow what happens at each step.

Python

import PyPDF2

# Create a PdfMerger object from the PyPDF2 library
pdf_merger = PyPDF2.PdfMerger()

# List of PDF documents to merge
pdf_files = ['document1.pdf', 'document2.pdf', 'document3.pdf']

# Add each file to the PdfMerger object
for file in pdf_files:
    pdf_merger.append(file)

# Save the merged result as a new PDF file
output_filename = 'merged_document.pdf'
with open(output_filename, 'wb') as output_file:
    pdf_merger.write(output_file)

# Close the PdfMerger object to release resources
pdf_merger.close()

print(f"Merged PDF created: {output_filename}")

๋ณ‘ํ•ฉ๋œ ๋ฌธ์„œ์˜ ์ˆœ์„œ์™€ ๊ตฌ์กฐ

์ด์ œ PDF ๋ฌธ์„œ๋ฅผ ๋ณ‘ํ•ฉํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ฐฐ์› ์œผ๋‹ˆ ํŽ˜์ด์ง€ ์ˆœ์„œ์— ๋Œ€ํ•ด ์ƒ๊ฐํ•ด๋ณด์ž. PyPDF2๋Š” .append() ๋ฉ”์„œ๋“œ์— ํŒŒ์ผ์„ ๋„˜๊ธด ์ˆœ์„œ๋Œ€๋กœ ํŽ˜์ด์ง€๋ฅผ ์ถ”๊ฐ€ํ•ด. ๋”ฐ๋ผ์„œ pdf_files ๋ฆฌ์ŠคํŠธ์˜ ์ˆœ์„œ๊ฐ€ ์ตœ์ข… ๋ฌธ์„œ์˜ ์ˆœ์„œ์— ์˜ํ–ฅ์„ ์ค€๋‹ค๋Š” ์ ์„ ๊ธฐ์–ตํ•˜์ž.
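
Because the order of the list decides the page order, it can help to build that list deliberately instead of relying on whatever order the file system returns. The sketch below is one possible approach, assuming the PDFs sit in a single folder and their names contain numbers. `natural_key` and `ordered_pdf_paths` are hypothetical helper names introduced here, not part of PyPDF2.

```python
import re
from pathlib import Path


def natural_key(path):
    """Split a file name into text and number chunks so 'doc10' sorts after 'doc2'."""
    parts = re.split(r"(\d+)", Path(path).name.lower())
    # With a capturing group, odd positions always hold the digit runs
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def ordered_pdf_paths(folder):
    """Return the PDF files in `folder` as strings, sorted in natural order."""
    pdf_paths = [p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf"]
    return [str(p) for p in sorted(pdf_paths, key=natural_key)]
```

The returned list could be used in place of the hand-written pdf_files list, so that document2.pdf is appended before document10.pdf.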

2. Merging individual pages

If, instead of merging whole documents, you want to build the final document from selected pages of each one, you can use the PdfWriter class instead of PdfMerger. Example:

Python

import PyPDF2

# List of PDF files to merge
pdf_files = ["file1.pdf", "file2.pdf", "file3.pdf"]

# Create a PdfWriter object that will build the merged PDF
pdf_writer = PyPDF2.PdfWriter()

# Go through each PDF file
for pdf_file in pdf_files:
    with open(pdf_file, "rb") as file:
        pdf_reader = PyPDF2.PdfReader(file)
        # Add each page to the PdfWriter
        for page_num in range(len(pdf_reader.pages)):
            # Pages you don't want to include can be skipped here
            page = pdf_reader.pages[page_num]
            pdf_writer.add_page(page)

# Save the merged PDF
with open("merged_document.pdf", "wb") as output_file:
    pdf_writer.write(output_file)

How does this code work?

  1. Create the file list: put the paths of the PDF documents to merge into the pdf_files list.
  2. Initialize PdfWriter: pdf_writer is used to create the new PDF file.
  3. Go through each file: open each PDF file in binary read mode.
  4. Add pages: add every page of the file to pdf_writer with the add_page() method.
  5. Save the result: once all pages have been added, write the final PDF to merged_document.pdf.
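
The comment inside the loop notes that unwanted pages can be skipped. A minimal sketch of that idea follows, assuming PyPDF2 3.x is installed. `pick_indexes`, `copy_selected_pages`, and the `selection` mapping are illustrative names introduced here, and the file names are placeholders.

```python
def pick_indexes(page_count, wanted):
    """Keep only the zero-based page indexes from `wanted` that exist in the document."""
    return [index for index in wanted if 0 <= index < page_count]


def copy_selected_pages(selection, output_path):
    """Write the chosen pages of each source PDF into one new file.

    `selection` maps a source path to the zero-based page indexes to keep.
    """
    import PyPDF2  # imported here so pick_indexes can be used without the library

    pdf_writer = PyPDF2.PdfWriter()
    for pdf_file, wanted in selection.items():
        # Passing a path lets PdfReader open and read the file itself
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        for index in pick_indexes(len(pdf_reader.pages), wanted):
            pdf_writer.add_page(pdf_reader.pages[index])

    with open(output_path, "wb") as output_file:
        pdf_writer.write(output_file)
```

A call such as `copy_selected_pages({"file1.pdf": [0, 1], "file2.pdf": [2]}, "selected_pages.pdf")` would take the first two pages of one file and the third page of another. Indexes that fall outside a document are silently dropped in this sketch; raising an error instead would be an equally reasonable choice.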

3. Styling the new document

Adding bookmarks and a table of contents

What if the merged document is so large that it's hard to navigate? This is where bookmarks shine! PyPDF2 lets you add simple bookmarks (PDF outline items). Let's add one for each document we merge.

Python

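import PyPDF2

pdf_files = ['document1.pdf', 'document2.pdf', 'document3.pdf']
output_filename = 'merged_document.pdf'

pdf_merger = PyPDF2.PdfMerger()

# Page offset used to place each bookmark
page_offset = 0

for file in pdf_files:
    # Read the current document to find out how many pages it has
    pdf_reader = PyPDF2.PdfReader(file)

    # Add the document to the PdfMerger
    pdf_merger.append(file)

    # Add a bookmark named after the file, pointing at its first page.
    # add_bookmark() was removed in PyPDF2 3.0; add_outline_item() replaces it.
    pdf_merger.add_outline_item(file, page_offset)

    # Update the page offset
    page_offset += len(pdf_reader.pages)

with open(output_filename, 'wb') as output_file:
    pdf_merger.write(output_file)

pdf_merger.close()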

์ด ํŠธ๋ฆญ์€ PDF ๋ฌธ์„œ๋ฅผ ์‰ฝ๊ฒŒ ํƒ์ƒ‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ค˜.

๋ณ‘ํ•ฉ๋œ ํŒŒ์ผ์˜ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์—…๋ฐ์ดํŠธ

๋ณ‘ํ•ฉ ํ›„ ๋ฌธ์„œ์˜ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ(์ž‘์„ฑ์ž, ์ œ๋ชฉ, ํ‚ค์›Œ๋“œ ๋“ฑ)๋ฅผ ์ถ”๊ฐ€ํ•˜๊ฑฐ๋‚˜ ์ˆ˜์ •ํ•  ์ˆ˜๋„ ์žˆ์–ด.

Python

import PyPDF2

pdf_files = ["file1.pdf", "file2.pdf"]
pdf_writer = PyPDF2.PdfWriter()

# Copy every page of every source file into the writer
for pdf_file in pdf_files:
    with open(pdf_file, "rb") as file:
        pdf_reader = PyPDF2.PdfReader(file)
        for page_num in range(len(pdf_reader.pages)):
            page = pdf_reader.pages[page_num]
            pdf_writer.add_page(page)

# Add metadata
pdf_writer.add_metadata({
    "/Title": "Merged document",
    "/Author": "Ivan Ivanov",
    "/Subject": "Sales report"
})

# Save the merged file
with open("merged_with_metadata.pdf", "wb") as output_file:
    pdf_writer.write(output_file)

This code adds metadata that is useful for identifying and organizing the document.
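
The metadata keys passed to add_metadata() are PDF names, so each one starts with a slash. As an illustration, the hypothetical helper `build_metadata` below assembles such a dictionary from plain arguments and leaves out empty values. The `/Keywords` entry is a standard PDF information field; joining a list with commas is an assumption made for this sketch.

```python
def build_metadata(title=None, author=None, subject=None, keywords=None):
    """Build a dictionary in the "/Key": value form expected by add_metadata()."""
    fields = {
        "/Title": title,
        "/Author": author,
        "/Subject": subject,
        # Keywords are stored as one string; comma-separated is a common choice
        "/Keywords": ", ".join(keywords) if keywords else None,
    }
    # Leave out anything that was not provided
    return {key: value for key, value in fields.items() if value}


metadata = build_metadata(
    title="Merged document",
    author="Ivan Ivanov",
    keywords=["sales", "report"],
)
print(metadata)
```

The resulting dictionary could then be handed to `pdf_writer.add_metadata(metadata)` before the file is written.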
