
Optimizing Scripts for Stability and Fewer Errors

Python SELF KO
Level 38, Lesson 3

1. Performance analysis

Why optimize?

Imagine a powerful car that goes from 0 to 100 km/h in 3 seconds but guzzles fuel the way a whale gulps plankton. A script can be like that too: even if it runs fast, it may be far too "greedy" with resources and execution time. Worse, resource "leaks" (for example, browser sessions that are never closed) can make a script unstable and cause errors. Optimization helps prevent these problems.

First, let's make an "incision", as a surgeon would say: analyze the script's performance and find out where it is "hurting".

How to test a script's speed and stability

One simple way to analyze a script is with a built-in Python module such as time. Add a few lines to the script to find out which operations take the longest.

Python

import time

start_time = time.perf_counter()  # perf_counter is designed for measuring short intervals
# Put the code that performs actions with Selenium here
end_time = time.perf_counter()

print(f"Execution time: {end_time - start_time:.3f} seconds")
  

This simple snippet tells you how long a piece of code takes to run. By placing such "timers" around different parts of the script, you can find its bottlenecks.
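To compare several steps at once, the timer can be wrapped in a reusable context manager. The sketch below is one possible way to do it; the names timer and timings are illustrative, not part of any library. It uses time.perf_counter(), which the Python documentation describes as the clock with the highest available resolution for measuring short durations, and time.sleep(0.2) merely stands in for a slow Selenium action.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results=None):
    """Measure how long the with-block takes, print it, and optionally store it."""
    start = time.perf_counter()  # monotonic, high-resolution clock for intervals
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.3f} s")
        if results is not None:
            results[label] = elapsed  # keep the number so steps can be compared


# Usage: wrap each suspicious step separately and compare the results
timings = {}
with timer("step 1 (fast)", timings):
    sum(range(10_000))
with timer("step 2 (slow)", timings):
    time.sleep(0.2)  # stands in for a slow Selenium action

slowest = max(timings, key=timings.get)
print("Slowest step:", slowest)
```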

Identifying and optimizing weak spots

Once you have found the parts of the code that eat up the most time, it's time to act. For example, the script may be accessing dynamic elements more often than necessary, or the code may have turned into "spaghetti". Step one is diagnosis; step two is action.
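For the diagnosis step, Python's built-in profiler cProfile reports how much time each function call takes, which is more systematic than placing timers by hand. The sketch below profiles two hypothetical functions, slow_step and fast_step (stand-ins for parts of a real script), and prints the five most expensive entries sorted by cumulative time.

```python
import cProfile
import io
import pstats


def slow_step():
    # Deliberately inefficient: membership checks against a list scan it every time
    items = list(range(2_000))
    return sum(1 for x in range(2_000) if x in items)


def fast_step():
    return sum(range(2_000))


def main():
    slow_step()
    fast_step()


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and keep only the top 5 entries of the report
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, the same kind of report is available from the command line with python -m cProfile -s cumulative your_script.py (the script name is a placeholder).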

Reduce the number of requests: check whether the script navigates between pages or re-reads the DOM more often than it needs to. Waiting properly also helps: the WebDriverWait class lets the script continue only once the element it needs has appeared, instead of acting too early and having to retry.

Python

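from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# driver: an existing WebDriver instance (e.g. webdriver.Chrome())
# that has already opened the page

# Wait up to 10 seconds for the element to appear in the DOM;
# WebDriverWait raises TimeoutException if it does not appear in time
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, 'myDynamicElement'))
)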
  

Cache data: if the script uses the same data several times, consider caching it. Keeping the data in a variable or a cache means resource-heavy operations (such as page loads or repeated element lookups) run only once.
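Here is a minimal caching sketch using functools.lru_cache from the standard library. fetch_page_text is a hypothetical stand-in for an expensive operation; here it only records that it ran, so you can see that repeated calls with the same argument are served from the cache.

```python
from functools import lru_cache

real_calls = []  # records every time the expensive work actually runs


@lru_cache(maxsize=128)
def fetch_page_text(url):
    """Hypothetical expensive operation, e.g. loading a page and reading its text."""
    real_calls.append(url)
    return f"<text of {url}>"  # placeholder instead of a real page load


for _ in range(3):
    fetch_page_text("http://example.com/a")  # only the first call does real work
fetch_page_text("http://example.com/b")

print(len(real_calls))               # 2
print(fetch_page_text.cache_info())  # CacheInfo(hits=2, misses=2, maxsize=128, currsize=2)
```

Cache plain data (text, numbers, URLs) rather than WebElement objects: once the page changes, a stored element reference can go stale and Selenium raises StaleElementReferenceException.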

2. Improving the script's structure

If your code reads like a subway map, it's time to clean it up. A well-organized code structure is key to readability and fault tolerance.

Using data pipelines and efficient algorithms

Consider structuring the code as a pipeline in which each function or module handles one logical task. Splitting the code into logical blocks makes it easier to read and to debug.

Python

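from selenium.webdriver.common.by import By

# driver: an existing WebDriver instance (e.g. webdriver.Chrome())

def load_page(driver, url):
    """Stage 1: open the page."""
    driver.get(url)

def extract_data(driver):
    """Stage 2: extract data (example: the text of every <h1> heading)."""
    return [el.text for el in driver.find_elements(By.TAG_NAME, "h1")]

def save_data(data, path="data.txt"):
    """Stage 3: save the data, one item per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(data))

load_page(driver, "http://example.com")
save_data(extract_data(driver))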
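Pipeline stages do not have to be Selenium calls. The sketch below, with hypothetical stage names, chains generator functions so each item flows through all stages one at a time without building intermediate lists. It also uses a set for duplicate detection, since checking membership in a set takes constant time on average while a list has to be scanned element by element. The sample rows are made up for the demo.

```python
def read_rows(raw_rows):
    """Stage 1: yield rows one by one (stands in for rows scraped from a page)."""
    for row in raw_rows:
        yield row


def clean(rows):
    """Stage 2: strip surrounding whitespace and skip empty rows."""
    for row in rows:
        row = row.strip()
        if row:
            yield row


def deduplicate(rows):
    """Stage 3: drop repeats; a set gives O(1) average membership checks."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


raw = ["  Apple ", "Banana", "", "Apple", "Cherry  ", "Banana"]
result = list(deduplicate(clean(read_rows(raw))))
print(result)  # ['Apple', 'Banana', 'Cherry']
```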
  

Improving readability and testability

Follow the principle "one function, one job": it makes testing and refactoring easier. For clarity, use named constants instead of "magic numbers" and hard-coded strings.

Python

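MAX_RETRIES = 5  # named constant instead of a "magic number"

def fetch_data_with_retry(fetch):
    """Call fetch() (any function that requests the data) up to MAX_RETRIES times."""
    for attempt in range(MAX_RETRIES):
        try:
            return fetch()  # success: return the data and stop retrying
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
    raise RuntimeError(f"All {MAX_RETRIES} attempts failed")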
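The same principle also helps with testing: if parsing is separated from browser access, the parsing function can be tested with plain strings, no browser required. In this sketch, parse_price, the price format, and the constants are hypothetical examples; read_price only shows, in a comment, where a Selenium WebElement would come in.

```python
PRICE_CURRENCY_SIGN = "$"  # named constants instead of magic strings
THOUSANDS_SEPARATOR = ","


def parse_price(text):
    """Turn a price string such as '$1,299.50' into a float.

    It does one job (parsing) and never touches the browser,
    so it can be tested with plain strings.
    """
    cleaned = text.strip().replace(PRICE_CURRENCY_SIGN, "").replace(THOUSANDS_SEPARATOR, "")
    return float(cleaned)


# The browser access lives in a separate function, e.g. (assuming a Selenium WebElement):
# def read_price(element):
#     return parse_price(element.text)

print(parse_price("$1,299.50"))  # 1299.5
print(parse_price("  $15 "))     # 15.0
```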
  

3. If code can be improved, improve it

Use explicit waits instead of implicit waits

Explicit waits give you precise control: Selenium acts only once the element you need meets a given condition. Instead of an implicit wait (implicitly_wait), which applies one global timeout to every element lookup, use WebDriverWait to wait for a specific element and a specific condition. The Selenium documentation also warns against mixing implicit and explicit waits, because doing so can cause unpredictable wait times.

Example of an explicit wait

Python

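from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# driver: an existing WebDriver instance (e.g. webdriver.Chrome())
# that has already opened the page

# Wait up to 10 seconds until the element is present in the DOM and visible;
# WebDriverWait raises TimeoutException if that does not happen in time
element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.ID, "target_element"))
)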
  

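Checking that the page is ready

Even after a page has loaded, not every element is immediately usable, especially on pages that use AJAX. Check document.readyState to make sure the document has fully loaded before you start working with it. Keep in mind that readyState == "complete" only covers the initial document and its resources: content fetched later via AJAX still requires an explicit wait for the specific element you need.

Example: waiting for the page to finish loading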

Python

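from selenium.webdriver.support.ui import WebDriverWait

def wait_for_page_load(driver, timeout=10):
    # Poll until the browser reports that the document has finished loading
    WebDriverWait(driver, timeout).until(
        lambda d: d.execute_script("return document.readyState") == "complete"
    )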
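WebDriverWait.until accepts any callable that takes the driver and returns a truthy value once the condition holds, so the readyState check can also be packaged as a named, reusable condition class, a pattern the Selenium documentation uses for custom wait conditions. The sketch below names it document_ready (a hypothetical name) and tests it with FakeDriver, a hypothetical stub that returns successive readyState values, so no browser is needed.

```python
class document_ready:
    """Condition for WebDriverWait.until: true once document.readyState is 'complete'.

    WebDriverWait calls it repeatedly with the driver until it returns a truthy value.
    """

    READY_STATE = "complete"

    def __call__(self, driver):
        return driver.execute_script("return document.readyState") == self.READY_STATE


# Usage with a real browser (not executed here):
#     WebDriverWait(driver, 10).until(document_ready())


class FakeDriver:
    """Hypothetical stand-in for a WebDriver, used only to test the condition."""

    def __init__(self, states):
        self._states = iter(states)

    def execute_script(self, script):
        return next(self._states)  # ignores the script and returns the next state


condition = document_ready()
fake = FakeDriver(["loading", "interactive", "complete"])
print([condition(fake) for _ in range(3)])  # [False, False, True]
```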
  

Minimize the use of time.sleep

time.sleep() pauses the script for a fixed amount of time, which slows it down: if the element is ready sooner, the rest of the pause is wasted, and if it takes longer, the script fails anyway. Instead, use WebDriverWait to wait exactly until the required condition is met.
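To see why polling beats a fixed pause, here is a simplified, pure-Python sketch of what an explicit wait does: check a condition, and if it is not met yet, sleep for a short poll interval and check again until a timeout. It is not Selenium's implementation; wait_until and element_present are hypothetical names, and the "element" is simulated to appear after 0.2 seconds. WebDriverWait polls every 0.5 seconds by default and raises TimeoutException rather than the built-in TimeoutError used here.

```python
import time

POLL_INTERVAL_SEC = 0.05  # shorter than WebDriverWait's 0.5 s default, just for the demo


def wait_until(condition, timeout, poll=POLL_INTERVAL_SEC):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # short pause between checks, not one long fixed wait


# Simulated "element" that becomes available 0.2 s from now
ready_at = time.monotonic() + 0.2


def element_present():
    return time.monotonic() >= ready_at


start = time.monotonic()
wait_until(element_present, timeout=5)
waited = time.monotonic() - start
print(f"waited {waited:.2f} s instead of a fixed time.sleep(5)")
```

The script waits only about as long as the element actually needs, while time.sleep(5) would always cost the full 5 seconds.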
