Main Approaches for Navigating Multiple Pages

Python SELF EN
Level 38, Lesson 2

1. Using the "Next" Button

If the site has a "Next" button or link to navigate to the next page, you can set up a loop to click on this button as long as it's available.

Code Example

Python

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
import time

def initialize_driver():
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)
    return driver

def open_page(driver, url):
    driver.get(url)

def collect_data(driver):
    # Example of collecting data from the current page
    items = driver.find_elements(By.CLASS_NAME, "item_class")
    for item in items:
        print(item.text)  # Here you can save or process the data
    
def click_next_button(driver):
    try:
        next_button = driver.find_element(By.LINK_TEXT, "Next")
        next_button.click()
        return True
    except NoSuchElementException:
        return False  # Button not found, meaning we're on the last page

def main():
    driver = initialize_driver()
    open_page(driver, "https://example.com/page1")

    try:
        while True:
            collect_data(driver)
            if not click_next_button(driver):
                break  # Exit the loop if the "Next" button is absent
            time.sleep(2)  # Delay for loading the next page
    finally:
        driver.quit()

main()

Code Explanation

initialize_driver() — initializes the driver.
open_page() — opens the first page to start working.
collect_data() — a function to collect data from the current page.
click_next_button() — a function that finds and clicks the "Next" button. If the button is missing, it returns False, which means page navigation has ended.
The loop in main() — the main loop for navigating pages. It stops when the "Next" button can no longer be found.
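
The loop in main() stops only when the "Next" link can no longer be found. On some sites the link stays in the page on the last page (for example, in a disabled state), so the loop would never end. Below is a minimal sketch of a guard against that. The helper name walk_pages and the max_pages cap are assumptions for illustration, not part of the lesson's code. It takes two plain callables, so it runs here with stand-ins instead of a browser:

```python
def walk_pages(collect, go_next, max_pages=100):
    """Call collect() on each page, then go_next(), until go_next()
    returns False or max_pages pages have been visited.
    Returns the number of pages visited."""
    for visited in range(1, max_pages + 1):
        collect()
        # Stop at the cap without clicking again, or when there is no next page
        if visited == max_pages or not go_next():
            return visited
    return 0  # max_pages was zero or negative, nothing visited


# Stand-ins for a three-page site so the sketch runs without a browser
remaining = [True, True, False]  # results of successive "Next" clicks
seen = []
pages = walk_pages(lambda: seen.append(len(seen) + 1), lambda: remaining.pop(0))
print(pages)  # 3
```

With the functions from the example above, the call would look like walk_pages(lambda: collect_data(driver), lambda: click_next_button(driver)).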

2. Pagination Using Page Numbers

Some sites have numbered page links (e.g., "1", "2", "3", and so on). In such cases, you can gather a list of links and navigate through them in sequence.

Code Example

Python

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

def initialize_driver():
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)
    return driver

def open_page(driver, url):
    driver.get(url)

def collect_data(driver):
    items = driver.find_elements(By.CLASS_NAME, "item_class")
    for item in items:
        print(item.text)

def go_to_page(driver, page_number):
    page_link = driver.find_element(By.LINK_TEXT, str(page_number))
    page_link.click()

def main():
    driver = initialize_driver()
    open_page(driver, "https://example.com/page1")

    try:
        total_pages = 5  # Specify the total number of pages if known
        for page in range(1, total_pages + 1):
            collect_data(driver)
            if page < total_pages:  # Don't navigate further after the last page
                go_to_page(driver, page + 1)
                time.sleep(2)  # Delay for loading the next page
    finally:
        driver.quit()

main()

Code Explanation

go_to_page() — a function that finds the link to the desired page by its number and navigates to it.
The loop in main() — uses the total_pages variable to determine the number of pages. The loop navigates to the next page until it reaches the last one.
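
The example hard-codes total_pages. When the pagination bar shows numbered links, the count can often be read from the link texts instead. The sketch below assumes that the highest number shown is the last page, which does not hold on every site; last_page_number is a hypothetical helper name.

```python
def last_page_number(link_texts):
    """Return the largest page number found among pagination link texts,
    or 1 if none of the texts is a number."""
    numbers = [
        int(text)
        for text in (raw.strip() for raw in link_texts)
        if text.isdecimal()  # skips "Next", "Previous", "…", empty strings
    ]
    return max(numbers, default=1)


print(last_page_number(["Previous", "1", "2", "3", "…", "12", "Next"]))  # 12
```

In the Selenium script, the texts could come from something like [a.text for a in driver.find_elements(By.CSS_SELECTOR, ".pagination a")], where ".pagination a" is a made-up selector that has to be adapted to the real site.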

3. Modifying the URL for Each Page

Some sites have a simple URL structure, where each page is identified by a number in the URL, like https://example.com/page/1, https://example.com/page/2, etc. In this case, you can just modify the URL to load the desired page, avoiding the need to search for elements.

Code Example

Python

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

def initialize_driver():
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)
    return driver

def open_page(driver, url):
    driver.get(url)

def collect_data(driver):
    # find_elements_by_class_name() was removed in Selenium 4; use a By locator
    items = driver.find_elements(By.CLASS_NAME, "item_class")
    for item in items:
        print(item.text)

def main():
    driver = initialize_driver()

    try:
        total_pages = 5  # Specify the total number of pages if known
        base_url = "https://example.com/page/"
        
        for page_number in range(1, total_pages + 1):
            url = f"{base_url}{page_number}"
            open_page(driver, url)
            collect_data(driver)
            time.sleep(2)  # Pause between requests so the site isn't hit too fast
    finally:
        driver.quit()

main()

Code Explanation

The base_url variable contains the part of the URL shared by all pages. The loop appends the page number to it and opens each page in sequence.
Data is collected without clicking on any elements, so navigation cannot fail because a button or link was not found.
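
Some sites put the page number in a query parameter (for example ?page=2) rather than in the path. The same approach still works. Here is a sketch using the standard library's urllib.parse; the helper name page_url and the default parameter name "page" are assumptions, so check the real site's URLs first.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(url, page_number, param="page"):
    """Return url with the query parameter `param` set to page_number,
    keeping any other query parameters as they are."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page_number)  # add the parameter or overwrite it
    return urlunsplit(parts._replace(query=urlencode(query)))


print(page_url("https://example.com/items?sort=price", 3))
# https://example.com/items?sort=price&page=3
```

Each generated URL can then be passed to open_page(driver, url) exactly as in the loop above.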

4. Optimization Tips

  • Minimize clicks on dynamic elements: Navigating by link or URL is more robust than clicking buttons that JavaScript adds after the page loads.
  • Keep fixed delays short: A small delay like time.sleep(2) after navigating gives elements time to load, but it is only a guess. Don't delay longer than needed.
  • Collect data after the full page has loaded: Make sure the data is on the page before you start collecting it. implicitly_wait makes element lookups retry until the element appears or the timeout expires; for content that loads later, an explicit wait (WebDriverWait) on a specific element is more precise.
  • Logging: Log the current page, errors, and successful transitions. This makes the script much easier to troubleshoot while it runs.
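
The logging tip can be sketched with the standard logging module. The helper scrape_with_logging and the stand-in fake_scrape are hypothetical names used only for this illustration. In a real script, the scrape_page argument would be a function that opens the page with the driver and returns the collected items.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("pagination")


def scrape_with_logging(page_numbers, scrape_page):
    """Call scrape_page(n) for each page number, logging progress and errors.
    Returns (results, failed): a dict of page number -> items, and a list
    of page numbers that raised an exception."""
    results, failed = {}, []
    for number in page_numbers:
        logger.info("Opening page %d", number)
        try:
            results[number] = scrape_page(number)
            logger.info("Page %d: collected %d items", number, len(results[number]))
        except Exception:
            # logger.exception also records the traceback
            logger.exception("Page %d failed, moving on", number)
            failed.append(number)
    return results, failed


def fake_scrape(number):
    """Stand-in for a real page scraper; page 2 fails on purpose."""
    if number == 2:
        raise RuntimeError("simulated timeout")
    return [f"item {number}-{i}" for i in range(3)]


results, failed = scrape_with_logging(range(1, 4), fake_scrape)
print(sorted(results), failed)  # [1, 3] [2]
```

Catching the exception per page lets one bad page be recorded and skipped instead of stopping the whole run. Whether that is the right behavior depends on the task.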