1. Preparing for navigation
Before we set out on our great journey, we need to gather the right tools: Selenium and a pinch of cleverness. We assume you already know how to set up Selenium and a browser driver, so let’s go!
Setting up the driver
from selenium import webdriver
# Launch the browser driver, like Chrome
driver = webdriver.Chrome()
# Open the first page of our adventure
driver.get("http://example.com/start-page")
So far, it’s all pretty standard — we opened a browser and loaded up the start page. But here’s where the fun begins: we don’t want to just hang out on one page. We want to visit all its neighbors!
2. Pagination: strolling through pages
The easiest and friendliest way to navigate between pages is pagination. You’ve all seen those cute little numbers at the bottom of a page, right? They’re like street signs: “Your next stop — page 2.”
Extracting data from pages
Before we start our journey, we want to collect information from the current page. Let's say it’s product lists or article headlines.
from selenium.webdriver.common.by import By

def extract_data():
    # Find all the elements we’re interested in on the page, like titles.
    # Selenium 4.3+ removed find_elements_by_class_name, so we use find_elements with By.
    titles = driver.find_elements(By.CLASS_NAME, "item-title")
    for title in titles:
        print(title.text)  # Sure, we’re just printing text here, but you can save it wherever you want

extract_data()
If you skipped the previous lectures, this bit of code finds all titles with the class item-title and prints them.
Navigating to the next page
Now that we’re armed with data, it’s time to move on. Pagination is often represented as buttons with links to the next or previous pages. We need to find those buttons and click on them.
import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def go_to_next_page():
    try:
        # Find the next page button and click on it
        next_button = driver.find_element(By.LINK_TEXT, "Next")
        next_button.click()
        return True
    except NoSuchElementException:
        # If the button isn’t there, we’ve reached the end
        print("End of list.")
        return False
This function looks for a link whose text is exactly "Next". If it finds one, it clicks it and returns True. If not, our bot understands it’s reached the end of the internet… well, at least this sequence of pages, and returns False.
3. Looping through pages
What are we missing for total happiness? Right, a loop! Let’s combine all this into one handy loop so our bot can visit all the available pages like a real pro.
while True:
    extract_data()  # Collect data from the current page
    if not go_to_next_page():  # Move to the next one, or stop if there isn’t one
        break
    time.sleep(2)  # Take a little break so we don’t spook the server
And there you go: our bot now bravely visits every page that has a clickable "Next" link, and the loop exits as soon as go_to_next_page() returns False. Remember, a short pause between requests keeps you on good terms with the server. Nobody loves spammers, especially site admins.
4. Dynamic interaction
Friends, life isn’t as simple as these pagination examples. Sometimes a page behaves like an elusive ninja, loading data dynamically as you scroll. No worries, we’ll handle that too.
Explicit waits
An explicit wait lets your code chill until a condition is met (for example, the element you need shows up in the DOM) or a timeout runs out. This is especially useful when content doesn’t load instantly.
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def wait_for_element(class_name):
    try:
        # Poll for up to 10 seconds until an element with this class is in the DOM
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, class_name))
        )
        return element
    except TimeoutException:
        print("Element not found.")
        return None
With this function, your bot will be in harmony with dynamically loading content, waiting for the elements to become available.
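To make the idea of waiting less magical, here is a minimal pure-Python sketch of the polling loop behind an explicit wait. It is a simplified illustration, not Selenium's actual implementation; the helper name poll_until and its parameters are made up for this example.

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Hypothetical helper: a simplified picture of how explicit waiting works.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # pause between checks instead of spinning
```

With a real driver you could pass something like lambda: driver.find_elements(By.CLASS_NAME, "item-title"), since find_elements returns an empty list until matching elements appear. In real scripts WebDriverWait is still the better choice, because it already handles the Selenium-specific details for you.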
Scrolling the page
For dynamic content found on magical pages, like infinite scrolling, we might need to scroll down to load more elements.
import time

def scroll_down():
    # Use JavaScript to jump to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Give the page a moment to load more content
The script jumps straight to the bottom of the page (an instant jump, not an animated scroll), and the pause gives the page a chance to load more content. You can use this trick in a loop to handle infinite scrolling.
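One possible way to turn that trick into a loop is to keep scrolling until the page height stops growing. The sketch below assumes the page signals "no more content" simply by not getting taller; the function name scroll_until_stable and the max_rounds safety cap are our own additions for illustration.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls that loaded new content.
    `max_rounds` is a safety cap so a truly endless feed cannot trap the loop.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    loaded = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # wait for new content to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared, so we assume we hit the end
        last_height = new_height
        loaded += 1
    return loaded
```

Once it returns, you can run your extraction over the fully loaded page. A fixed pause is the simplest option; on slow sites you may need a longer one, or an explicit wait for new items.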
5. Features and hacks
Real sites rarely behave exactly like the examples, so your bot, like you, needs to be ready for different scenarios.
If there’s no "Next" button but the pagination uses page numbers, you can often skip the clicking and insert the page number directly into the URL. And if your site suddenly decides to become a ninja and hide some pages, give your script a clear stop condition (an empty page, for example) so it adapts to surprises instead of crashing.
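Here is a rough sketch of the URL approach. It assumes the site takes the page number in a query parameter called page, which is only a guess; check the real URLs of your site first. The names page_url and crawl_numbered_pages, and the extract callback, are hypothetical helpers for this example.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, param="page"):
    """Return base_url with the query parameter `param` set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = str(page)  # overwrite an existing page number, if any
    return urlunsplit(parts._replace(query=urlencode(query)))


def crawl_numbered_pages(driver, base_url, extract, max_pages=50, pause=2.0):
    """Visit pages 1, 2, 3, ... and collect whatever extract(driver) returns.

    Stops at the first page where extract() returns nothing, or at max_pages.
    """
    results = []
    for page in range(1, max_pages + 1):
        driver.get(page_url(base_url, page))
        items = extract(driver)
        if not items:
            break  # an empty page is treated as the end of the list
        results.extend(items)
        time.sleep(pause)  # be polite to the server between requests
    return results
```

For example, page_url("http://example.com/catalog?sort=price", 3) gives "http://example.com/catalog?sort=price&page=3". The extract callback could be a version of extract_data that returns a list of title texts instead of printing them.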