Extracting Text and Attributes Using Selenium

Hey there, future automation masters! Today, we’re diving into how to use Selenium to grab text and attributes from elements on a web page. This is a super handy skill to have in your toolkit because in the world of web scraping, the main game is getting your hands on that data. Ready to dig some useful info out of the internet? Let’s get started!

1. What’s this lecture about?

  • Data extraction basics: How to pull text from HTML elements—sounds like an adventure already, doesn’t it?
  • Getting attributes: Learn how to grab juicy details like links (href) and images (src), so you can then do something awesome with them.
  • Real-life examples: We’ll practice pulling data from tables and lists on a web page. After all, as the saying goes, “You don’t fully understand something until you’ve worked with the code yourself.”

2. Extracting text from elements

Alright, picture this: you’ve got a beautiful website with loads of useful info. You need to pull the text from elements like headings, paragraphs, and other HTML goodies. What do you do? That’s where Selenium swoops in to save the day.

Example

Python

from selenium import webdriver
from selenium.webdriver.common.by import By

# Setting up the driver for Chrome
driver = webdriver.Chrome()

# Open the website
driver.get("https://example.com")

# Find an element by class and extract its text
# (the old find_element_by_* helpers were removed in Selenium 4.3)
element = driver.find_element(By.CLASS_NAME, "example-class")
text = element.text
print("Extracted text:", text)

# Don't forget to close the browser
driver.quit()

Here we use the .text property to grab the text content of an element. Note that it’s a property, not a method, so there are no parentheses. Easier than memorizing all Python exceptions, right?
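
One detail worth knowing: .text returns only the text a user would actually see, so an element hidden with CSS typically gives you an empty string. Here is a minimal sketch of a fallback, assuming a hypothetical helper named read_text (it is not part of Selenium) that accepts any element Selenium has returned:

```python
def read_text(element):
    """Return the element's visible text, or its raw textContent if nothing is visible.

    `read_text` is a hypothetical helper for illustration, not a Selenium API.
    """
    visible = element.text.strip()
    if visible:
        return visible
    # For hidden elements .text is empty; the DOM's textContent property
    # still holds the text, and get_attribute() can read it.
    raw = element.get_attribute("textContent")
    return raw.strip() if raw else ""
```

You would call it as read_text(element) in place of element.text when you suspect the element may be hidden.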

3. Extracting attributes from elements

Text is awesome, but what if you need something more specific, like a link’s URL or an image URL? Selenium’s got your back here too.

Example

Python

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://example.com")

# Find an element by CSS selector and extract the 'href' attribute
link_element = driver.find_element(By.CSS_SELECTOR, "a.link-class")
link_href = link_element.get_attribute("href")
print("Link URL:", link_href)

# Find an element by ID and extract the 'src' attribute
img_element = driver.find_element(By.ID, "logo")
img_src = img_element.get_attribute("src")
print("Image URL:", img_src)

driver.quit()

As you can see, the process is the same, but instead of .text, we call the .get_attribute("attribute_name") method. If the element doesn’t have that attribute, the method returns None. Simple and powerful.
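
When you need the same attribute from many elements, for example every link on a page, a small loop does the job. The sketch below assumes a hypothetical helper called collect_attribute; the elements themselves would come from a call such as driver.find_elements(By.TAG_NAME, "a").

```python
def collect_attribute(elements, name):
    """Return the value of attribute `name` for every element that has it.

    `collect_attribute` is a hypothetical helper for illustration.
    """
    values = []
    for element in elements:
        value = element.get_attribute(name)
        # get_attribute() returns None when the attribute is missing, so skip those
        if value is not None:
            values.append(value)
    return values
```

For instance, collect_attribute(links, "href") would give you a plain list of URLs that you can filter or save later.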

4. Applying the methods in practice

Let’s move from theory to practice because, let’s face it, programmers aren’t fans of staying in the abstract for too long. Let’s look at an example where we extract data from a table on a web page.

Extracting data from tables

Let’s say you need to grab all rows from a table on a site and print them to the console. Here’s how you can do it:

Python

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://example.com")

# Find the table by ID
table = driver.find_element(By.ID, "example-table")

# Find all rows in the table
rows = table.find_elements(By.TAG_NAME, "tr")

for row in rows:
    # Find all data cells in the current row
    cells = row.find_elements(By.TAG_NAME, "td")
    for cell in cells:
        print(cell.text, end=' ')
    print()

driver.quit()

We first find the table, then loop through all its rows and cells, extracting and printing their text. It’s like untangling a complex web, but in the end, everything falls into place!
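
Printing is fine for a quick look, but you’ll usually want the values in a data structure you can work with. Here is a minimal sketch, assuming a hypothetical helper table_to_rows that takes the table element found above. It also picks up th header cells, which the loop above skips because it only looks for td.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR; in a real
# script you would import By and pass By.CSS_SELECTOR instead.
CSS_SELECTOR = "css selector"


def table_to_rows(table):
    """Return the table as a list of rows, each row a list of cell texts.

    `table_to_rows` is a hypothetical helper for illustration.
    """
    result = []
    for row in table.find_elements(CSS_SELECTOR, "tr"):
        # "th, td" matches header and data cells alike, in document order
        cells = row.find_elements(CSS_SELECTOR, "th, td")
        if cells:  # skip rows that contain no cells at all
            result.append([cell.text.strip() for cell in cells])
    return result
```

The result is a plain list of lists, so the first row can serve as column names and the rest as records.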

5. Common mistakes and how to avoid them

Before you set off on your own data-extraction adventure, let’s talk about a common pitfall you might run into.

When working with dynamic pages, timing can be your enemy. If you try to grab text or an attribute from an element that hasn’t loaded yet, you’ll hit a NoSuchElementException. It’s like trying to catch a surprise before it arrives. To dodge this, use explicit waits (WebDriverWait) instead of relying on luck.

Python

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

driver.get("https://example.com")

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "lazy-class"))
    )
    print(element.text)
finally:
    driver.quit()

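Here we use WebDriverWait with expected_conditions to wait up to 10 seconds for the element to appear in the DOM. If it never shows up, the wait raises a TimeoutException. Keep in mind that presence_of_element_located only checks that the element exists in the DOM, not that it is visible. It’s like waiting for a dish to be fully cooked so you can enjoy it properly.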
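
Conceptually, an explicit wait is just polling: it keeps calling a condition until the condition returns something truthy or the timeout runs out. The following is a simplified stand-in to show the idea, not Selenium’s actual implementation; wait_until and its parameters are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Raises the built-in TimeoutError if `timeout` seconds pass first.
    (Selenium's WebDriverWait raises its own TimeoutException instead.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Pause between attempts so we don't hammer the page
        time.sleep(poll_interval)
```

Seeing it this way explains why an explicit wait beats a fixed time.sleep(): it returns as soon as the element is ready instead of always waiting the full duration.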
