
Time-based Task Automation for Regular Data Collection

Level 40, Lesson 1

1. Working with a Weather API

Now that we've got the basics down, let's check out a realistic scenario. Imagine we need to collect weather data every 30 minutes. To do this, we'll use a weather data API. Since access to a real API may be limited for learning purposes, let's first simulate what this would look like.
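
If you don't have an API key yet, a stub that fakes the measurement lets you test the scheduling logic on its own. The fetch_weather_stub function and its temperature range below are invented purely for illustration:

Python

import schedule
import time
import random

def fetch_weather_stub():
    # Simulated measurement: a random temperature instead of a real API call
    temperature = round(random.uniform(-5, 30), 1)
    print(f"[simulated] Current temperature: {temperature}°C")

# Same scheduling as the real task: run every 30 minutes
schedule.every(30).minutes.do(fetch_weather_stub)

while True:
    schedule.run_pending()
    time.sleep(1)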

Getting Current Weather via OpenWeather API

This example shows how to use requests to get weather data for a specific city using the OpenWeather API.

Python

import schedule
import time
import requests

def fetch_weather(city):
    api_key = "YOUR_API_KEY"  # Replace with your OpenWeather API key
    url = f"http://api.openweathermap.org/data/2.5/weather?q={city}&appid={api_key}&units=metric"

    try:
        response = requests.get(url)
        response.raise_for_status()
        data = response.json()
        temperature = data["main"]["temp"]
        weather_description = data["weather"][0]["description"]
        print(f"Current temperature in {city}: {temperature}°C")
        print(f"Weather description: {weather_description}")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching weather data: {e}")

def fetch_weather_of_london():
    fetch_weather("London")

# Schedule the task to run every 30 minutes
schedule.every(30).minutes.do(fetch_weather_of_london)

while True:
    schedule.run_pending()
    time.sleep(1)

Here we send a GET request to the OpenWeather API for the current weather in the specified city, extract the temperature and weather description from the JSON response, and print them to the screen. Don't forget to replace YOUR_API_KEY with your own API key.
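
Printing to the screen works for a demo, but regular collection usually means storing the readings somewhere. Here's a minimal sketch of appending each measurement to a CSV file (the weather_log.csv filename is just an example):

Python

import csv
from datetime import datetime

def save_weather_reading(city, temperature, description):
    # Append one timestamped row per measurement
    with open("weather_log.csv", "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([datetime.now().isoformat(), city, temperature, description])

You could call save_weather_reading(city, temperature, weather_description) inside fetch_weather right after parsing the JSON, and every run would add one more row to the log.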

Getting Current Exchange Rates via API

In this example, we'll use requests to fetch current exchange rates through an API.

Python

import schedule
import time
import requests

def fetch_exchange_rate():
    url = "https://api.exchangerate-api.com/v4/latest/USD"
    try:
        response = requests.get(url)
        response.raise_for_status()  # Check for successful request
        data = response.json()
        usd_to_eur = data["rates"]["EUR"]
        print(f"Current USD to EUR exchange rate: {usd_to_eur}")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the data: {e}")

# Schedule the task to run every 10 minutes
schedule.every(10).minutes.do(fetch_exchange_rate)

while True:
    schedule.run_pending()
    time.sleep(1)

Here we send a GET request to an exchange rates API and get back data in JSON format. The USD to EUR rate is extracted from the JSON response and printed to the screen. This script can be adapted to collect data for other currency pairs by changing the key in data["rates"].
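
For instance, here is a sketch that reads several rates from the same JSON response in a single request; the currency list is arbitrary:

Python

import requests

def fetch_exchange_rates(currencies=("EUR", "GBP", "JPY")):
    url = "https://api.exchangerate-api.com/v4/latest/USD"
    try:
        response = requests.get(url)
        response.raise_for_status()
        rates = response.json()["rates"]
        for currency in currencies:
            # .get() avoids a KeyError if a currency is missing from the response
            print(f"USD to {currency}: {rates.get(currency, 'n/a')}")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the data: {e}")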

With scripts like these, we can constantly gather weather and currency data. Not bad for starters, right?

Real-World Scenarios

Automating data collection can be useful in various scenarios:

  • Server Monitoring: Automated health checks can detect problems and send alerts before they turn into outages (see the sketch after this list).
  • Social Media Data Collection: Continuous analysis of trends and brand mentions.
  • Tracking Currency Rates: Currency rate changes can be useful for business or personal needs.
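
As an illustration of the server monitoring scenario, here is a minimal sketch; the https://example.com/health endpoint, the 5-second timeout, and the 5-minute interval are all placeholder assumptions:

Python

import schedule
import time
import requests

def check_server_health():
    try:
        # Placeholder endpoint: replace with your server's real health check URL
        response = requests.get("https://example.com/health", timeout=5)
        response.raise_for_status()
        print("Server is healthy")
    except requests.exceptions.RequestException as e:
        # In a real setup, this is where you'd send an email or chat alert
        print(f"ALERT: health check failed: {e}")

schedule.every(5).minutes.do(check_server_health)

while True:
    schedule.run_pending()
    time.sleep(1)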

2. Example of Automated Web Data Collection

What if we want to collect data from a web page? Say we want to check a news site for fresh headlines on a regular basis. For this, BeautifulSoup and requests come to the rescue.

Web Page Data Collection

Suppose we have a website from which we want to collect news headlines. Here's how we can do it:

Python

import schedule
import time
import requests
from bs4 import BeautifulSoup

def fetch_news():
    response = requests.get("http://example.com/news")
    soup = BeautifulSoup(response.content, 'html.parser')
    for headline in soup.find_all('h2', class_='news'):
        print(headline.text)

schedule.every().hour.do(fetch_news)

while True:
    schedule.run_pending()
    time.sleep(1)

In this example, the script checks the web page every hour and prints the news headlines. This simplifies the process of getting up-to-date information.
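
A common refinement is to report only headlines that haven't been seen before. Here's a minimal sketch using an in-memory set (the state is lost on restart; a file or database would make it persistent):

Python

import schedule
import time
import requests
from bs4 import BeautifulSoup

seen_headlines = set()

def fetch_new_headlines():
    response = requests.get("http://example.com/news")
    soup = BeautifulSoup(response.content, 'html.parser')
    for headline in soup.find_all('h2', class_='news'):
        text = headline.get_text(strip=True)
        if text not in seen_headlines:
            # Remember the headline so it's only reported once
            seen_headlines.add(text)
            print("New headline:", text)

schedule.every().hour.do(fetch_new_headlines)

while True:
    schedule.run_pending()
    time.sleep(1)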

Collecting News Headlines from a Website

In this example, we’ll use requests to fetch an HTML page and BeautifulSoup to parse news headlines.

Python

import requests
from bs4 import BeautifulSoup

def fetch_news_headlines():
    url = "https://www.bbc.com/news"
    try:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        
        headlines = soup.find_all('h3')  # Find all <h3> tags (usually where headlines are)
        print("Latest news headlines on BBC:")
        for headline in headlines[:5]:  # Grab the first 5 headlines
            print("-", headline.get_text(strip=True))
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the data: {e}")

fetch_news_headlines()

Here we load the BBC News page and use BeautifulSoup to find all <h3> tags, which is where the headlines usually live. We print the first 5 headlines, stripping surrounding whitespace with strip=True.
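
The function above runs only once. To turn it into regular collection like the earlier examples, wrap it in the same schedule loop; the hourly interval here is an arbitrary choice:

Python

import schedule
import time

# fetch_news_headlines() is the function defined in the example above
schedule.every().hour.do(fetch_news_headlines)

while True:
    schedule.run_pending()
    time.sleep(1)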

Collecting Product Prices from Online Stores

This example shows how to extract product price data from an online store's website (like Amazon or another store). We use requests to fetch the page and BeautifulSoup to parse the prices.

Python

import requests
from bs4 import BeautifulSoup

def fetch_product_price(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36"
    }
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        
        soup = BeautifulSoup(response.text, 'html.parser')
        product_name = soup.find('span', {'id': 'productTitle'}).get_text(strip=True)
        price = soup.find('span', {'class': 'a-price-whole'}).get_text(strip=True)
        
        print(f"Product: {product_name}")
        print(f"Price: {price} USD")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the data: {e}")
    except AttributeError:
        print("Couldn't find product or price information")

# Example product link
fetch_product_price("https://www.amazon.com/dp/B08N5WRWNW")

In this example, we send a GET request with a User-Agent header to reduce the chance of being blocked. Then, using BeautifulSoup, we look up the product name by its id="productTitle" and the price by the class a-price-whole, using strip=True to trim surrounding whitespace. Keep in mind that store markup changes often, so these selectors may need adjusting.
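
For real price tracking, you would typically run such a check on a schedule and compare the result against a threshold. Here's a rough sketch, where the product link, the 300-dollar threshold, and the 6-hour interval are all placeholders, and the selector mirrors the Amazon-style one above:

Python

import schedule
import time
import requests
from bs4 import BeautifulSoup

def check_price_drop(url, threshold):
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # Same Amazon-style selector as above; adjust it for other stores
        price_text = soup.find('span', {'class': 'a-price-whole'}).get_text(strip=True)
        price = float(price_text.replace(",", "").rstrip("."))
        if price < threshold:
            print(f"Price dropped below {threshold}: {price} USD")
        else:
            print(f"Current price: {price} USD")
    except (requests.exceptions.RequestException, AttributeError, ValueError) as e:
        print(f"Check failed: {e}")

# Placeholder product link and threshold
schedule.every(6).hours.do(check_price_drop, "https://www.amazon.com/dp/B08N5WRWNW", 300.0)

while True:
    schedule.run_pending()
    time.sleep(1)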
