1. Working with a Weather API
Now that we've got the basics down, let's check out a realistic scenario. Imagine we need to collect weather data every 30 minutes. To do this, we'll use a weather data API. Of course, a real API requires a key and has usage limits, so for learning purposes let's simulate what this would look like.
Getting Current Weather via OpenWeather API
This example shows how to use requests to get weather data for a specific city using the OpenWeather API.
import schedule
import time
import requests

def fetch_weather(city):
    api_key = "YOUR_API_KEY"  # Replace with your OpenWeather API key
    url = f"http://api.openweathermap.org/data/2.5/weather?q={city}&appid={api_key}&units=metric"
    try:
        response = requests.get(url)
        response.raise_for_status()
        data = response.json()
        temperature = data["main"]["temp"]
        weather_description = data["weather"][0]["description"]
        print(f"Current temperature in {city}: {temperature}°C")
        print(f"Weather description: {weather_description}")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching weather data: {e}")

def fetch_weather_of_london():
    fetch_weather("London")

# Schedule the task to run every 30 minutes
schedule.every(30).minutes.do(fetch_weather_of_london)

while True:
    schedule.run_pending()
    time.sleep(1)
Here we send a GET request to the OpenWeather API to get the current weather for a specified city. From the JSON response we extract the temperature and weather description, then print them to the screen. Don't forget to replace YOUR_API_KEY with your own API key.
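For reference, the part of the JSON response that this code relies on looks roughly like this (abbreviated; the real payload contains many more fields, and the values here are made up):

{
    "weather": [
        {"description": "light rain"}
    ],
    "main": {
        "temp": 12.3
    },
    "name": "London"
}

So data["main"]["temp"] and data["weather"][0]["description"] map directly onto this structure.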
Getting Current Exchange Rates via API
In this example, we'll use requests to fetch current exchange rates through an API.
import schedule
import time
import requests

def fetch_exchange_rate():
    url = "https://api.exchangerate-api.com/v4/latest/USD"
    try:
        response = requests.get(url)
        response.raise_for_status()  # Check for a successful request
        data = response.json()
        usd_to_eur = data["rates"]["EUR"]
        print(f"Current USD to EUR exchange rate: {usd_to_eur}")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the data: {e}")

# Schedule the task to run every 10 minutes
schedule.every(10).minutes.do(fetch_exchange_rate)

while True:
    schedule.run_pending()
    time.sleep(1)
Here we send a GET request to an exchange rates API and get back data in JSON format. The USD to EUR rate is extracted from the JSON response and printed to the screen. This script can be adapted to collect data for other currency pairs by changing the key in data["rates"].
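As an illustration, here is a minimal sketch of that adaptation; the choice of currencies is arbitrary, and we assume the same endpoint returns all of them under data["rates"]:

import requests

def fetch_exchange_rates(currencies=("EUR", "GBP", "JPY")):
    url = "https://api.exchangerate-api.com/v4/latest/USD"
    try:
        response = requests.get(url)
        response.raise_for_status()
        rates = response.json()["rates"]
        for currency in currencies:
            rate = rates.get(currency)  # .get() avoids a KeyError if a code is missing
            print(f"Current USD to {currency} exchange rate: {rate}")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the data: {e}")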
With these scripts, we can continuously gather weather and currency data. Not bad for starters, right?
Real-World Scenarios
Automating data collection can be useful in various scenarios:
- Server Monitoring: Automated health checks can detect issues early and alert you before they escalate (see the sketch after this list).
- Social Media Data Collection: Continuous analysis of trends and brand mentions.
- Tracking Currency Rates: Currency rate changes can be useful for business or personal needs.
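To make the server-monitoring idea concrete, here's a minimal sketch in the spirit of the examples above; the health-check URL and the 5-minute interval are placeholder assumptions, not part of any real service:

import schedule
import time
import requests

def check_server_health():
    url = "https://example.com/health"  # hypothetical health-check endpoint
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        print(f"Server is up, status code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"Server check failed: {e}")  # in a real setup, you'd send an alert here

# Check the server every 5 minutes
schedule.every(5).minutes.do(check_server_health)

while True:
    schedule.run_pending()
    time.sleep(1)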
2. Example of Automated Web Data Collection
What if we want to collect data from a web page? Say, regularly checking a site for fresh news. For this, BeautifulSoup and requests come to the rescue.
Web Page Data Collection
Suppose we have a website from which we want to collect news headlines. Here's how we can do it:
import schedule
import time
import requests
from bs4 import BeautifulSoup

def fetch_news():
    response = requests.get("http://example.com/news")
    soup = BeautifulSoup(response.content, 'html.parser')
    for headline in soup.find_all('h2', class_='news'):
        print(headline.text)

# Check the page every hour
schedule.every().hour.do(fetch_news)

while True:
    schedule.run_pending()
    time.sleep(1)
In this example, every 60 minutes our script will check the web page and print the news headlines. This simplifies the process of getting up-to-date information.
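If you only want to see headlines that haven't appeared before, one possible refinement is to remember what has already been printed. This is a sketch rather than part of the original example, reusing the same hypothetical URL and selector:

import schedule
import time
import requests
from bs4 import BeautifulSoup

seen_headlines = set()

def fetch_new_headlines():
    response = requests.get("http://example.com/news")
    soup = BeautifulSoup(response.content, 'html.parser')
    for headline in soup.find_all('h2', class_='news'):
        text = headline.get_text(strip=True)
        if text not in seen_headlines:  # skip headlines we've already reported
            seen_headlines.add(text)
            print("New headline:", text)

schedule.every().hour.do(fetch_new_headlines)

while True:
    schedule.run_pending()
    time.sleep(1)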
Collecting News Headlines from a Website
In this example, we'll use requests to fetch an HTML page and BeautifulSoup to parse news headlines.
import requests
from bs4 import BeautifulSoup

def fetch_news_headlines():
    url = "https://www.bbc.com/news"
    try:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        headlines = soup.find_all('h3')  # Find all <h3> tags (usually where headlines are)
        print("Latest news headlines on BBC:")
        for headline in headlines[:5]:  # Grab the first 5 headlines
            print("-", headline.get_text(strip=True))
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the data: {e}")

fetch_news_headlines()
Here we load the BBC News page and use BeautifulSoup to find all <h3> tags, where the headlines usually live. We print the first five, using strip=True to trim surrounding whitespace.
Collecting Product Prices from Online Stores
This example shows how to extract product price data from an online store's website (like Amazon or another store). We use requests to fetch the page and BeautifulSoup to parse the prices.
import requests
from bs4 import BeautifulSoup

def fetch_product_price(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36"
    }
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        product_name = soup.find('span', {'id': 'productTitle'}).get_text(strip=True)
        price = soup.find('span', {'class': 'a-price-whole'}).get_text(strip=True)
        print(f"Product: {product_name}")
        print(f"Price: {price} USD")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the data: {e}")
    except AttributeError:
        print("Couldn't find product or price information")

# Example product link
fetch_product_price("https://www.amazon.com/dp/B08N5WRWNW")
In this example, we send a GET request with a User-Agent header to avoid blocks. Then, using BeautifulSoup, we look up the product name via its id="productTitle" and the price via the class a-price-whole, using strip=True to remove extra whitespace. Keep in mind that these selectors match Amazon's markup at the time of writing and may stop working if the page layout changes.
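To tie this back to the scheduling theme of the lesson, here's a sketch of how fetch_product_price could run on a schedule; the 6-hour interval and the product URL are arbitrary assumptions:

import schedule
import time

# Assumes fetch_product_price() from the example above is defined in the same script
schedule.every(6).hours.do(fetch_product_price, "https://www.amazon.com/dp/B08N5WRWNW")

while True:
    schedule.run_pending()
    time.sleep(1)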