Data Storage and Caching

Python SELF EN, Level 34, Lesson 4

1. Why do we need caching?

Alright, folks, we’ve arrived at one of the coolest parts of web scraping: data caching. Why caching? Because it spares your scripts from repeating work they’ve already done! Let’s figure out why it’s necessary and how it works, keeping it simple, so your head doesn’t spin.

Imagine this: you’ve done web scraping on a site, grabbed all the needed data, and tomorrow you want to update it. Do you really have to dive into an endless loop of requests again? Nope, you can avoid redundant work and save your data using caching.

Advantages of caching:

  • Speed: Cached data is accessed faster than downloading it from the server again. It’s like having quick access to your favorite pastry: no need to head back to the bakery, it’s already in your fridge!
  • Efficiency: You don’t overload servers with extra requests, and you save your internet traffic. Sweet bonus!
  • Reliability: Caching helps handle temporary connection issues. If the site suddenly becomes unavailable, you still have your data. Almost like having a backup parachute.

2. Basics of Data Caching

What is a cache?

A cache is temporary storage that lets you reuse previously retrieved data. In programming, caching helps you avoid fetching the same data over and over. Think of a cache as your personal library of frequently used information.
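Here is a minimal sketch of that idea, using a plain dictionary as the cache. The fetch_page function is a hypothetical stand-in for a real network request; it only records how often it gets called.

```python
downloads = []  # records every "real" download so we can count them

def fetch_page(url):
    # Hypothetical stand-in for a slow network call.
    downloads.append(url)
    return f"<html>content of {url}</html>"

cache = {}  # maps URL -> previously retrieved page

def get_page(url):
    # Reuse the stored copy if we have one; otherwise fetch and remember it.
    if url not in cache:
        cache[url] = fetch_page(url)
    return cache[url]

first = get_page("https://example.com/a")
second = get_page("https://example.com/a")  # served from the dictionary
print(len(downloads))  # 1
```

The second call never reaches fetch_page, which is exactly the saving a cache is meant to give you.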

Types of caches:

  • In-memory cache: Fast, but it lives in RAM, so the data is lost when the program exits or the power goes off.
  • File cache: Data is saved to disk, making it more durable and long-lasting.
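The difference between the two types can be sketched with the standard library alone. The file name cache.json and the helper functions are assumptions made for this example; a temporary directory keeps things tidy.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical cache location inside a fresh temporary directory.
cache_file = Path(tempfile.mkdtemp()) / "cache.json"

def load_cache():
    # Read the cache from disk, or start empty if the file does not exist yet.
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    return {}

def save_cache(cache):
    cache_file.write_text(json.dumps(cache), encoding="utf-8")

# "First run": store a value in memory and persist it to disk.
memory_cache = load_cache()
memory_cache["https://example.com/a"] = "page content"
save_cache(memory_cache)

# "Second run": the in-memory dict is gone, but the file still has the data.
memory_cache = {}
restored = load_cache()
print(restored["https://example.com/a"])  # page content
```

The in-memory dictionary is quick to read but vanishes with the process, while the JSON file survives a restart at the cost of disk access.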

3. Practical Caching with Python

To fetch data in Python, we usually use the requests library. But requests doesn’t support caching out of the box. That’s where requests-cache comes to the rescue, providing an easy way to add caching to your requests.

Installing the library

Bash

pip install requests-cache

Setting up caching

Let’s set up caching in our script:

Python

import requests
import requests_cache

# Store cached responses in an SQLite file; entries expire after 180 seconds
requests_cache.install_cache('demo_cache', expire_after=180)

# Send the request (served from the cache if a fresh copy exists)
response = requests.get('https://jsonplaceholder.typicode.com/todos/1')

# Check where the response came from
print(f'From cache: {response.from_cache}')

# Display the data
print(response.json())

First, we set up the cache using requests_cache.install_cache. By default, this creates an SQLite database (the file demo_cache.sqlite) for storing cached responses. The expire_after parameter specifies the time in seconds after which a cached response is considered expired and is fetched again on the next request. Here, we’ve set caching for three minutes.
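The idea behind expire_after can be sketched in plain Python. This is only a conceptual illustration, not how requests-cache is implemented: the TTLCache class and the fake clock are assumptions made for the example, so we don’t have to wait three minutes to see an entry expire.

```python
import time

class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so the example can fake time
        self._entries = {}      # key -> (stored_at, value)

    def set(self, key, value):
        self._entries[key] = (self.clock(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]   # expired: drop it and report a miss
            return None
        return value

# Fake clock: a one-element list we can advance by hand.
now = [0.0]
cache = TTLCache(ttl=180, clock=lambda: now[0])
cache.set("todo-1", {"id": 1})

now[0] = 179
fresh = cache.get("todo-1")    # still within 180 seconds
now[0] = 180
expired = cache.get("todo-1")  # the entry has expired
print(fresh, expired)  # {'id': 1} None
```

After a miss like the last one, the caller would fetch the data again and store a fresh copy.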

Features of caching

When you run this code again, pay attention to response.from_cache. This attribute is False on the first run, when the response comes from the server, and True for repeated calls within the next three minutes.

Clearing the cache

Clearing the cache is easy: delete the database file or call the requests_cache.clear() function to remove all entries from your cache.

4. Advanced Caching Features

Conditional Caching

Sometimes you need finer control over caching. For example, you might want to cache only certain HTTP methods or status codes, or to treat requests as identical even when an irrelevant parameter changes.

In such cases, you can use requests-cache with additional parameters:

Python

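import requests_cache

requests_cache.install_cache(
    'custom_cache',
    allowable_methods=['GET', 'POST'],   # cache these HTTP methods
    allowable_codes=[200, 404],          # cache only these status codes
    ignored_parameters=['timestamp'],    # leave this parameter out of the cache key
)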

Here, we enable caching for GET and POST requests, and only for responses with status codes 200 and 404. We also ignore the timestamp parameter, so requests that differ only in their timestamp share a single cache entry.
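To see why ignoring a parameter makes two requests land on the same cache entry, here is a rough sketch of building a cache key from a URL. The cache_key function is hypothetical; requests-cache’s real key generation is more involved and also takes the method, headers, and body into account.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def cache_key(url, ignored_parameters=()):
    """Build a key from the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored_parameters
    ]
    kept.sort()  # parameter order should not matter either
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

a = cache_key("https://example.com/api?id=7&timestamp=1700000000", ["timestamp"])
b = cache_key("https://example.com/api?timestamp=1700000999&id=7", ["timestamp"])
print(a == b)  # True
print(a)       # https://example.com/api?id=7
```

Both URLs reduce to the same key, so the second request can be answered from the cache even though its timestamp is different.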

Working with Redis

If you need a more powerful solution, such as a distributed cache shared by several processes or machines, you can use Redis. It’s an in-memory data store that is widely used as a cache.

Steps to work with Redis:

  1. Install Redis and the Python library:
    Bash
    
    brew install redis  # for macOS users
    pip install redis
    
  2. Set up Redis in your project:
    Python
    
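    import redis
    import requests
    
    # Assumes a Redis server is running locally on the default port
    r = redis.Redis(host='localhost', port=6379, db=0)
    
    def get_cached_response(url):
        cached = r.get(url)  # single lookup; returns bytes, or None on a miss
        if cached is not None:
            return cached.decode('utf-8')
        response = requests.get(url)
        response.raise_for_status()  # don't cache error pages
        r.setex(url, 3600, response.text)  # cache for 1 hour
        return response.text
    
    print(get_cached_response('https://jsonplaceholder.typicode.com/todos/1'))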
    

This example uses Redis to store responses for one hour. We check if the data is in the cache, and only if it’s absent do we make an HTTP request.
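If you want to try the same check-then-fetch logic without a running Redis server, you can pass the store and the fetch function in as arguments. In this sketch, DictStore is a hypothetical in-memory stand-in that mimics only the get and setex calls, and fake_fetch replaces the real HTTP request.

```python
class DictStore:
    """Hypothetical in-memory stand-in for the two Redis calls we need."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a miss, like redis-py

    def setex(self, key, ttl_seconds, value):
        # A real Redis server drops the key after ttl_seconds; this stub ignores the TTL.
        self._data[key] = value.encode("utf-8")

def get_or_fetch(url, store, fetch, ttl_seconds=3600):
    cached = store.get(url)
    if cached is not None:
        return cached.decode("utf-8")   # cache hit: no request needed
    text = fetch(url)                   # cache miss: do the real work
    store.setex(url, ttl_seconds, text)
    return text

calls = []

def fake_fetch(url):
    # Stand-in for requests.get(url).text; records each call.
    calls.append(url)
    return '{"id": 1}'

store = DictStore()
first = get_or_fetch("https://example.com/todos/1", store, fake_fetch)
second = get_or_fetch("https://example.com/todos/1", store, fake_fetch)
print(first == second, len(calls))  # True 1
```

Swapping DictStore for a redis.Redis client should work without changing get_or_fetch, since the function relies only on get and setex.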

5. Error Handling

When working with caching, things can go wrong: the cache database might get corrupted, the cache server might be unreachable, or stale entries might not refresh. It’s good practice to log such issues and check the cached data regularly.

Example code for logging:

Python

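import logging

import redis
import requests

logging.basicConfig(level=logging.INFO)

# get_cached_response is the Redis helper defined in the previous section
url = 'https://jsonplaceholder.typicode.com/todos/1'
try:
    response = get_cached_response(url)
    logging.info("Data successfully retrieved for %s", url)
except redis.RedisError as e:
    logging.error("Cache error: %s", e)
except requests.RequestException as e:
    logging.error("Error retrieving data: %s", e)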
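Logging tells you that something broke, but you may also want the script to keep going. One possible approach, sketched below, is to treat the cache as optional: if it fails, log a warning and fetch the data directly. BrokenStore and fetch_with_fallback are hypothetical names used only for this illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("cache_demo")

class BrokenStore:
    """Hypothetical cache whose every operation fails, simulating an outage."""

    def get(self, key):
        raise ConnectionError("cache unavailable")

    def set(self, key, value):
        raise ConnectionError("cache unavailable")

def fetch_with_fallback(url, store, fetch):
    try:
        cached = store.get(url)
        if cached is not None:
            return cached
    except Exception as exc:
        logger.warning("Cache read failed for %s: %s", url, exc)

    data = fetch(url)  # fall back to fetching directly
    try:
        store.set(url, data)
    except Exception as exc:
        logger.warning("Cache write failed for %s: %s", url, exc)
    return data

result = fetch_with_fallback(
    "https://example.com/todos/1", BrokenStore(), lambda url: "fresh data"
)
print(result)  # fresh data
```

The broad except Exception keeps the sketch short; in a real script you would probably catch only the errors your cache backend is known to raise.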

Final Thoughts

Caching isn’t just a tool for speeding up tasks. It’s a way to make your apps more reliable and resilient to temporary network hiccups or server overloads. Tools like requests-cache and Redis let you manage requests efficiently and save data for future use. Become a caching guru and don’t overload your scripts with unnecessary requests! And as the old programmer saying goes: "Better to cache once than ask a hundred times."
