6.1 Why do you need a proxy?
Nowadays, it can feel like each country has its own internet. Entire websites, domains, applications, and even whole countries can be blocked, not just individual users. That's uncool. But if you're a programmer, it's not a problem: the internet is full of proxies...
A proxy server (or just proxy) is an intermediary server that acts as a middleman between the client (like your computer) and the server you're trying to access. The proxy server accepts requests from the client, forwards them to the target server, receives the responses, and sends them back to the client.
Most major products run at least a few proxy servers that perform various useful functions. For example:
- Anonymization: A proxy server can hide the client's actual IP address, providing anonymous access to internet resources. An IP address is a unique identifier of a device in the network, and hiding it helps maintain user privacy.
- Caching: A proxy server can cache frequently requested resources, speeding up access and reducing network load. For instance, if many users request the same webpage, the proxy server can save a copy and deliver it directly without reaching out to the original server every time.
- Content Filtering: A proxy server can block access to certain websites or types of content, providing control and security.
- Bypassing Access Restrictions: A proxy server can help bypass regional access restrictions to content, allowing access to resources blocked in certain geographical areas.
- Logging and Monitoring: A proxy server can keep a log of all requests and responses, enabling tracking and analysis of network traffic.
How a proxy server works
- Client sends a request: A client device (like a computer or smartphone) sends a request to the proxy server.
- Proxy server processes the request: The proxy server receives the request, may modify it (like adding or removing headers), and forwards it to the target server.
- Target server responds: The target server processes the request and sends a response to the proxy server.
- Proxy server returns the response to the client: The proxy server receives the response from the target server, may cache it for future use, and forwards it back to the client.
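The four steps above can be sketched as a small in-memory model. This is only an illustration of the flow, not a real network proxy: the ToyProxy class, the fake_target function, and the choice of headers to strip or add are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the target server: takes (url, headers), returns a body.
TargetServer = Callable[[str, Dict[str, str]], str]


class ToyProxy:
    """In-memory model of the request flow; no sockets are involved."""

    def __init__(self, target: TargetServer) -> None:
        self.target = target
        self.cache: Dict[str, str] = {}
        self.log: List[str] = []

    def handle(self, url: str, headers: Dict[str, str]) -> str:
        # Step 1: the client's request arrives at the proxy and is logged.
        self.log.append(url)
        # A cached copy is returned without contacting the target server.
        if url in self.cache:
            return self.cache[url]
        # Step 2: modify the request (remove one header, add another) and forward it.
        forwarded = {k: v for k, v in headers.items() if k != "Cookie"}
        forwarded["Via"] = "1.1 toy-proxy"
        # Step 3: the target server processes the request and responds.
        body = self.target(url, forwarded)
        # Step 4: cache the response and pass it back to the client.
        self.cache[url] = body
        return body


def fake_target(url: str, headers: Dict[str, str]) -> str:
    # Echoes which headers actually reached the "server".
    return f"{url} seen with headers {sorted(headers)}"


proxy = ToyProxy(fake_target)
first = proxy.handle("http://example.com/", {"Cookie": "id=1", "Accept": "*/*"})
second = proxy.handle("http://example.com/", {"Accept": "*/*"})  # served from cache
print(first)
print(proxy.log)
```

The second call never reaches fake_target, which is the caching behavior described in the last step.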
Benefits of using a proxy server
- Enhancing Security: A proxy server can hide internal networks from the outside world, reducing the risk of attacks.
- Speeding up Access: Caching frequently requested resources reduces access time.
- Access Control: A proxy server can restrict access to certain sites or types of content, providing network usage control.
- Reducing Network Load: Through caching and traffic filtering, proxy servers can decrease the overall volume of data transmitted and network load.
For security reasons, many server programs don't have direct internet access. Instead, they go through a proxy, which has a list of allowed sites and resources. So your programs should be able to work with proxies too.
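In such environments the proxy address is often supplied through the http_proxy and https_proxy environment variables, which Python's standard library (and, by default, the requests library) reads. A minimal sketch, assuming a proxy at 10.10.1.10:3128; in practice the variables are set by the system or deployment, not inside the program:

```python
import os
import urllib.request

# Demo only: normally these variables are set outside the program.
# The address is an assumed example value.
os.environ["http_proxy"] = "http://10.10.1.10:3128"
os.environ["https_proxy"] = "http://10.10.1.10:3128"

# getproxies() returns a mapping of URL scheme to proxy URL
# built from the environment (or system settings if none are set).
detected = urllib.request.getproxies()
print(detected.get("http"), detected.get("https"))
```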
Despite the advantages, using proxies can have downsides. For instance, it might slow down connection speed since requests go through an extra layer. Also, some sites might block access from known proxy servers.
6.2 Proxy and the requests module
The requests library supports using proxy servers with the proxies parameter.
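HTTP (Hypertext Transfer Protocol) and HTTPS (HTTP Secure) are protocols for data transmission on the internet. HTTPS is a secure version of HTTP. You can configure a separate proxy for each of them because they are handled differently: a plain HTTP request is forwarded by the proxy as-is, while HTTPS traffic is usually tunneled through the proxy with the CONNECT method so that the encrypted stream passes through untouched.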
Example of using an HTTP proxy
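The proxies parameter takes a dictionary that maps a URL scheme (http or https) to the proxy URL for that scheme, so each protocol can go through its own proxy. Keeping several proxies on hand is also useful in case some get banned or become unavailable.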
Example of calling the requests.get() function with a request via a proxy.
import requests
# URL to make a request to
url = 'http://httpbin.org/ip'
# Proxy server settings
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
# Sending a GET request via proxy
response = requests.get(url, proxies=proxies)
print(response.json())
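With this configuration, requests to http:// URLs go through the first proxy server and requests to https:// URLs go through the second. The URL in this example uses http, so only the first proxy is involved.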
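If some proxies may be banned or unavailable, one possible approach is to keep a pool of proxy dictionaries and try them in order. The sketch below is an assumption, not a feature of requests: get_with_fallback, fetch_with_requests, and the second proxy address are hypothetical names and values introduced for illustration.

```python
from typing import Any, Callable, Dict, Iterable

ProxyMap = Dict[str, str]


def fetch_with_requests(url: str, proxies: ProxyMap) -> Any:
    # Imported here so the fallback logic can be tried out without requests installed.
    import requests
    return requests.get(url, proxies=proxies, timeout=5)


def get_with_fallback(
    url: str,
    proxy_pool: Iterable[ProxyMap],
    fetch: Callable[[str, ProxyMap], Any] = fetch_with_requests,
) -> Any:
    """Try each proxy mapping in order and return the first successful response."""
    last_error = None
    for proxies in proxy_pool:
        try:
            return fetch(url, proxies)
        except Exception as error:  # real code would catch requests.RequestException
            last_error = error
    raise RuntimeError("all proxies failed") from last_error


proxy_pool = [
    {"http": "http://10.10.1.10:3128", "https": "http://10.10.1.10:1080"},
    # Hypothetical backup proxy
    {"http": "http://10.10.1.11:3128", "https": "http://10.10.1.11:1080"},
]

# Performs real network calls, so it is left commented out:
# response = get_with_fallback("http://httpbin.org/ip", proxy_pool)
# print(response.json())
```

Passing the fetch function as a parameter keeps the retry logic separate from the HTTP library, which also makes it easy to test without a network.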
Example of using a proxy with authentication
Many proxy servers require authentication before letting you use them. There's something cool for this...
When URLs were invented, they included a way to pass login and password for resources directly in the URL. It looks like this:
http://user:password@domain/path
So if a proxy server requires authentication, you can include credentials in the URL.
Example:
import requests
# URL to make a request to
url = 'http://httpbin.org/ip'
# Proxy server settings with authentication
proxies = {
    'http': 'http://user:password@10.10.1.10:3128',
    'https': 'http://user:password@10.10.1.10:1080',
}
# Sending a GET request via proxy
response = requests.get(url, proxies=proxies)
print(response.json())
I haven't seen this used in practice, but if you're setting up your test proxy server, why not.
However, note that passing login and password in the URL might not be secure since URLs can be saved in browser history or server logs. In real applications, use more secure authentication methods.
It's crucial to securely store credentials for proxy servers in real applications. Never store passwords in plain text in code or config files. Instead, use environment variables or secure secret stores.
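A minimal sketch of the environment-variable approach, assuming the credentials live in variables named PROXY_USER and PROXY_PASSWORD (hypothetical names chosen for this example; build_proxy_url is likewise a helper invented here). The credentials are percent-encoded so that characters such as @ or : do not break the URL:

```python
import os
from urllib.parse import quote


def build_proxy_url(host: str, port: int,
                    user_var: str = "PROXY_USER",
                    password_var: str = "PROXY_PASSWORD") -> str:
    """Build a proxy URL from credentials stored in environment variables."""
    user = os.environ[user_var]
    password = os.environ[password_var]
    # safe='' makes quote() encode every reserved character, including '@', ':' and '/'
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# Demo only: in real use these variables are set outside the program.
os.environ["PROXY_USER"] = "user"
os.environ["PROXY_PASSWORD"] = "p@ss:word"

proxy_url = build_proxy_url("10.10.1.10", 3128)
proxies = {"http": proxy_url, "https": proxy_url}
print(proxy_url)
```

The resulting proxies dictionary can be passed to requests.get() in the same way as in the examples above.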
6.3 Proxy and http.client
To work with proxies using the http.client module, you need to set up the connection and request headers manually.
You simply specify the proxy's host and port (not the destination's) when creating the connection.
Example:
import http.client
# Proxy server settings
proxy_host = '10.10.1.10'
proxy_port = 3128
# Creating a connection with the proxy server
conn = http.client.HTTPConnection(proxy_host, proxy_port)
Then, before sending a request, call set_tunnel() so that the proxy opens a tunnel (an HTTP CONNECT request) to the destination host:
dest_url = 'httpbin.org'
dest_path = '/ip'
# Setting up and sending the request
conn.set_tunnel(dest_url)
conn.request('GET', dest_path)
To check if the proxy server works correctly, you can compare your IP address before and after using the proxy. You can use services showing your current IP, like httpbin.org/ip.
It's "easy". The complete example of using an HTTP proxy with http.client looks like this:
import http.client
# Proxy server settings
proxy_host = '10.10.1.10'
proxy_port = 3128
dest_url = 'httpbin.org'
dest_path = '/ip'
# Creating a connection with the proxy server
conn = http.client.HTTPConnection(proxy_host, proxy_port)
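# Ask the proxy to open a tunnel (HTTP CONNECT) to the destination host;
# with HTTPConnection the destination port defaults to 80
conn.set_tunnel(dest_url)
# Sending the request through the tunnel
conn.request('GET', dest_path)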
# Getting the response
response = conn.getresponse()
print(response.status, response.reason)
print(response.read().decode('utf-8'))
# Closing the connection
conn.close()
What can I say? Using the requests module is going to be easier, of course. But! Many modules and frameworks use the low-level http.client under the hood. You need to know how to work with it, so you can correctly configure their operation.
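For an HTTPS destination, the same idea can be sketched with HTTPSConnection, optionally sending proxy credentials in a Proxy-Authorization header instead of the URL. The helper names basic_proxy_auth and open_tunnel are assumptions introduced for this example, and the proxy address and credentials are placeholders:

```python
import base64
import http.client
from typing import Dict, Optional


def basic_proxy_auth(user: str, password: str) -> Dict[str, str]:
    """Build a Basic Proxy-Authorization header for the CONNECT request."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Proxy-Authorization": f"Basic {token}"}


def open_tunnel(proxy_host: str, proxy_port: int, dest_host: str,
                dest_port: int = 443,
                headers: Optional[Dict[str, str]] = None) -> http.client.HTTPSConnection:
    # The connection object points at the proxy; TLS is negotiated with
    # dest_host only after the proxy has accepted the CONNECT request.
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=10)
    conn.set_tunnel(dest_host, dest_port, headers=headers)
    return conn


auth_headers = basic_proxy_auth("user", "password")
conn = open_tunnel("10.10.1.10", 3128, "httpbin.org", headers=auth_headers)

# Nothing is sent over the network until request() is called:
# conn.request("GET", "/ip")
# print(conn.getresponse().read().decode("utf-8"))
# conn.close()
```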
Keep in mind the downsides mentioned earlier: a proxy adds an extra hop that can slow down the connection, and some sites block known proxy servers. So, when using a proxy, always consider both its advantages and potential limitations.