10.1 Introduction to Hash Functions and Hash Tables
Let's recall and deepen our understanding of hash functions and tables. Hash function is an algorithm that converts input data of arbitrary length into a fixed-length string. Key properties of hash functions:
- Deterministic: The same input will always produce the same output.
- Fast computation: The hash should be computed quickly.
- Irreversibility (for cryptographic hash functions): It should be impossible (or extremely hard) to reconstruct the original data from the hash.
- Uniform distribution: Small changes in input data should result in significant changes in the hash.
To put it simply, this function is the same for identical objects, but if the hash function is the same for two objects, these objects aren't necessarily equal. In mathematics, this is called a necessary but not sufficient condition.
Hash table is a data structure that uses a hash function for efficient storage and retrieval of information. It consists of:
- An array of "buckets" to store the data.
- A hash function that determines which bucket to place the data in.
Hash tables provide fast data access, typically with O(1) complexity in the average case.
Real-life Applications of Hash Functions
Example: Blockchain Technology
In blockchain, hash functions are used to create unique block identifiers and ensure data integrity. Each block contains the hash of the previous block, creating a chain that makes the system resistant to changes.
import hashlib
import time
class Block:
def __init__(self, data, previous_hash):
self.timestamp = time.time()
self.data = data
self.previous_hash = previous_hash
self.hash = self.calculate_hash()
def calculate_hash(self):
hash_string = str(self.timestamp) + str(self.data) + str(self.previous_hash)
return hashlib.sha256(hash_string.encode()).hexdigest()
# Example of usage
block1 = Block("Transaction 1", "0")
block2 = Block("Transaction 2", block1.hash)
print(f"Block 1 hash: {block1.hash}")
print(f"Block 2 hash: {block2.hash}")
Performance Comparison
Let's consider the problem of finding duplicates in an array. We'll compare a solution using a hash table and one without:
import time
def find_duplicates_with_hash(arr):
seen = set()
duplicates = []
for item in arr:
if item in seen:
duplicates.append(item)
else:
seen.add(item)
return duplicates
def find_duplicates_without_hash(arr):
duplicates = []
for i in range(len(arr)):
for j in range(i+1, len(arr)):
if arr[i] == arr[j] and arr[i] not in duplicates:
duplicates.append(arr[i])
return duplicates
# Performance test
arr = list(range(10000)) + list(range(5000)) # Array with duplicates
start = time.time()
find_duplicates_with_hash(arr)
end = time.time()
print(f"Execution time with hash table: {end - start} seconds")
start = time.time()
find_duplicates_without_hash(arr)
end = time.time()
print(f"Execution time without hash table: {end - start} seconds")
Run the program to see how the use of a hash table speeds up finding duplicates, especially in large arrays.
10.2 Examples of Hash Function Applications in Real Tasks
1. Password Hashing
Hash functions are used for securely storing passwords. Instead of storing passwords in plain text, systems store their hashes. When a user enters a password, the system hashes the input and compares it to the stored hash.
Sample implementation:
import hashlib
def hash_password(password):
return hashlib.sha256(password.encode()).hexdigest()
# Example of usage:
password = "securepassword"
hashed_password = hash_password(password)
print(f"Password hash: {hashed_password}")
2. Data Integrity Check
Hash functions are used to check the integrity of files and data, for example, to check whether a file has been changed or corrupted during transmission.
Sample implementation:
import hashlib
def get_file_hash(file_path):
hasher = hashlib.sha256()
with open(file_path, 'rb') as file:
buf = file.read()
hasher.update(buf)
return hasher.hexdigest()
# Example of usage:
file_hash = get_file_hash('example.txt')
print(f"SHA-256 file hash: {file_hash}")
3. Search Engines and Indexing
Search engines use hash functions to create indexes and quickly search for information. Each document is indexed by keywords, and hash functions help quickly find documents containing specified words.
Sample implementation:
def create_index(text):
index = {}
words = text.split()
for word in words:
word_hash = hash(word)
if word_hash not in index:
index[word_hash] = []
index[word_hash].append(word)
return index
# Example of usage:
text = "This is an example text for indexing"
index = create_index(text)
print(f"Index: {index}")
10.3 Optimizing Search with Hash Functions
1. Using Hash Tables to Find Duplicates
Hash tables allow for quick identification of duplicates in an array by comparing hash values of elements.
Sample implementation:
def find_duplicates(arr):
seen = set()
duplicates = []
for item in arr:
if item in seen:
duplicates.append(item)
else:
seen.add(item)
return duplicates
# Example of usage:
arr = [1, 2, 3, 2, 4, 5, 6, 4, 7]
print(find_duplicates(arr)) # Output: [2, 4]
2. Optimizing Search for Pairs with a Given Sum
Hash tables allow efficient discovery of pairs of numbers in an array whose sum equals a given value.
Sample implementation:
def find_pairs_with_sum(arr, target_sum):
seen = set()
pairs = []
for num in arr:
complement = target_sum - num
if complement in seen:
pairs.append((complement, num))
seen.add(num)
return pairs
# Example of usage:
arr = [2, 4, 3, 7, 8, -2, 10, -1]
target_sum = 6
print(find_pairs_with_sum(arr, target_sum)) # Output: [(4, 2), (3, 3), (8, -2)]
10.4 Examples of Using Hash Functions in Various Algorithms
1. Counting the Number of Words in a Text
Use a hash table to count the occurrences of each word in a text.
Sample implementation:
def count_words(text):
word_count = {}
words = text.split()
for word in words:
if word in word_count:
word_count[word] += 1
else:
word_count[word] = 1
return word_count
# Example of usage:
text = "this is a test this is only a test"
print(count_words(text)) # Output: {'this': 2, 'is': 2, 'a': 2, 'test': 2, 'only': 1}
2. Checking the Intersection of Two Arrays
Check if two arrays intersect (have at least one common element).
Sample implementation:
def has_intersection(arr1, arr2):
set1 = set(arr1)
for item in arr2:
if item in set1:
return True
return False
# Example of usage:
arr1 = [1, 2, 3, 4]
arr2 = [3, 5, 6, 7]
arr3 = [8, 9, 10]
print(has_intersection(arr1, arr2)) # Output: True
print(has_intersection(arr1, arr3)) # Output: False
3. Checking for Unique Elements in an Array
Check if an array contains only unique elements.
Sample implementation:
def all_unique(arr):
seen = set()
for item in arr:
if item in seen:
return False
seen.add(item)
return True
# Example of usage:
arr1 = [1, 2, 3, 4, 5]
arr2 = [1, 2, 3, 4, 5, 3]
print(all_unique(arr1)) # Output: True
print(all_unique(arr2)) # Output: False
GO TO FULL VERSION