CodeGym /Courses /Python SELF EN /Hash Functions and Their Applications

Hash Functions and Their Applications

Python SELF EN
Level 54 , Lesson 5
Available

10.1 Introduction to Hash Functions and Hash Tables

Let's recall and deepen our understanding of hash functions and tables. Hash function is an algorithm that converts input data of arbitrary length into a fixed-length string. Key properties of hash functions:

  • Deterministic: The same input will always produce the same output.
  • Fast computation: The hash should be computed quickly.
  • Irreversibility (for cryptographic hash functions): It should be impossible (or extremely hard) to reconstruct the original data from the hash.
  • Uniform distribution: Small changes in input data should result in significant changes in the hash.

To put it simply, this function is the same for identical objects, but if the hash function is the same for two objects, these objects aren't necessarily equal. In mathematics, this is called a necessary but not sufficient condition.

Hash table is a data structure that uses a hash function for efficient storage and retrieval of information. It consists of:

  • An array of "buckets" to store the data.
  • A hash function that determines which bucket to place the data in.

Hash tables provide fast data access, typically with O(1) complexity in the average case.

Real-life Applications of Hash Functions

Example: Blockchain Technology

In blockchain, hash functions are used to create unique block identifiers and ensure data integrity. Each block contains the hash of the previous block, creating a chain that makes the system resistant to changes.


import hashlib
import time

class Block:
    def __init__(self, data, previous_hash):
        self.timestamp = time.time()
        self.data = data
        self.previous_hash = previous_hash
        self.hash = self.calculate_hash()

    def calculate_hash(self):
        hash_string = str(self.timestamp) + str(self.data) + str(self.previous_hash)
        return hashlib.sha256(hash_string.encode()).hexdigest()

# Example of usage
block1 = Block("Transaction 1", "0")
block2 = Block("Transaction 2", block1.hash)
print(f"Block 1 hash: {block1.hash}")
print(f"Block 2 hash: {block2.hash}")

Performance Comparison

Let's consider the problem of finding duplicates in an array. We'll compare a solution using a hash table and one without:


import time

def find_duplicates_with_hash(arr):
    seen = set()
    duplicates = []
    for item in arr:
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
    return duplicates

def find_duplicates_without_hash(arr):
    duplicates = []
    for i in range(len(arr)):
        for j in range(i+1, len(arr)):
            if arr[i] == arr[j] and arr[i] not in duplicates:
                duplicates.append(arr[i])
    return duplicates

# Performance test
arr = list(range(10000)) + list(range(5000))  # Array with duplicates

start = time.time()
find_duplicates_with_hash(arr)
end = time.time()
print(f"Execution time with hash table: {end - start} seconds")

start = time.time()
find_duplicates_without_hash(arr)
end = time.time()
print(f"Execution time without hash table: {end - start} seconds")

Run the program to see how the use of a hash table speeds up finding duplicates, especially in large arrays.

10.2 Examples of Hash Function Applications in Real Tasks

1. Password Hashing

Hash functions are used for securely storing passwords. Instead of storing passwords in plain text, systems store their hashes. When a user enters a password, the system hashes the input and compares it to the stored hash.

Sample implementation:


import hashlib

def hash_password(password):
    return hashlib.sha256(password.encode()).hexdigest()

# Example of usage:
password = "securepassword"
hashed_password = hash_password(password)
print(f"Password hash: {hashed_password}")

2. Data Integrity Check

Hash functions are used to check the integrity of files and data, for example, to check whether a file has been changed or corrupted during transmission.

Sample implementation:


import hashlib

def get_file_hash(file_path):
    hasher = hashlib.sha256()
    with open(file_path, 'rb') as file:
        buf = file.read()
        hasher.update(buf)
    return hasher.hexdigest()

# Example of usage:
file_hash = get_file_hash('example.txt')
print(f"SHA-256 file hash: {file_hash}")

3. Search Engines and Indexing

Search engines use hash functions to create indexes and quickly search for information. Each document is indexed by keywords, and hash functions help quickly find documents containing specified words.

Sample implementation:


def create_index(text):
    index = {}
    words = text.split()
    for word in words:
        word_hash = hash(word)
        if word_hash not in index:
            index[word_hash] = []
        index[word_hash].append(word)
    return index

# Example of usage:
text = "This is an example text for indexing"
index = create_index(text)
print(f"Index: {index}")

10.3 Optimizing Search with Hash Functions

1. Using Hash Tables to Find Duplicates

Hash tables allow for quick identification of duplicates in an array by comparing hash values of elements.

Sample implementation:


def find_duplicates(arr):
    seen = set()
    duplicates = []
    for item in arr:
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
    return duplicates

# Example of usage:
arr = [1, 2, 3, 2, 4, 5, 6, 4, 7]
print(find_duplicates(arr))  # Output: [2, 4]

2. Optimizing Search for Pairs with a Given Sum

Hash tables allow efficient discovery of pairs of numbers in an array whose sum equals a given value.

Sample implementation:


def find_pairs_with_sum(arr, target_sum):
    seen = set()
    pairs = []

    for num in arr:
        complement = target_sum - num
        if complement in seen:
            pairs.append((complement, num))
        seen.add(num)

    return pairs

# Example of usage:
arr = [2, 4, 3, 7, 8, -2, 10, -1]
target_sum = 6
print(find_pairs_with_sum(arr, target_sum))  # Output: [(4, 2), (3, 3), (8, -2)]

10.4 Examples of Using Hash Functions in Various Algorithms

1. Counting the Number of Words in a Text

Use a hash table to count the occurrences of each word in a text.

Sample implementation:


def count_words(text):
    word_count = {}
    words = text.split()

    for word in words:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1

    return word_count

# Example of usage:
text = "this is a test this is only a test"
print(count_words(text))  # Output: {'this': 2, 'is': 2, 'a': 2, 'test': 2, 'only': 1}

2. Checking the Intersection of Two Arrays

Check if two arrays intersect (have at least one common element).

Sample implementation:


def has_intersection(arr1, arr2):
    set1 = set(arr1)
    for item in arr2:
        if item in set1:
            return True
    return False

# Example of usage:
arr1 = [1, 2, 3, 4]
arr2 = [3, 5, 6, 7]
arr3 = [8, 9, 10]
print(has_intersection(arr1, arr2))  # Output: True
print(has_intersection(arr1, arr3))  # Output: False

3. Checking for Unique Elements in an Array

Check if an array contains only unique elements.

Sample implementation:


def all_unique(arr):
    seen = set()
    for item in arr:
        if item in seen:
            return False
        seen.add(item)
    return True

# Example of usage:
arr1 = [1, 2, 3, 4, 5]
arr2 = [1, 2, 3, 4, 5, 3]
print(all_unique(arr1))  # Output: True
print(all_unique(arr2))  # Output: False
2
Task
Python SELF EN, level 54, lesson 5
Locked
Password Hashing
Password Hashing
2
Task
Python SELF EN, level 54, lesson 5
Locked
Authorization
Authorization
1
Survey/quiz
Hashing, level 54, lesson 5
Unavailable
Hashing
Hashing
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION