CodeGym /Courses /Python SELF EN /Collision Problems and Their Solutions

Collision Problems and Their Solutions

Python SELF EN
Level 54 , Lesson 2
Available

7.1 Defining Collisions in a Hash Table

A collision in a hash table occurs when two different keys hash to the same array index. This results in more than one element attempting to occupy the same cell in the hash table.

Example of a collision:

Let's say we have a simple hash function that returns the remainder of dividing the key by the table size. For keys 5 and 15 with a table size of 10, both keys will produce the same index: 5 % 10 = 5 and 15 % 10 = 5. That's a collision.

Another example:

Suppose you're compiling a name dictionary, and you have a simple hash function that returns the first letter of the name. But it turns out 80% of the names start with the letter "A". That's not just a collision, it's a problem.

7.2 Collision Resolution Methods – Chaining

There are several methods for resolving collisions in hash tables. The two most common methods are chaining and open addressing.

Chaining

Advantages:

  • Simple to implement.
  • Efficient when there are few collisions.
  • Less dependent on the table's load factor.

Disadvantages:

  • Requires extra memory for pointers in the linked list.
  • Performance can drop with many collisions due to long chains.

Example of chaining method implementation:

You can just study the insert function's feature, but I decided to provide the full code for convenience.


class HashTableChaining:
    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)
        if self.table[index] is None:
            self.table[index] = [(key, value)]
        else:
            for i, kv in enumerate(self.table[index]):
                k, v = kv
                if k == key:
                    self.table[index][i] = (key, value)
                    return
            self.table[index].append((key, value))

    def search(self, key):
        index = self.hash_function(key)
        if self.table[index] is None:
            return None
        for k, v in self.table[index]:
            if k == key:
                return v
        return None

    def delete(self, key):
        index = self.hash_function(key)
        if self.table[index] is None:
            return
        for i, kv in enumerate(self.table[index]):
            k, v = kv
            if k == key:
                del self.table[index][i]
                return

# Example usage:
hash_table = HashTableChaining(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
hash_table.insert("cherry", 3)
print(hash_table.search("banana"))  # Output: 2
hash_table.delete("banana")
print(hash_table.search("banana"))  # Output: None

7.3 Method Two – Open Addressing

The open addressing method involves finding the next available cell in the array if a collision occurs. There are different strategies for finding a new cell: linear probing, quadratic probing, and double hashing.

Advantages:

  • Does not require additional memory for storing pointers.
  • More compact data representation.

Disadvantages:

  • Performance can significantly drop with a high table load factor.
  • Requires more computational resources for collision resolution.

Examples of Open Addressing Strategies

Linear Probing:

  • Probing with step 1: index = (index + 1) % size.
  • Simple, but can lead to "clustering".

Quadratic Probing:

  • Probing with a step increasing by square: index = (index + i^2) % size.
  • Reduces clustering issue but is more complex to implement.

Double Hashing:

  • Using a second hash function to compute the step: index = (index + i * hash2(key)) % size.
  • Fewer clustering issues, but more complex to implement.

Example of open addressing method implementation (linear probing): I'll provide full working code; for simplicity, you can examine only the insert function


class HashTableOpenAddressing:
    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                self.table[index] = (key, value)
                return
            index = (index + 1) % self.size
            if index == original_index:
                raise Exception("Hash table is full")
        self.table[index] = (key, value)

    def search(self, key):
        index = self.hash_function(key)
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                return self.table[index][1]
            index = (index + 1) % self.size
            if index == original_index:
                break
        return None

    def delete(self, key):
        index = self.hash_function(key)
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                self.table[index] = None
                return
            index = (index + 1) % self.size
            if index == original_index:
                break

# Example usage:
hash_table = HashTableOpenAddressing(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
hash_table.insert("cherry", 3)
print(hash_table.search("banana"))  # Output: 2
hash_table.delete("banana")
print(hash_table.search("banana"))  # Output: None

7.4 Impact of Collisions on a Hash Table

Impact of collisions on performance.

Average Access Time:

  • In ideal conditions (with no collisions), the average access time to elements in a hash table is O(1).
  • Collisions increase access time because processing chains or probing cells requires more operations.

Increasing Chain Length:

  • In the chaining method, increasing chain length with many collisions can lead to linear access time O(n) for search, insert, and delete operations.
  • Long chains increase the number of comparisons needed to find or insert an element.

Space Complexity:

  • In the chaining method, additional memory is required to store pointers in linked lists.
  • In the open addressing method, filling the table above a certain level (usually around 70-80%) significantly increases the number of collisions and thus access time.

Clustering: In the open addressing method (especially with linear probing), clustering occurs when several consecutive cells become occupied, increasing collision likelihood and degrading performance.

Example of Analyzing Collisions' Impact on Performance:

This is for those who love to dig deeper


import time
import random

# Linear Probing
class HashTableOpenAddressing:
    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                self.table[index] = (key, value)
                return
            index = (index + 1) % self.size
            if index == original_index:
                raise Exception("Hash table is full")
        self.table[index] = (key, value)

    def search(self, key):
        index = self.hash_function(key)
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                return self.table[index][1]
            index = (index + 1) % self.size
            if index == original_index:
                break
        return None

# Generate a large amount of data for insertion
hash_table = HashTableOpenAddressing(1000)
keys = [random.randint(1, 10000) for _ in range(800)]

# Insert data and measure time
start_time = time.time()
for key in keys:
    hash_table.insert(key, key)
print(f"Insert time: {time.time() - start_time:.6f} seconds")

# Search data and measure time
start_time = time.time()
for key in keys:
    hash_table.search(key)
print(f"Search time: {time.time() - start_time:.6f} seconds")

2
Task
Python SELF EN, level 54, lesson 2
Locked
Collision-free Hash Table
Collision-free Hash Table
2
Task
Python SELF EN, level 54, lesson 2
Locked
Hash Table with Chaining
Hash Table with Chaining
Comments
TO VIEW ALL COMMENTS OR TO MAKE A COMMENT,
GO TO FULL VERSION