7.1 Defining Collisions in a Hash Table
A collision in a hash table occurs when two different keys hash to the same array index. This results in more than one element attempting to occupy the same cell in the hash table.

Example of a collision:
Let's say we have a simple hash function that returns the remainder of dividing the key by the table size. For keys 5 and 15 with a table size of 10, both keys will produce the same index: 5 % 10 = 5
and 15 % 10 = 5
. That's a collision.
Another example:
Suppose you're compiling a name dictionary, and you have a simple hash function that returns the first letter of the name. But it turns out 80% of the names start with the letter "A". That's not just a collision, it's a problem.
7.2 Collision Resolution Methods – Chaining
There are several methods for resolving collisions in hash tables. The two most common methods are chaining and open addressing.

Chaining
Advantages:
- Simple to implement.
- Efficient when there are few collisions.
- Less dependent on the table's load factor.
Disadvantages:
- Requires extra memory for pointers in the linked list.
- Performance can drop with many collisions due to long chains.
Example of chaining method implementation:
You can just study the insert function's feature, but I decided to provide the full code for convenience.
class HashTableChaining:
def __init__(self, size):
self.size = size
self.table = [None] * size
def hash_function(self, key):
return hash(key) % self.size
def insert(self, key, value):
index = self.hash_function(key)
if self.table[index] is None:
self.table[index] = [(key, value)]
else:
for i, kv in enumerate(self.table[index]):
k, v = kv
if k == key:
self.table[index][i] = (key, value)
return
self.table[index].append((key, value))
def search(self, key):
index = self.hash_function(key)
if self.table[index] is None:
return None
for k, v in self.table[index]:
if k == key:
return v
return None
def delete(self, key):
index = self.hash_function(key)
if self.table[index] is None:
return
for i, kv in enumerate(self.table[index]):
k, v = kv
if k == key:
del self.table[index][i]
return
# Example usage:
hash_table = HashTableChaining(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
hash_table.insert("cherry", 3)
print(hash_table.search("banana")) # Output: 2
hash_table.delete("banana")
print(hash_table.search("banana")) # Output: None
7.3 Method Two – Open Addressing
The open addressing method involves finding the next available cell in the array if a collision occurs. There are different strategies for finding a new cell: linear probing, quadratic probing, and double hashing.

Advantages:
- Does not require additional memory for storing pointers.
- More compact data representation.
Disadvantages:
- Performance can significantly drop with a high table load factor.
- Requires more computational resources for collision resolution.
Examples of Open Addressing Strategies
Linear Probing:
- Probing with step 1:
index = (index + 1) % size
. - Simple, but can lead to "clustering".
Quadratic Probing:
- Probing with a step increasing by square:
index = (index + i^2) % size
. - Reduces clustering issue but is more complex to implement.
Double Hashing:
- Using a second hash function to compute the step:
index = (index + i * hash2(key)) % size
. - Fewer clustering issues, but more complex to implement.
Example of open addressing method implementation (linear probing): I'll provide full working code; for simplicity, you can examine only the insert function
class HashTableOpenAddressing:
def __init__(self, size):
self.size = size
self.table = [None] * size
def hash_function(self, key):
return hash(key) % self.size
def insert(self, key, value):
index = self.hash_function(key)
original_index = index
while self.table[index] is not None:
if self.table[index][0] == key:
self.table[index] = (key, value)
return
index = (index + 1) % self.size
if index == original_index:
raise Exception("Hash table is full")
self.table[index] = (key, value)
def search(self, key):
index = self.hash_function(key)
original_index = index
while self.table[index] is not None:
if self.table[index][0] == key:
return self.table[index][1]
index = (index + 1) % self.size
if index == original_index:
break
return None
def delete(self, key):
index = self.hash_function(key)
original_index = index
while self.table[index] is not None:
if self.table[index][0] == key:
self.table[index] = None
return
index = (index + 1) % self.size
if index == original_index:
break
# Example usage:
hash_table = HashTableOpenAddressing(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
hash_table.insert("cherry", 3)
print(hash_table.search("banana")) # Output: 2
hash_table.delete("banana")
print(hash_table.search("banana")) # Output: None
7.4 Impact of Collisions on a Hash Table
Impact of collisions on performance.
Average Access Time:
- In ideal conditions (with no collisions), the average access time to elements in a hash table is
O(1)
. - Collisions increase access time because processing chains or probing cells requires more operations.
Increasing Chain Length:
- In the chaining method, increasing chain length with many collisions can lead to linear access time
O(n)
for search, insert, and delete operations. - Long chains increase the number of comparisons needed to find or insert an element.
Space Complexity:
- In the chaining method, additional memory is required to store pointers in linked lists.
- In the open addressing method, filling the table above a certain level (usually around 70-80%) significantly increases the number of collisions and thus access time.
Clustering: In the open addressing method (especially with linear probing), clustering occurs when several consecutive cells become occupied, increasing collision likelihood and degrading performance.
Example of Analyzing Collisions' Impact on Performance:
This is for those who love to dig deeper
import time
import random
# Linear Probing
class HashTableOpenAddressing:
def __init__(self, size):
self.size = size
self.table = [None] * size
def hash_function(self, key):
return hash(key) % self.size
def insert(self, key, value):
index = self.hash_function(key)
original_index = index
while self.table[index] is not None:
if self.table[index][0] == key:
self.table[index] = (key, value)
return
index = (index + 1) % self.size
if index == original_index:
raise Exception("Hash table is full")
self.table[index] = (key, value)
def search(self, key):
index = self.hash_function(key)
original_index = index
while self.table[index] is not None:
if self.table[index][0] == key:
return self.table[index][1]
index = (index + 1) % self.size
if index == original_index:
break
return None
# Generate a large amount of data for insertion
hash_table = HashTableOpenAddressing(1000)
keys = [random.randint(1, 10000) for _ in range(800)]
# Insert data and measure time
start_time = time.time()
for key in keys:
hash_table.insert(key, key)
print(f"Insert time: {time.time() - start_time:.6f} seconds")
# Search data and measure time
start_time = time.time()
for key in keys:
hash_table.search(key)
print(f"Search time: {time.time() - start_time:.6f} seconds")
GO TO FULL VERSION