6.1 Hash Table Definition and Structure
A hash table is a data structure that allows for fast lookup, insertion, and deletion of elements using a hash function to compute indexes in an array or a list. Hash tables are typically used to implement associative arrays, also known as dictionaries.
Key Aspects of a Hash Table
Array - The main component of a hash table, it stores pointers to elements or lists of elements.
Hash Function: A function that converts dictionary keys into array indexes.
Collisions: A situation where two different keys have the same hash value. Various methods like chaining or open addressing are used to resolve collisions.
Dictionary example:
{
"apple": 101,
"banana": 102,
"cherry": 103
}
Hash table structure example:
Index | Value |
---|---|
0 | None |
1 | [('apple', 101)] |
2 | None |
3 | [('banana', 102)] |
4 | None |
5 | None |
6 | None |
7 | [('cherry', 103)] |
8 | None |
9 | None |
6.2 Basic Operations in a Hash Table
Basic operations in a hash table: insertion, search, deletion
1. Insertion: The process of adding a new element (key-value pair) to the hash table.
Insertion steps:
- Compute the hash value of the key using the hash function.
- Find the index in the array based on the hash value.
- If there's already an element at that index (collision), add the element to the list (in case of chaining) or find the next available index (in case of open addressing).
Example of inserting into a hash table using chaining:
class HashTable:
def __init__(self, size):
self.size = size
self.table = [None] * size
def hash_function(self, key):
return hash(key) % self.size
def insert(self, key, value):
index = self.hash_function(key)
if self.table[index] is None:
self.table[index] = [(key, value)]
else:
for i, kv in enumerate(self.table[index]):
k, v = kv
if k == key:
self.table[index][i] = (key, value)
return
self.table[index].append((key, value))
# Example usage:
hash_table = HashTable(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
hash_table.insert("cherry", 3)
print(hash_table.table)
2. Search: The process of finding a value by a given key in the hash table.
Search steps:
- Compute the hash value of the key using the hash function.
- Find the index in the array based on the hash value.
- Check for the key's presence in the list of elements (in case of chaining) or by index (in case of open addressing).
Example of searching in a hash table using chaining:
class HashTable:
def __init__(self, size):
self.size = size
self.table = [None] * size
def hash_function(self, key):
return hash(key) % self.size
def search(self, key):
index = self.hash_function(key)
if self.table[index] is None:
return None
for k, v in self.table[index]:
if k == key:
return v
return None
# Example usage:
hash_table = HashTable(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
hash_table.insert("cherry", 3)
print(hash_table.search("banana")) # Output: 2
print(hash_table.search("grape")) # Output: None
3. Deletion: The process of removing an element (key-value pair) from the hash table.
Deletion steps:
- Compute the hash value of the key using the hash function.
- Find the index in the array based on the hash value.
- Remove the element from the list (in case of chaining) or set the value at the index to
None
(in case of open addressing).
Example of deleting from a hash table using chaining:
class HashTable:
def __init__(self, size):
self.size = size
self.table = [None] * size
def hash_function(self, key):
return hash(key) % self.size
def insert(self, key, value):
index = self.hash_function(key)
if self.table[index] is None:
self.table[index] = [(key, value)]
else:
for i, kv in enumerate(self.table[index]):
k, v = kv
if k == key:
self.table[index][i] = (key, value)
return
self.table[index].append((key, value))
def delete(self, key):
index = self.hash_function(key)
if self.table[index] is None:
return
for i, kv in enumerate(self.table[index]):
k, v = kv
if k == key:
del self.table[index][i]
return
# Example usage:
hash_table = HashTable(10)
hash_table.insert("apple", 1)
hash_table.insert("banana", 2)
hash_table.insert("cherry", 3)
print(hash_table.table)
hash_table.delete("banana")
print(hash_table.table)
print(hash_table.search("banana")) # Output: None
6.3 Hash Table Time Complexity
Time complexity of operations in a hash table
Insertion:
- Average case:
O(1)
- Worst case:
O(n)
(in case of many collisions or if all items end up in the same spot)
Search:
- Average case:
O(1)
- Worst case:
O(n)
(in case of many collisions or if all items end up in the same spot)
Deletion:
- Average case:
O(1)
- Worst case:
O(n)
(in case of many collisions or if all items end up in the same spot)
Explanation:
Average case: In the average case, the hash function distributes elements evenly across the table, and each element is in its unique cell, which provides constant time access O(1)
.
Worst case: In the worst case, all elements end up in a single cell due to a poor hash function or a large number of collisions. The hash table becomes a linked list, and the time complexity of operations becomes O(n)
.
6.4 Hash Table Usage Examples
1. Implementing a Dictionary (Associative Array)
Hash tables are often used to implement dictionaries, which allow storing key-value pairs and provide fast access by key.
Example:
# Creating a dictionary
dictionary = {}
# Inserting elements
dictionary["apple"] = 1
dictionary["banana"] = 2
dictionary["cherry"] = 3
# Searching for an element
print(dictionary["banana"]) # Output: 2
# Deleting an element
del dictionary["cherry"]
# Checking if a key exists
if "apple" in dictionary:
print("Key 'apple' exists in the dictionary") # Output: Key 'apple' exists in the dictionary
Now you know a bit more about a Python dictionary and how it works.
2. Caching Computation Results
Hash tables are used to cache the results of expensive computations for faster subsequent requests.
Example:
# Cache for storing results
cache = {}
def expensive_computation(x):
if x in cache:
return cache[x]
result = x * x # Example of an expensive computation
cache[x] = result
return result
# Using the cache
print(expensive_computation(10)) # Output: 100 (computed and cached)
print(expensive_computation(10)) # Output: 100 (from cache)
3. Word Frequency Count in a Text
Hash tables are used to count occurrences of each word in a text.
Example:
from collections import defaultdict
text = "this is a simple text with some simple words this is simple"
word_count = defaultdict(int)
for word in text.split():
word_count[word] += 1
# Outputting results
for word, count in word_count.items():
print(f"Word '{word}' occurs {count} time(s)")
4. Finding Duplicates in a List
Hash tables are used for efficient duplicate finding in a list.
def find_duplicates(arr):
seen = set()
duplicates = []
for item in arr:
if item in seen:
duplicates.append(item)
else:
seen.add(item)
return duplicates
# Example usage
arr = [1, 2, 3, 2, 4, 5, 6, 4, 7]
print(find_duplicates(arr)) # Output: [2, 4]
GO TO FULL VERSION