A hashmap is essentially a smart filing system built on an array. It stores all its data as key-value pairs, where each key
serves as a unique address for its corresponding value
. To organize this data efficiently, a hashmap uses a hash function which maps each pair to a specific location in the array. This setup allows you to quickly retrieve any value using its key, without searching the entire dataset.
Hashmaps offer quick data operations. The hash function decides where to store each item, making access super fast. Keep in mind, hashmaps are unordered, so they don't work well for sorted data. Sometimes, collisions happen when two keys end up in the same spot, but good hash functions help avoid this problem.
Operations | Time | Space |
---|---|---|
Search | O(1)* | O(1) |
Access | O(1)* | O(1) |
Insert | O(1)* | O(1) |
Delete | O(1)* | O(1) |
Resize | O(n) | O(n) |
Get Keys | O(n) | O(n) |
Get Values | O(n) | O(n) |
Key-value pairs are the building blocks of hashmaps. A key
is a unique label that points to some value
(data). Think of a contact list: a person's name (the key
) leads to their phone number (the value
). Remember, keys are always unique, but values can repeat. The hash function uses keys to quickly store and find values in the hashmap.
Hashmaps rely on hash functions. These special functions convert any input into an integer. This integer, called a hash code, serves as an index to store the input value in the hashmap. By using hash codes, hashmaps can quickly insert and retrieve data. They directly access the storage location without searching through all the data, making operations incredibly fast.
# Hash Function: Converts key (string) into an index
def hash_function(key: str, size: int) -> int:
hash_value = 0
for index, char in enumerate(key):
# Use position-dependent weighting
hash_value += (index + 1) * ord(char)
# Apply a prime multiplier for better distribution
hash_value *= 2069
# Ensure index is within the size range
return hash_value % size
The hash function shown above demonstrates how to convert a key
(like an integer or string) into an index for storing or finding data in a hashmap. It's important to apply the modulo operator against the hashmap's sizeto ensure the index falls within bounds. A well-designed hash function spreads keys evenly across the hashmap, minimizing collisions. Even distribution is the key to maintaining a hashmap's efficiency for quick operations.
Collisions occur when different keys hash to the same index. Imagine two students mistakenly assigned the same seat in a classroom - that's a collision in a hashmap. To handle these situations without losing data, hashmaps use strategies like chaining or open addressing. These methods ensure all information remains accessible, even when multiple keys point to the same location in the hashmap.
As a hashmap grows, it may need to resize and rehash to maintain efficiency. This process is like moving to a bigger house when your family expands. When the load factor reaches a certain threshold (typically 0.75
), the hashmap increases its size and redistributes its elements. This involves creating a larger array, then recalculating the hash for each key-value pair and placing them in their new positions. While this operation can be time-consuming, it's important for maintaining good performance as the hashmap grows:
def resize(hashmap):
# Double the size of the current hashmap
size = len(hashmap) * 2
# Use the doubled size to create a new hashmap and return it
new_hashmap = [[] for _ in range(size)]
return new_hashmap
def rehash(hashmap):
# Use the "resize" function to generate a larger empty hashmap
new_hashmap = resize(hashmap)
for bucket in hashmap:
for key, value in bucket:
# Calculate the new index for each key-value pair
new_index = hash(key) % len(new_hashmap)
# Add the key-value pair to the corresponding bucket in the new hashmap
new_hashmap[new_index].append([key, value])
return new_hashmap
Separate chaining is a simple way to handle collisions in hashmaps. When multiple keys end up at the same index, chaining links the data together. In this method, each index holds a "chain" (which is actually a linked list). This chain stores all entries that belong to that index, allowing multiple items to share the same location and resolving collisions.
from typing import Optional
# Definition of a singly-linked ListNode
class ListNode:
def __init__(self, key: int = -1, val: int = -1, next: Optional['ListNode'] = None):
self.key = key
self.val = val
self.next = next
class MyHashMap:
def __init__(self):
# Our hashmap is just an array containing "size" number of ListNodes
self.size = 1000
self.hashmap = [ListNode() for i in range(self.size)]
def hash_function(self, key: int) -> int:
return key % self.size
def put(self, key: int, value: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then rewrite the value
if cur.next.key == key:
cur.next.val = value
return
cur = cur.next
# If we don't find the key, then add a new ListNode
cur.next = ListNode(key, value)
def get(self, key: int) -> int:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then return the value
if cur.next.key == key:
return cur.next.val
cur = cur.next
# If we don't find the key, then return -1
return -1
def remove(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then remove that ListNode
if cur.next.key == key:
cur.next = cur.next.next
return
cur = cur.next
When working with a chained hashmap, we follow these steps for different operations:
put
): calculate the index using the hash_function
. Then, add the new key-value pair to the end of the linked list at that indexget
): Compute the index using the hash_function
, then search through the linked list at that index to find the desired keyremove
): Follow the same process as retrieval, but remove the node from the linked list once foundOpen addressing is a method for handling collisions in hashmaps. Unlike chaining, which uses linked lists at each index, open addressing keeps only one element per index. When a collision occurs, the hashmap searches for the next available slot to place the new key-value pair. This approach quickly finds alternative spots within the hashmap for overlapping elements:
class MyHashMap:
def __init__(self):
# Initialize the hashmap with a fixed size
self.size = 80_000
self.hashmap = [None] * self.size
def hash_function(self, key: int) -> int:
# Simple hash function to determine the index
return key % self.size
def probe(self, index: int) -> int:
# Linear probing: move to the next index, wrapping around if necessary
return (index + 1) % self.size
def put(self, key: int, value: int) -> None:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, update the value
self.hashmap[index] = (key, value)
return
# Collision occurred, try the next slot
index = self.probe(index)
# Empty slot found, insert the new key-value pair
self.hashmap[index] = (key, value)
def get(self, key: int) -> int:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, return the value or try the next slot
return self.hashmap[index][1]
index = self.probe(index)
# Key not found after probing all slots
return -1
def remove(self, key: int) -> None:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, mark the slot as deleted
self.hashmap[index] = None
return
# If we didn't find the key, try the next slot
index = self.probe(index)
Open addressing uses various methods to find available indices when a collision occurs:
Sets, also known as hash sets, are data structures that use hashing to store unique elements. Unlike hash maps that store key-value pairs, sets only store individual values and prevent duplicates. With their fast insertion, deletion, and search operations, sets are great for scenarios where uniqueness or quick data access matters.
Operations | Time | Space |
---|---|---|
Add | O(1)* | O(1) |
Remove | O(1)* | O(1) |
Search | O(1)* | O(1) |
Size | O(1) | O(1) |
Iterate | O(n) | O(1) |
Clear | O(1) | O(1) |
Resize | O(n) | O(n) |
Union | O(n+m) | O(n+m) |
Intersection | O(min(n,m)) | O(min(n,m)) |
Difference | O(n) | O(n) |
Sets can be implemented using the same principles as hashmaps. We can create an efficient set structure using chaining with linked lists. The core of this implementation is the ListNode
class, which stores a unique element and a pointer to the next node. This allows us to form chains of distinct elements at each index of the hashset. The HashSet
class is initialized with a specific number of buckets. Each bucket starts with a dummy ListNode
:
# Definition of a singly-linked ListNode
class ListNode:
def __init__(self, key: int = -1, next: Optional['ListNode'] = None):
self.key = key
self.next = next
class MyHashSet:
def __init__(self):
# Our hash set is just an array containing "size" number of ListNodes
self.size = 1000
self.hashset = [ListNode() for _ in range(self.size)]
def hash_function(self, key: int) -> int:
return key % self.size
def add(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then it's already in the set
if cur.next.key == key:
return
cur = cur.next
# If we don't find the key, then add a new ListNode
cur.next = ListNode(key)
def contains(self, key: int) -> bool:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then it's in the set
if cur.next.key == key:
return True
cur = cur.next
# If we don't find the key, then it's not in the set
return False
def remove(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then remove that ListNode
if cur.next.key == key:
cur.next = cur.next.next
return
cur = cur.next
Hashmaps and sets are the most important data structures to learn if you ever want to see an optimal solution. Here are some best practices for using hashmaps and sets effectively for problems: