A hashmap is a smart filing system built on an array. It stores data as key-value pairs, where each key is the unique address for its value. A hash function maps each pair to a specific spot in the array, so you can look up any value by its key without searching the entire dataset.
Hashmap operations are very fast. The hash function decides where to store each item, so you can jump straight to it. The catch: hashmaps are unordered, so they don't work well when you need sorted data. Sometimes collisions happen when two keys land at the same spot, but good hash functions keep these rare.
| Operations | Time | Space |
|---|---|---|
| Search | O(1)* | O(1) |
| Access | O(1)* | O(1) |
| Insert | O(1)* | O(1) |
| Delete | O(1)* | O(1) |
| Resize | O(n) | O(n) |
| Get Keys | O(n) | O(n) |
| Get Values | O(n) | O(n) |
Key-value pairs are the building blocks of a hashmap. A key is a unique label that points to a value. Think of a contact list: a person's name (the key) maps to their phone number (the value). Keys are always unique, but values can repeat. The hash function uses keys to figure out where each value lives.
Hashmaps rely on hash functions to do their work. A hash function takes any input and converts it into an integer called a hash code. That integer is used as the index where the value gets stored. Because hashing gives you the index directly, the hashmap can jump straight to the data instead of scanning every entry.
# Hash Function: Converts key (string) into an index
def hash_function(key: str, size: int) -> int:
hash_value = 0
for index, char in enumerate(key):
# Use position-dependent weighting
hash_value += (index + 1) * ord(char)
# Apply a prime multiplier for better distribution
hash_value *= 2069
# Ensure index is within the size range
return hash_value % size
The hash function above shows how to turn a key (a number or a string) into an index for storing or looking up data. The modulo operator against the hashmap's size keeps the index within bounds. A good hash function spreads keys evenly across the table, so fewer collisions happen, which is what keeps hashmaps fast.
Collisions happen when different keys hash to the same index. Imagine two students accidentally assigned the same seat in a classroom. That's a collision in a hashmap. To handle them without losing data, hashmaps use strategies like chaining or open addressing. Both make sure every value stays reachable, even when multiple keys land in the same slot.
As a hashmap fills up, it eventually has to resize and rehash to keep operations fast. Think of it like moving to a bigger house when your family outgrows the old one. Once the load factor crosses a threshold (typically 0.75), the hashmap allocates a larger array and redistributes its elements. It walks every key-value pair, recomputes its hash, and places it at a new index. The operation is expensive when it runs, but it's what keeps the hashmap fast as it grows:
def resize(hashmap):
# Double the size of the current hashmap
size = len(hashmap) * 2
# Use the doubled size to create a new hashmap and return it
new_hashmap = [[] for _ in range(size)]
return new_hashmap
def rehash(hashmap):
# Use the "resize" function to generate a larger empty hashmap
new_hashmap = resize(hashmap)
for bucket in hashmap:
for key, value in bucket:
# Calculate the new index for each key-value pair
new_index = hash(key) % len(new_hashmap)
# Add the key-value pair to the corresponding bucket in the new hashmap
new_hashmap[new_index].append([key, value])
return new_hashmapSeparate chaining is a simple way to handle collisions. When multiple keys land at the same index, chaining stores them together in a linked listat that spot. Each index holds a "chain" (really just a linked list) of all the entries that hashed to it. Multiple items can safely share an index this way.
from typing import Optional
# Definition of a singly-linked ListNode
class ListNode:
def __init__(self, key: int = -1, val: int = -1, next: Optional['ListNode'] = None):
self.key = key
self.val = val
self.next = next
class MyHashMap:
def __init__(self):
# Our hashmap is just an array containing "size" number of ListNodes
self.size = 1000
self.hashmap = [ListNode() for i in range(self.size)]
def hash_function(self, key: int) -> int:
return key % self.size
def put(self, key: int, value: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then rewrite the value
if cur.next.key == key:
cur.next.val = value
return
cur = cur.next
# If we don't find the key, then add a new ListNode
cur.next = ListNode(key, value)
def get(self, key: int) -> int:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then return the value
if cur.next.key == key:
return cur.next.val
cur = cur.next
# If we don't find the key, then return -1
return -1
def remove(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then remove that ListNode
if cur.next.key == key:
cur.next = cur.next.next
return
cur = cur.nextEach operation on a chained hashmap follows the same shape:
put): Hash the key with hash_function to get the index, then add the new key-value pair to the end of the linked list at that index.get): Hash the key with hash_function to get the index, then walk the linked list at that index to find the matching key.remove): Same as retrieval, but remove the node once you find it.Open addressing is another way to handle collisions. Unlike chaining, which uses linked lists at each index, open addressing stores only one element per index. When a collision happens, the hashmap looks for the next available slot and puts the new key-value pair there. Overlapping elements get a new home somewhere else in the table:
class MyHashMap:
def __init__(self):
# Initialize the hashmap with a fixed size
self.size = 80_000
self.hashmap = [None] * self.size
def hash_function(self, key: int) -> int:
# Simple hash function to determine the index
return key % self.size
def probe(self, index: int) -> int:
# Linear probing: move to the next index, wrapping around if necessary
return (index + 1) % self.size
def put(self, key: int, value: int) -> None:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, update the value
self.hashmap[index] = (key, value)
return
# Collision occurred, try the next slot
index = self.probe(index)
# Empty slot found, insert the new key-value pair
self.hashmap[index] = (key, value)
def get(self, key: int) -> int:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, return the value or try the next slot
return self.hashmap[index][1]
index = self.probe(index)
# Key not found after probing all slots
return -1
def remove(self, key: int) -> None:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, mark the slot as deleted
self.hashmap[index] = None
return
# If we didn't find the key, try the next slot
index = self.probe(index)Open addressing has a few different strategies for finding the next slot:
Sets, also called hash sets, are data structures that use hashing to store unique elements. While hashmaps store key-value pairs, sets only store the values and reject duplicates. Insertion, deletion, and lookup are all fast, which makes sets the go-to when you need to track what you've already seen or check if something exists.
| Operations | Time | Space |
|---|---|---|
| Add | O(1)* | O(1) |
| Remove | O(1)* | O(1) |
| Search | O(1)* | O(1) |
| Size | O(1) | O(1) |
| Iterate | O(n) | O(1) |
| Clear | O(1) | O(1) |
| Resize | O(n) | O(n) |
| Union | O(n+m) | O(n+m) |
| Intersection | O(min(n,m)) | O(min(n,m)) |
| Difference | O(n) | O(n) |
Sets work on the same principles as hashmaps, so we can build one using chaining with linked lists. The ListNode class holds a unique element and a pointer to the next node, which lets us form chains of distinct values at each index. The HashSet class starts with a fixed number of buckets, and each bucket begins with a dummy ListNode:
# Definition of a singly-linked ListNode
class ListNode:
def __init__(self, key: int = -1, next: Optional['ListNode'] = None):
self.key = key
self.next = next
class MyHashSet:
def __init__(self):
# Our hash set is just an array containing "size" number of ListNodes
self.size = 1000
self.hashset = [ListNode() for _ in range(self.size)]
def hash_function(self, key: int) -> int:
return key % self.size
def add(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then it's already in the set
if cur.next.key == key:
return
cur = cur.next
# If we don't find the key, then add a new ListNode
cur.next = ListNode(key)
def contains(self, key: int) -> bool:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then it's in the set
if cur.next.key == key:
return True
cur = cur.next
# If we don't find the key, then it's not in the set
return False
def remove(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then remove that ListNode
if cur.next.key == key:
cur.next = cur.next.next
return
cur = cur.nextHashmaps and sets are the workhorses of optimal solutions. If you can spot when to reach for one, you can usually find a fast solution to a problem. Here are some tips for using them well:
A technique using two pointers to traverse data structures efficiently.
A technique for processing contiguous subarrays or substrings efficiently.
A technique for preprocessing arrays to answer range sum queries in constant time.
Algorithms for arranging elements in a specific order for efficient processing.
Hashmaps trade space for time, and once you start spotting that tradeoff, you see it everywhere. Work through these classic problems to internalize when a hashmap or set turns an O(n²) brute force into a clean linear scan:
Completed | Title | Status | Notes | Solution | |
|---|---|---|---|---|---|
Easy | |||||
Easy | |||||
Easy | |||||
Easy | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Hard | |||||
Hard |