Arrays gave us instant access, but only when you know the position number. Ask an array whether it contains the value 42 and it checks slot after slot, and a linked list is no better. Finding things by value seemed doomed to cost O(n). So here's the question this page answers: what if the data itself could tell you where it lives?
That's exactly what a hashmap does. It's a smart filing system built on an array, storing data as key-value pairs, where each key is the unique address for its value. A hash function maps each pair to a specific spot in the array, so you can look up any value by its key without searching the entire dataset.
Here's a hashmap's core trick in action. Store the pair cat: meow in a table with 8 slots. The hash function crunches the key cat into the number 5, so the pair goes into slot 5. To look it up later, hash cat again, get 5 again, and jump straight there. There's no scan and no comparison against other keys, and the size of the table never enters the math.
You might conclude that hashmaps are always O(1). Look at the asterisks in the table below: every fast operation says average O(1), worst case O(n). The villain is the collision, two keys landing in the same slot, and much of this page is about what happens when one strikes. There's one more honest cost up front: hashmaps are unordered, so they don't work well when you need sorted data:
| Operations | Time | Space |
|---|---|---|
| Search | O(1)* | O(1) |
| Access | O(1)* | O(1) |
| Insert | O(1)* | O(1) |
| Delete | O(1)* | O(1) |
| Resize | O(n) | O(n) |
| Get Keys | O(n) | O(n) |
| Get Values | O(n) | O(n) |
Key-value pairs are the building blocks of a hashmap. A key is a unique label that points to a value. Think of a contact list: a person's name (the key) maps to their phone number (the value). Keys are always unique, but values can repeat. The analogy leaks in one spot, though: a contact list is alphabetized, while a hashmap keeps no useful order at all. The hash function uses keys to figure out where each value lives, and it scatters them on purpose.
Hashmaps rely on hash functions to do their work. A hash function takes any input and converts it into an integer called a hash code. That integer is used as the index where the value gets stored. Because hashing gives you the index directly, the hashmap can jump straight to the data instead of scanning every entry.
# Hash Function: Converts key (string) into an index
def hash_function(key: str, size: int) -> int:
hash_value = 0
for index, char in enumerate(key):
# Use position-dependent weighting
hash_value += (index + 1) * ord(char)
# Apply a prime multiplier for better distribution
hash_value *= 2069
# Ensure index is within the size range
return hash_value % size
The hash function above shows how to turn a key (a number or a string) into an index for storing or looking up data. Run cat through the Python version with a table size of 8: the weighted character codes sum to 641, the prime multiplier turns that into 1,326,229, and the modulo operator against the hashmap's size reduces it to 1326229 % 8 = 5. That's the slot 5 from earlier, found by pure arithmetic, and the same arithmetic finds it again on every lookup. A good hash function spreads keys evenly across the table, so fewer collisions happen, which is what keeps hashmaps fast. Which raises a question: with 8 slots and an endless supply of possible strings, can a clever enough hash function keep every key in its own slot?
No hash function, however clever, can keep every key in its own slot. Once there are more possible keys than slots, some keys have to share. Collisions happen when different keys hash to the same index. Imagine two students accidentally assigned the same seat in a classroom. That's a collision in a hashmap. To handle them without losing data, hashmaps use strategies like chaining, which adds more chairs at the contested seat, or open addressing, which walks the second student to the next empty seat. Both make sure every value stays reachable, even when multiple keys land in the same slot.
Collisions are also why a hashmap can't stay small forever. Picture a table with 8 buckets absorbing 100 entries: nothing breaks, but the average bucket now holds 12 or 13 items, and every lookup decays into a little scan. So before things get crowded, the hashmap has to resize and rehash. Think of it like moving to a bigger house when your family outgrows the old one, with one twist the analogy misses: nobody keeps their old room. Once the load factor crosses a threshold (typically 0.75), the hashmap allocates a larger array and redistributes its elements. It walks every key-value pair and recomputes its hash, because hash % size changes the moment size does. The operation is expensive when it runs, but it's the same bargain dynamic arrays make: a rare O(n) copy is what keeps the hashmap fast as it grows:
def resize(hashmap):
# Double the size of the current hashmap
size = len(hashmap) * 2
# Use the doubled size to create a new hashmap and return it
new_hashmap = [[] for _ in range(size)]
return new_hashmap
def rehash(hashmap):
# Use the "resize" function to generate a larger empty hashmap
new_hashmap = resize(hashmap)
for bucket in hashmap:
for key, value in bucket:
# Calculate the new index for each key-value pair
new_index = hash(key) % len(new_hashmap)
# Add the key-value pair to the corresponding bucket in the new hashmap
new_hashmap[new_index].append([key, value])
return new_hashmapSeparate chaining is a simple way to handle collisions. When multiple keys land at the same index, chaining stores them together in a linked list at that spot. Each index holds a "chain" (really just a linked list) of all the entries that hashed to it. Multiple items can safely share an index this way. The chains are also where the worst case lives: if every key landed in one bucket, a lookup would mean walking one long list. That's the O(n) the operations table warned you about.
from typing import Optional
# Definition of a singly-linked ListNode
class ListNode:
def __init__(self, key: int = -1, val: int = -1, next: Optional['ListNode'] = None):
self.key = key
self.val = val
self.next = next
class MyHashMap:
def __init__(self):
# Our hashmap is just an array containing "size" number of ListNodes
self.size = 1000
self.hashmap = [ListNode() for i in range(self.size)]
def hash_function(self, key: int) -> int:
return key % self.size
def put(self, key: int, value: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then rewrite the value
if cur.next.key == key:
cur.next.val = value
return
cur = cur.next
# If we don't find the key, then add a new ListNode
cur.next = ListNode(key, value)
def get(self, key: int) -> int:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then return the value
if cur.next.key == key:
return cur.next.val
cur = cur.next
# If we don't find the key, then return -1
return -1
def remove(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashmap[index]
while cur.next:
# If we find the key, then remove that ListNode
if cur.next.key == key:
cur.next = cur.next.next
return
cur = cur.nextEach operation on a chained hashmap follows the same shape:
put): Hash the key with hash_function to get the index, then add the new key-value pair to the end of the linked list at that index.get): Hash the key with hash_function to get the index, then walk the linked list at that index to find the matching key.remove): Same as retrieval, but remove the node once you find it.Open addressing is the other way to handle collisions, and it allocates nothing new. Chaining works, but every chain node costs extra memory and a pointer hop. Unlike chaining, which uses linked lists at each index, open addressing stores only one element per index. When a collision happens, the hashmap looks for the next available slot and puts the new key-value pair there. Overlapping elements get a new home somewhere else in the table:
class MyHashMap:
def __init__(self):
# Initialize the hashmap with a fixed size
self.size = 80_000
self.hashmap = [None] * self.size
def hash_function(self, key: int) -> int:
# Simple hash function to determine the index
return key % self.size
def probe(self, index: int) -> int:
# Linear probing: move to the next index, wrapping around if necessary
return (index + 1) % self.size
def put(self, key: int, value: int) -> None:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, update the value
self.hashmap[index] = (key, value)
return
# Collision occurred, try the next slot
index = self.probe(index)
# Empty slot found, insert the new key-value pair
self.hashmap[index] = (key, value)
def get(self, key: int) -> int:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, return the value or try the next slot
return self.hashmap[index][1]
index = self.probe(index)
# Key not found after probing all slots
return -1
def remove(self, key: int) -> None:
# Hash the key to find the initial index
index = self.hash_function(key)
while self.hashmap[index] is not None:
if self.hashmap[index][0] == key:
# If we find the key, mark the slot as deleted
self.hashmap[index] = None
return
# If we didn't find the key, try the next slot
index = self.probe(index)Open addressing has a few different strategies for finding the next slot:
Once you have the hashing machinery, a neat trick falls out: keep the lookup, drop one half of the pair. Sets, also called hash sets, are data structures that use hashing to store unique elements. While hashmaps store key-value pairs, sets only store the values and reject duplicates. Insertion, deletion, and lookup are all fast, which makes sets the go-to when you need to track what you've already seen or check if something exists.
| Operations | Time | Space |
|---|---|---|
| Add | O(1)* | O(1) |
| Remove | O(1)* | O(1) |
| Search | O(1)* | O(1) |
| Size | O(1) | O(1) |
| Iterate | O(n) | O(1) |
| Clear | O(1) | O(1) |
| Resize | O(n) | O(n) |
| Union | O(n+m) | O(n+m) |
| Intersection | O(min(n,m)) | O(min(n,m)) |
| Difference | O(n) | O(n) |
Sets work on the same principles as hashmaps, so we can build one using chaining with linked lists. The ListNode class holds a unique element and a pointer to the next node, which lets us form chains of distinct values at each index. The HashSet class starts with a fixed number of buckets, and each bucket begins with a dummy ListNode:
# Definition of a singly-linked ListNode
class ListNode:
def __init__(self, key: int = -1, next: Optional['ListNode'] = None):
self.key = key
self.next = next
class MyHashSet:
def __init__(self):
# Our hash set is just an array containing "size" number of ListNodes
self.size = 1000
self.hashset = [ListNode() for _ in range(self.size)]
def hash_function(self, key: int) -> int:
return key % self.size
def add(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then it's already in the set
if cur.next.key == key:
return
cur = cur.next
# If we don't find the key, then add a new ListNode
cur.next = ListNode(key)
def contains(self, key: int) -> bool:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then it's in the set
if cur.next.key == key:
return True
cur = cur.next
# If we don't find the key, then it's not in the set
return False
def remove(self, key: int) -> None:
# Hash the key and find the corresponding ListNode
index = self.hash_function(key)
cur = self.hashset[index]
while cur.next:
# If we find the key, then remove that ListNode
if cur.next.key == key:
cur.next = cur.next.next
return
cur = cur.nextHashmaps and sets show up in basically every optimal solution. If you can spot when to reach for one, a fast solution usually follows. Here are some tips for using them well:
A technique using two pointers to traverse data structures efficiently.
A technique for processing contiguous subarrays or substrings efficiently.
A technique for preprocessing arrays to answer range sum queries in constant time.
Algorithms for arranging elements in a specific order for efficient processing.
Two questions to check yourself before you start: what has to happen for a hashmap lookup to fall from O(1) to O(n)? And why resize at a 0.75 load factor instead of waiting until the table is completely full? Hashmaps trade space for time, and once you start spotting that tradeoff, you see it everywhere. Work through these classic problems to internalize when a hashmap or set turns an O(n²) brute force into a clean linear scan:
Completed | Title | Status | Notes | Solution | |
|---|---|---|---|---|---|
Easy | |||||
Easy | |||||
Easy | |||||
Easy | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Hard | |||||
Hard |