Data Structures Hashmaps

Hashmaps

Key-value pairs for efficient data lookup.

Definition

A hashmap is essentially a smart filing system built on an array. It stores all its data as key-value pairs, where each key serves as a unique address for its corresponding value. To organize this data efficiently, a hashmap uses a hash function which maps each pair to a specific location in the array. This setup allows you to quickly retrieve any value using its key, without searching the entire dataset.

Operations

Hashmaps offer quick data operations. The hash function decides where to store each item, making access super fast. Keep in mind, hashmaps are unordered, so they don't work well for sorted data. Sometimes, collisions happen when two keys end up in the same spot, but good hash functions help avoid this problem.

OperationsTimeSpace
SearchO(1)O(1)
AccessO(1)O(1)
InsertO(1)O(1)
DeleteO(1)O(1)
Key-Value Pairs

Key-value pairs are the building blocks of hashmaps. A key is a unique label that points to some value (data). Think of a contact list: a person's name (the key) leads to their phone number (the value). Remember, keys are always unique, but values can repeat. The hash function uses keys to quickly store and find values in the hashmap.

Hash Function

Hashmaps rely on hash functions. These special functions convert any input into an integer. This integer, called a hash code, serves as an index to store the input value in the hashmap. By using hash codes, hashmaps can quickly insert and retrieve data. They directly access the storage location without searching through all the data, making operations incredibly fast.

Hash Function

# Hash Function: Converts key (string) into an index
def hash_function(key: str, size: int) -> int:
    hash_value = 0
    for index, char in enumerate(key):
        # Use position-dependent weighting
        hash_value += (index + 1) * ord(char)
    
    # Apply a prime multiplier for better distribution
    hash_value *= 2069
    
    # Ensure index is within the size range
    return hash_value % size
Animation of hash functions and how they produce an index for key-value pairs to be stored at

The hash function shown above demonstrates how to convert a key (like an integer or string) into an index for storing or finding data in a hashmap. It's important to apply the modulo operator against the hashmap's size to ensure the index falls within bounds. A well-designed hash function spreads keys evenly across the hashmap, minimizing collisions. Even distribution is the key to maintaining a hashmap's efficiency for quick operations.

Hashmap Collisions

Collisions occur when different keys hash to the same index. Imagine two students mistakenly assigned the same seat in a classroom - that's a collision in a hashmap. To handle these situations without losing data, hashmaps use strategies like chaining or open addressing. These methods ensure all information remains accessible, even when multiple keys point to the same location in the hashmap.

Resize and Rehash

As a hashmap grows, it may need to resize and rehash to maintain efficiency. This process is like moving to a bigger house when your family expands. When the load factor reaches a certain threshold (typically 0.75), the hashmap increases its size and redistributes its elements. This involves creating a larger array, then recalculating the hash for each key-value pair and placing them in their new positions. While this operation can be time-consuming, it's important for maintaining good performance as the hashmap grows:

Resize and Rehash

def resize(hashmap):
    # Double the size of the current hashmap
    size = len(hashmap) * 2 
    # Use the doubled size to create a new hashmap and return it
    new_hashmap = [[] for _ in range(size)]
    return new_hashmap

def rehash(hashmap):
    # Use the "resize" function to generate a larger empty hashmap
    new_hashmap = resize(hashmap)
    for bucket in hashmap:
        for key, value in bucket:
            # Calculate the new index for each key-value pair
            new_index = hash(key) % len(new_hashmap)
            # Add the key-value pair to the corresponding bucket in the new hashmap
            new_hashmap[new_index].append([key, value])
    return new_hashmap
Separate Chaining

Separate chaining is a simple way to handle collisions in hashmaps. When multiple keys end up at the same index, chaining links the data together. In this method, each index holds a "chain" (which is actually a linked list). This chain stores all entries that belong to that index, allowing multiple items to share the same location and resolving collisions.

Chained Hashmap

from typing import Optional

# Definition of a singly-linked ListNode
class ListNode:
    def __init__(self, key: int = -1, val: int = -1, next: Optional['ListNode'] = None):
        self.key = key
        self.val = val
        self.next = next

class MyHashMap:
    def __init__(self):
        # Our hashmap is just an array containing "size" number of ListNodes
        self.size = 1000
        self.hashmap = [ListNode() for i in range(self.size)]

    def hash_function(self, key: int) -> int:
        return key % self.size

    def put(self, key: int, value: int) -> None:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashmap[index]
        while cur.next:
            # If we find the key, then rewrite the value
            if cur.next.key == key:
                cur.next.val = value
                return
            cur = cur.next
        # If we don't find the key, then add a new ListNode
        cur.next = ListNode(key, value)

    def get(self, key: int) -> int:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashmap[index]
        while cur.next:
            # If we find the key, then return the value
            if cur.next.key == key:
                return cur.next.val
            cur = cur.next
        # If we don't find the key, then return -1
        return -1

    def remove(self, key: int) -> None:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashmap[index]
        while cur.next:
            # If we find the key, then remove that ListNode
            if cur.next.key == key:
                cur.next = cur.next.next
                return
            cur = cur.next

When working with a chained hashmap, we follow these steps for different operations:

  1. Insertion (put): calculate the index using the hash_function. Then, add the new key-value pair to the end of the linked list at that index
  2. Retrieval (get): Compute the index using the hash_function, then search through the linked list at that index to find the desired key
  3. Deletion (remove): Follow the same process as retrieval, but remove the node from the linked list once found
Open Addressing

Open addressing is a method for handling collisions in hashmaps. Unlike chaining, which uses linked lists at each index, open addressing keeps only one element per index. When a collision occurs, the hashmap searches for the next available slot to place the new key-value pair. This approach quickly finds alternative spots within the hashmap for overlapping elements:

Open Addressing

class MyHashMap:
    def __init__(self):
        # Initialize the hashmap with a fixed size
        self.size = 80_000
        self.hashmap = [None] * self.size

    def hash_function(self, key: int) -> int:
        # Simple hash function to determine the index
        return key % self.size

    def probe(self, index: int) -> int:
        # Linear probing: move to the next index, wrapping around if necessary
        return (index + 1) % self.size

    def put(self, key: int, value: int) -> None:
        # Hash the key to find the initial index
        index = self.hash_function(key)
        while self.hashmap[index] is not None:
            if self.hashmap[index][0] == key:
                # If we find the key, update the value
                self.hashmap[index] = (key, value)
                return
            # Collision occurred, try the next slot
            index = self.probe(index)
        # Empty slot found, insert the new key-value pair
        self.hashmap[index] = (key, value)

    def get(self, key: int) -> int:
        # Hash the key to find the initial index
        index = self.hash_function(key)
        while self.hashmap[index] is not None:
            if self.hashmap[index][0] == key:
                # If we find the key, return the value or try the next slot
                return self.hashmap[index][1]
            index = self.probe(index)
        # Key not found after probing all slots
        return -1

    def remove(self, key: int) -> None:
        # Hash the key to find the initial index
        index = self.hash_function(key)
        while self.hashmap[index] is not None:
            if self.hashmap[index][0] == key:
                # If we find the key, mark the slot as deleted 
                self.hashmap[index] = None
                return
            # If we didn't find the key, try the next slot
            index = self.probe(index)

Open addressing uses various methods to find available indices when a collision occurs:

  1. Linear probing: The table is searched sequentially for the next available slot.
  2. Quadratic probing: A quadratic function is used to determine the next potential slot
  3. Double hashing: A second hash function is employed to calculate the step size for the search
Sets

Sets, also known as hash sets, are data structures that use hashing to store unique elements. Unlike hash maps that store key-value pairs, sets only store individual values and prevent duplicates. With their fast insertion, deletion, and search operations, sets are great for scenarios where uniqueness or quick data access matters.

OperationsTimeSpace
AddO(1)O(1)
RemoveO(1)O(1)
SearchO(1)O(1)
SizeO(1)O(1)
IterateO(n)O(1)
ClearO(n)O(1)

Sets can be implemented using the same principles as hashmaps. We can create an efficient set structure using chaining with linked lists. The core of this implementation is the ListNode class, which stores a unique element and a pointer to the next node. This allows us to form chains of distinct elements at each index of the hashset. The HashSet class is initialized with a specific number of buckets. Each bucket starts with a dummy ListNode, simplifying insertion and deletion operations:

Hash Set

# Definition of a singly-linked ListNode
class ListNode:
    def __init__(self, key: int = -1, next: Optional['ListNode'] = None):
        self.key = key
        self.next = next

class MyHashSet:
    def __init__(self):
        # Our hash set is just an array containing "size" number of ListNodes
        self.size = 1000
        self.hashset = [ListNode() for _ in range(self.size)]

    def hash_function(self, key: int) -> int:
        return key % self.size

    def add(self, key: int) -> None:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashset[index]
        while cur.next:
            # If we find the key, then it's already in the set
            if cur.next.key == key:
                return
            cur = cur.next
        # If we don't find the key, then add a new ListNode
        cur.next = ListNode(key)

    def contains(self, key: int) -> bool:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashset[index]
        while cur.next:
            # If we find the key, then it's in the set
            if cur.next.key == key:
                return True
            cur = cur.next
        # If we don't find the key, then it's not in the set
        return False

    def remove(self, key: int) -> None:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashset[index]
        while cur.next:
            # If we find the key, then remove that ListNode
            if cur.next.key == key:
                cur.next = cur.next.next
                return
            cur = cur.next
Best Practices for Hashmaps

Hashmaps and sets are the most important data structures to learn if you ever want to see an optimal solution. Here are some best practices for using hashmaps and sets effectively for problems:


Sets are the go-to for operations requiring uniqueness among elements. They automatically handle duplicates and provide constant time lookups, making them perfect for problems that involve duplicates.

Copyright © StudyDSA. All rights reserved.