StudyDSA

Hashmaps

Key-value pairs for efficient data lookup.

Definition

A hashmap is essentially a smart filing system built on an array. It stores all its data as key-value pairs, where each key serves as a unique address for its corresponding value. To organize this data efficiently, a hashmap uses a hash function which maps each pair to a specific location in the array. This setup allows you to quickly retrieve any value using its key, without searching the entire dataset.

Operations

Hashmaps offer quick data operations. The hash function decides where to store each item, making access super fast. Keep in mind, hashmaps are unordered, so they don’t work well for sorted data. Sometimes, collisions happen when two keys end up in the same spot, but good hash functions help avoid this problem.

Operations	Time	Space	Notes
Search	O(1)*	O(1)	Average O(1), worst case O(n) with collisions. Uses hash function to find bucket
Access	O(1)*	O(1)	Average O(1), worst case O(n) with collisions. Retrieves value using key's hash
Insert	O(1)*	O(1)	Average O(1), worst case O(n). May trigger resize if load factor exceeded
Delete	O(1)*	O(1)	Average O(1), worst case O(n). Must handle collision chain if present
Resize	O(n)	O(n)	Triggered when load factor exceeds threshold. Rehashes all elements
Get Keys	O(n)	O(n)	Returns all keys in the hashmap. Must traverse all buckets
Get Values	O(n)	O(n)	Returns all values in the hashmap. Must traverse all buckets

Key-Value Pairs

Key-value pairs are the building blocks of hashmaps. A key is a unique label that points to some value (data). Think of a contact list: a person’s name (the key) leads to their phone number (the value). Remember, keys are always unique, but values can repeat. The hash function uses keys to quickly store and find values in the hashmap.

Hash Function

Hashmaps rely on hash functions. These special functions convert any input into an integer. This integer, called a hash code, serves as an index to store the input value in the hashmap. By using hash codes, hashmaps can quickly insert and retrieve data. They directly access the storage location without searching through all the data, making operations incredibly fast.

Hash Function

# Hash Function: Converts key (string) into an index
def hash_function(key: str, size: int) -> int:
    hash_value = 0
    for index, char in enumerate(key):
        # Use position-dependent weighting
        hash_value += (index + 1) * ord(char)
    
    # Apply a prime multiplier for better distribution
    hash_value *= 2069
    
    # Ensure index is within the size range
    return hash_value % size

The hash function shown above demonstrates how to convert a key (like an integer or string) into an index for storing or finding data in a hashmap. It’s important to apply the modulo operator against the hashmap’s sizeto ensure the index falls within bounds. A well-designed hash function spreads keys evenly across the hashmap, minimizing collisions. Even distribution is the key to maintaining a hashmap’s efficiency for quick operations.

Hashmap Collisions

Collisions occur when different keys hash to the same index. Imagine two students mistakenly assigned the same seat in a classroom - that’s a collision in a hashmap. To handle these situations without losing data, hashmaps use strategies like chaining or open addressing. These methods ensure all information remains accessible, even when multiple keys point to the same location in the hashmap.

Resize and Rehash

As a hashmap grows, it may need to resize and rehash to maintain efficiency. This process is like moving to a bigger house when your family expands. When the load factor reaches a certain threshold (typically 0.75), the hashmap increases its size and redistributes its elements. This involves creating a larger array, then recalculating the hash for each key-value pair and placing them in their new positions. While this operation can be time-consuming, it’s important for maintaining good performance as the hashmap grows:

Resize and Rehash

def resize(hashmap):
    # Double the size of the current hashmap
    size = len(hashmap) * 2 
    # Use the doubled size to create a new hashmap and return it
    new_hashmap = [[] for _ in range(size)]
    return new_hashmap

def rehash(hashmap):
    # Use the "resize" function to generate a larger empty hashmap
    new_hashmap = resize(hashmap)
    for bucket in hashmap:
        for key, value in bucket:
            # Calculate the new index for each key-value pair
            new_index = hash(key) % len(new_hashmap)
            # Add the key-value pair to the corresponding bucket in the new hashmap
            new_hashmap[new_index].append([key, value])
    return new_hashmap

Separate Chaining

Separate chaining is a simple way to handle collisions in hashmaps. When multiple keys end up at the same index, chaining links the data together. In this method, each index holds a "chain" (which is actually a linked list). This chain stores all entries that belong to that index, allowing multiple items to share the same location and resolving collisions.

Chained Hashmap

from typing import Optional

# Definition of a singly-linked ListNode
class ListNode:
    def __init__(self, key: int = -1, val: int = -1, next: Optional['ListNode'] = None):
        self.key = key
        self.val = val
        self.next = next

class MyHashMap:
    def __init__(self):
        # Our hashmap is just an array containing "size" number of ListNodes
        self.size = 1000
        self.hashmap = [ListNode() for i in range(self.size)]

    def hash_function(self, key: int) -> int:
        return key % self.size

    def put(self, key: int, value: int) -> None:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashmap[index]
        while cur.next:
            # If we find the key, then rewrite the value
            if cur.next.key == key:
                cur.next.val = value
                return
            cur = cur.next
        # If we don't find the key, then add a new ListNode
        cur.next = ListNode(key, value)

    def get(self, key: int) -> int:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashmap[index]
        while cur.next:
            # If we find the key, then return the value
            if cur.next.key == key:
                return cur.next.val
            cur = cur.next
        # If we don't find the key, then return -1
        return -1

    def remove(self, key: int) -> None:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashmap[index]
        while cur.next:
            # If we find the key, then remove that ListNode
            if cur.next.key == key:
                cur.next = cur.next.next
                return
            cur = cur.next

When working with a chained hashmap, we follow these steps for different operations:

Insertion: (put): calculate the index using the hash_function. Then, add the new key-value pair to the end of the linked list at that index
Retrieval (get): Compute the index using the hash_function, then search through the linked list at that index to find the desired key
Deletion (remove): Follow the same process as retrieval, but remove the node from the linked list once found

Open Addressing

Open addressing is a method for handling collisions in hashmaps. Unlike chaining, which uses linked lists at each index, open addressing keeps only one element per index. When a collision occurs, the hashmap searches for the next available slot to place the new key-value pair. This approach quickly finds alternative spots within the hashmap for overlapping elements:

Open Addressing

class MyHashMap:
    def __init__(self):
        # Initialize the hashmap with a fixed size
        self.size = 80_000
        self.hashmap = [None] * self.size

    def hash_function(self, key: int) -> int:
        # Simple hash function to determine the index
        return key % self.size

    def probe(self, index: int) -> int:
        # Linear probing: move to the next index, wrapping around if necessary
        return (index + 1) % self.size

    def put(self, key: int, value: int) -> None:
        # Hash the key to find the initial index
        index = self.hash_function(key)
        while self.hashmap[index] is not None:
            if self.hashmap[index][0] == key:
                # If we find the key, update the value
                self.hashmap[index] = (key, value)
                return
            # Collision occurred, try the next slot
            index = self.probe(index)
        # Empty slot found, insert the new key-value pair
        self.hashmap[index] = (key, value)

    def get(self, key: int) -> int:
        # Hash the key to find the initial index
        index = self.hash_function(key)
        while self.hashmap[index] is not None:
            if self.hashmap[index][0] == key:
                # If we find the key, return the value or try the next slot
                return self.hashmap[index][1]
            index = self.probe(index)
        # Key not found after probing all slots
        return -1

    def remove(self, key: int) -> None:
        # Hash the key to find the initial index
        index = self.hash_function(key)
        while self.hashmap[index] is not None:
            if self.hashmap[index][0] == key:
                # If we find the key, mark the slot as deleted 
                self.hashmap[index] = None
                return
            # If we didn't find the key, try the next slot
            index = self.probe(index)

Open addressing uses various methods to find available indices when a collision occurs:

Linear probing: The table is searched sequentially for the next available slot.
Quadratic probing: A quadratic function is used to determine the next potential slot
Double hashing: A second hash function is employed to calculate the step size for the search

Sets

Sets, also known as hash sets, are data structures that use hashing to store unique elements. Unlike hash maps that store key-value pairs, sets only store individual values and prevent duplicates. With their fast insertion, deletion, and search operations, sets are great for scenarios where uniqueness or quick data access matters.

Operations	Time	Space	Notes
Add	O(1)*	O(1)	Average O(1), worst O(n) with collisions. Maintains uniqueness through hashing
Remove	O(1)*	O(1)	Average O(1), worst O(n). Uses hash function to locate element
Search	O(1)*	O(1)	Average O(1), worst O(n). Checks existence using element's hash
Size	O(1)	O(1)	Returns number of elements. Typically tracked during modifications
Iterate	O(n)	O(1)	Must visit each element in the set sequentially
Clear	O(1)	O(1)	Can be implemented by resetting internal storage structure
Resize	O(n)	O(n)	Triggered when load factor threshold is exceeded. Rehashes all elements
Union	O(n+m)	O(n+m)	Combines two sets. n and m are sizes of input sets
Intersection	O(min(n,m))	O(min(n,m))	Finds common elements between sets. Iterates through smaller set
Difference	O(n)	O(n)	Returns elements in first set that aren't in second set

Sets can be implemented using the same principles as hashmaps. We can create an efficient set structure using chaining with linked lists. The core of this implementation is the ListNode class, which stores a unique element and a pointer to the next node. This allows us to form chains of distinct elements at each index of the hashset. The HashSet class is initialized with a specific number of buckets. Each bucket starts with a dummy ListNode:

Hash Set

# Definition of a singly-linked ListNode
class ListNode:
    def __init__(self, key: int = -1, next: Optional['ListNode'] = None):
        self.key = key
        self.next = next

class MyHashSet:
    def __init__(self):
        # Our hash set is just an array containing "size" number of ListNodes
        self.size = 1000
        self.hashset = [ListNode() for _ in range(self.size)]

    def hash_function(self, key: int) -> int:
        return key % self.size

    def add(self, key: int) -> None:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashset[index]
        while cur.next:
            # If we find the key, then it's already in the set
            if cur.next.key == key:
                return
            cur = cur.next
        # If we don't find the key, then add a new ListNode
        cur.next = ListNode(key)

    def contains(self, key: int) -> bool:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashset[index]
        while cur.next:
            # If we find the key, then it's in the set
            if cur.next.key == key:
                return True
            cur = cur.next
        # If we don't find the key, then it's not in the set
        return False

    def remove(self, key: int) -> None:
        # Hash the key and find the corresponding ListNode
        index = self.hash_function(key)
        cur = self.hashset[index]
        while cur.next:
            # If we find the key, then remove that ListNode
            if cur.next.key == key:
                cur.next = cur.next.next
                return
            cur = cur.next

Best Practices

Hashmaps and sets are the most important data structures to learn if you ever want to see an optimal solution. Here are some best practices for using hashmaps and sets effectively for problems:

Sets are the go-to for operations requiring uniqueness among elements. They automatically handle duplicates and provide constant time lookups, making them perfect for problems that involve duplicates.

Algorithms

Two Pointers

Maintain a pair of pointers to carry out logic efficiently, often used in sorted arrays or linked lists.

Sliding Window

Apply a window that slides over data to find subsets meeting a condition, useful for contiguous sequences.

Prefix Sums

Precompute sums of arrays to quickly calculate sum of ranges.

Recursion

Start with base case and take a leap of faith to solve these problems

1D Dynamic Programming

Break problems down into simpler subproblems and storing results to avoid recomputation.

Greedy

Optimize for the best solution step by step, choosing the most beneficial option at each stage without reconsidering.

Intervals

Work with ranges or periods, solving problems such as overlaps, merging intervals, and scheduling.

Bit Operations

Using bitwise operators for efficient problem-solving, ideal for optimizing time and space.

Depth-First Search

Explore as far as possible along branches before backtracking, applicable to trees and graphs.

Breadth-First Search

Explore all neighbors at the present depth prior to moving on to the nodes at the next depth level.

2D Dynamic Programming

Solve problems by breaking them down into simpler subproblems in a two-dimensional space, often used for grid-based problems.

Dijkstra's Algorithm

Find the shortest paths from a single source node to all other nodes in a weighted graph.

A* Search Algorithm

Traverse the graph to find the most efficient path between two nodes, optimizing for shortest distance or cost.

Floyd-Warshall's Algorithm

StudyDSA

StudyDSA

Hashmaps

Key-value pairs for efficient data lookup.

Hash Function

Resize and Rehash

Chained Hashmap

Open Addressing

Hash Set

Leveraging Uniqueness

Key-Value Mapping

Efficient Data Lookup

Space Complexity Considerations

Collision Handling

Two Pointers

Maintain a pair of pointers to carry out logic efficiently, often used in sorted arrays or linked lists.

Sliding Window

Apply a window that slides over data to find subsets meeting a condition, useful for contiguous sequences.

Prefix Sums

Precompute sums of arrays to quickly calculate sum of ranges.

Recursion

Start with base case and take a leap of faith to solve these problems

1D Dynamic Programming

Break problems down into simpler subproblems and storing results to avoid recomputation.

Greedy

Optimize for the best solution step by step, choosing the most beneficial option at each stage without reconsidering.

Intervals

Work with ranges or periods, solving problems such as overlaps, merging intervals, and scheduling.

Bit Operations

Using bitwise operators for efficient problem-solving, ideal for optimizing time and space.

Depth-First Search

Explore as far as possible along branches before backtracking, applicable to trees and graphs.

Breadth-First Search

Explore all neighbors at the present depth prior to moving on to the nodes at the next depth level.

2D Dynamic Programming

Solve problems by breaking them down into simpler subproblems in a two-dimensional space, often used for grid-based problems.

Dijkstra's Algorithm

Find the shortest paths from a single source node to all other nodes in a weighted graph.

A* Search Algorithm

Traverse the graph to find the most efficient path between two nodes, optimizing for shortest distance or cost.

Floyd-Warshall's Algorithm

Find the shortest paths between all pairs of vertices in a weighted graph, with positive and negative edges.

Bellman-Ford Algorithm

Finding the shortest paths from a single source vertex to all other vertices in a weighted graph, even with negative weights.

Prim's Algorithm

Construct a minimum spanning tree for a weighted, undirected graph, ensuring minimal total edge weight.