Data StructuresUnion-Find

Union-Find

A data structure that keeps track of disconnected sets of elements.

Definition

Union-Find is a tool for managing groups where each element belongs to exactly one group. Also known as Disjoint Set Union (DSU), it helps us track disjoint sets and efficiently answer two key questions: which group does an element belong to, and are two elements in the same group. Union-Find is perfect for finding connected components in a graph where we need to quickly determine if elements are connected.

Operations

Union-Find revolves around two main operations that work together to manage groups. The find operation tells us which group an element belongs to, while the union operation combines two separate groups into one. These operations are dynamic, meaning we can perform them at any time as our groups evolve and change.

OperationsTimeSpace
Make SetO(1)O(1)
FindO(α(n))O(1)
UnionO(α(n))O(1)
ConnectedO(α(n))O(1)
Get SizeO(1)O(1)
Get CountO(1)O(1)
InitializeO(n)O(n)
Get ComponentsO(n)O(n)

Some Union-Find operations use a special mathematical concept called the Inverse Ackermann function, shown as α(n). Don't worry too much about the math behind it - what's important is that this function grows so slowly that when we say an algorithm runs in O(α(n)) time, it means it's practically as fast as an instant operation.

Disjoint Sets

A disjoint set is like separating items into distinct groups where each item must belong to exactly one group. Think of it as organizing students into different teams - each student can only be on one team at a time, with no overlap between teams. This is perfect for tracking groups that need to stay separate, but might need to be combined later.


To make these groups efficient to manage, Union-Find uses two key optimizations: path compression and union by size. Path compression flattens our groups to make lookups faster, while union by size keeps our groups balanced when we combine them. Together, these techniques make operations practically instant.

Union-Find Setup

To build Union-Find, we need two arrays: one to track each element's parent and another to track each group's size. Initially, every element is its own parent (forming its own group), so we set each index in the par array to point to itself. The size array starts with all 1's since each element begins in a group of size 1.

Union Find Setup

class UnionFind:
    def __init__(self, size: int):
        # Create arrays to store node's parents and their sizes
        self.par = [node for node in range(size + 1)]
        self.size = [1] * len(self.par)

This setup creates the foundation for our Union-Find operations. The par array lets us trace paths to find group leaders, while the size array helps us keep our trees balanced when merging groups. As we perform operations, these arrays will change to reflect how our elements are grouped together.

Path Compression

Path compression is an optimization that makes Union-Find operations lightning fast. When we look up an element's group using find, we often need to follow a chain of parent pointers to reach the root. Without path compression, these chains can get long, making lookups slow. Let's first look at a basic find operation that runs in O(n) time because it might need to traverse the entire tree:

Basic Find Operation

# Using a simple union-find setup instead of defining a class
def setup(size: int):
    par = [node for node in range(size + 1)]
    size = [1] * len(par)

    # Find function WITHOUT path compression
    def find(node: int) -> int:
        while par[node] != node:
            node = par[node]
        return node

Path compression solves this problem by restructuring the tree during lookups. As we traverse up the tree to find the root, we make each node we visit point directly to the root. This flattening process means that future lookups for any of these nodes will be much faster since they're now just one step away from their root. Here's how we implement path compression:

Path Compression

# Using a simple union-find setup instead of defining a class
def setup(size: int):
    par = [node for node in range(size + 1)]
    size = [1] * len(par)

    # Find function WITH path compression
    def find(node: int) -> int:
        if par[node] != node:
            par[node] = find(par[node])
        return par[node]

By adding path compression, we dramatically improve the performance of Union-Find. What used to take O(n) operations now runs in practically constant time, thanks to the O(α(n)) inverse Ackermann function. The best part is that this huge speedup comes from adding just a single line of code to our find operation.

Path Halving

Path halving is a lesser alternative to path compression that still gives us great performance. Instead of connecting every node directly to the root, path halving makes each node skip its grandparent as we move up the tree. This means that while we're looking for the root, we're also shortening the path for future operations.

Path Halving

# Using a simple union-find setup instead of defining a class
def setup(size: int):
    par = [node for node in range(size + 1)]
    size = [1] * len(par)

    # Find function WITH path halving
    def find(node: int) -> int:
        while par[node] != node:
            # Point the node to its grandparent
            par[node] = par[par[node]]
            node = par[node]
        return node

While path halving achieves the same O(α(n)) time complexity, it's not as efficient as path compression. Since nodes aren't directly connected to the root, future operations still need multiple steps to find their group. Path compression is almost always the better choice because the extra memory writes are worth the performance gain of having every node just one step away from its root.

Union by Size

Union by size is an optimization that helps keep our tree balanced when combining groups. When we merge two groups, we want to avoid creating tall, unbalanced trees that would make lookups slow. Instead, we always merge the smaller group into the larger one, just like how it's easier to pour a small glass of water into a larger one.

Union by Size

# Using a simple union-find setup instead of defining a class
def setup(size: int):
    par = [node for node in range(size + 1)]
    size = [1] * len(par)

    def union(node1: int, node2: int) -> bool:
        # Find the root of the given nodes
        p1, p2 = find(node1), find(node2)
        
        # Base case: If the roots are same, nodes are already connected
        if p1 == p2:
            return False
    
        # Union by Size
        if size[p1] > size[p2]:
            par[p2] = p1
            size[p1] += size[p2]
        else:
            par[p1] = p2
            size[p2] += size[p1]
        
        return True

To make this work, we need to keep track of each group's size. Every element starts in its own group with a size of one. When we merge groups, we attach the smaller tree to the larger one and add their sizes together. This simple rule ensures our trees never get too tall, making all our operations faster.

Union by Rank

Union by rank is similar to union by size but focuses on tracking tree heights instead of group sizes. When combining groups, we want to minimize the height of our resulting tree to keep operations fast. Like union by size, we merge the shorter tree under the taller one, which helps prevent our trees from becoming too deep.

Union by Rank

# Using a simple union-find setup instead of defining a class
def setup(size: int):
    par = [node for node in range(size + 1)]
    size = [1] * len(par)

    def union(node1: int, node2: int) -> bool:
        # Find the root of the given nodes
        p1, p2 = find(node1), find(node2)
        
        # Base case: If the roots are same, nodes are already connected
        if p1 == p2:
            return False
    
        # Union by Rank
        if rank[p1] > rank[p2]:
            par[p2] = p1
        elif rank[p1] < rank[p2]:
            par[p1] = p2
        else:
            par[p2] = p1
            rank[p1] += 1
        
        return True

Instead of counting elements, we maintain a rank array that tracks each tree's height. When we merge trees of different ranks, the shorter tree joins under the taller one without changing any ranks. Only when we combine trees of equal rank does the resulting tree's rank increase by one. This approach still achieves the same efficiency as union by size, but some find it more intuitive.

Best Practices for Union-Find

Union-Find is a great tool for solving problems related to disjoint sets, connectivity, and clustering in coding interviews. To use Union-Find efficiently, here are some great tips and tricks:


Grasp the core concepts of Union-Find, including how it manages disjoint sets, and the significance of its find and union operations. A solid understanding will enable you to apply it to a wide range of problems.

Copyright © StudyDSA. All rights reserved.