Union-Find is a tool for managing groups where each element belongs to exactly one group. Also known as Disjoint Set Union (DSU), it helps us track disjoint sets and efficiently answer two key questions: which group does an element belong to, and are two elements in the same group. Union-Find is perfect for finding connected components in a graph where we need to quickly determine if elements are connected.
Union-Find revolves around two main operations that work together to manage groups. The find
operation tells us which group an element belongs to, while the union
operation combines two separate groups into one. These operations are dynamic, meaning we can perform them at any time as our groups evolve and change.
Operations | Time | Space |
---|---|---|
Make Set | O(1) | O(1) |
Find | O(α(n)) | O(1) |
Union | O(α(n)) | O(1) |
Connected | O(α(n)) | O(1) |
Get Size | O(1) | O(1) |
Get Count | O(1) | O(1) |
Initialize | O(n) | O(n) |
Get Components | O(n) | O(n) |
Some Union-Find operations use a special mathematical concept called the Inverse Ackermann function, shown as α(n)
. Don't worry too much about the math behind it - what's important is that this function grows so slowly that when we say an algorithm runs in O(α(n))
time, it means it's practically as fast as an instant operation.
A disjoint set is like separating items into distinct groups where each item must belong to exactly one group. Think of it as organizing students into different teams - each student can only be on one team at a time, with no overlap between teams. This is perfect for tracking groups that need to stay separate, but might need to be combined later.
To make these groups efficient to manage, Union-Find uses two key optimizations: path compression and union by size. Path compression flattens our groups to make lookups faster, while union by size keeps our groups balanced when we combine them. Together, these techniques make operations practically instant.
To build Union-Find, we need two arrays: one to track each element's parent and another to track each group's size. Initially, every element is its own parent (forming its own group), so we set each index in the par
array to point to itself. The size
array starts with all 1
's since each element begins in a group of size 1
.
class UnionFind:
def __init__(self, size: int):
# Create arrays to store node's parents and their sizes
self.par = [node for node in range(size + 1)]
self.size = [1] * len(self.par)
This setup creates the foundation for our Union-Find operations. The par
array lets us trace paths to find group leaders, while the size
array helps us keep our trees balanced when merging groups. As we perform operations, these arrays will change to reflect how our elements are grouped together.
Path compression is an optimization that makes Union-Find operations lightning fast. When we look up an element's group using find
, we often need to follow a chain of parent pointers to reach the root. Without path compression, these chains can get long, making lookups slow. Let's first look at a basic find
operation that runs in O(n)
time because it might need to traverse the entire tree:
# Using a simple union-find setup instead of defining a class
def setup(size: int):
par = [node for node in range(size + 1)]
size = [1] * len(par)
# Find function WITHOUT path compression
def find(node: int) -> int:
while par[node] != node:
node = par[node]
return node
Path compression solves this problem by restructuring the tree during lookups. As we traverse up the tree to find the root, we make each node we visit point directly to the root. This flattening process means that future lookups for any of these nodes will be much faster since they're now just one step away from their root. Here's how we implement path compression:
# Using a simple union-find setup instead of defining a class
def setup(size: int):
par = [node for node in range(size + 1)]
size = [1] * len(par)
# Find function WITH path compression
def find(node: int) -> int:
if par[node] != node:
par[node] = find(par[node])
return par[node]
By adding path compression, we dramatically improve the performance of Union-Find. What used to take O(n)
operations now runs in practically constant time, thanks to the O(α(n))
inverse Ackermann function. The best part is that this huge speedup comes from adding just a single line of code to our find
operation.
Path halving is a lesser alternative to path compression that still gives us great performance. Instead of connecting every node directly to the root, path halving makes each node skip its grandparent as we move up the tree. This means that while we're looking for the root, we're also shortening the path for future operations.
# Using a simple union-find setup instead of defining a class
def setup(size: int):
par = [node for node in range(size + 1)]
size = [1] * len(par)
# Find function WITH path halving
def find(node: int) -> int:
while par[node] != node:
# Point the node to its grandparent
par[node] = par[par[node]]
node = par[node]
return node
While path halving achieves the same O(α(n))
time complexity, it's not as efficient as path compression. Since nodes aren't directly connected to the root, future operations still need multiple steps to find their group. Path compression is almost always the better choice because the extra memory writes are worth the performance gain of having every node just one step away from its root.
Union by size is an optimization that helps keep our tree balanced when combining groups. When we merge two groups, we want to avoid creating tall, unbalanced trees that would make lookups slow. Instead, we always merge the smaller group into the larger one, just like how it's easier to pour a small glass of water into a larger one.
# Using a simple union-find setup instead of defining a class
def setup(size: int):
par = [node for node in range(size + 1)]
size = [1] * len(par)
def union(node1: int, node2: int) -> bool:
# Find the root of the given nodes
p1, p2 = find(node1), find(node2)
# Base case: If the roots are same, nodes are already connected
if p1 == p2:
return False
# Union by Size
if size[p1] > size[p2]:
par[p2] = p1
size[p1] += size[p2]
else:
par[p1] = p2
size[p2] += size[p1]
return True
To make this work, we need to keep track of each group's size. Every element starts in its own group with a size of one. When we merge groups, we attach the smaller tree to the larger one and add their sizes together. This simple rule ensures our trees never get too tall, making all our operations faster.
Union by rank is similar to union by size but focuses on tracking tree heights instead of group sizes. When combining groups, we want to minimize the height of our resulting tree to keep operations fast. Like union by size, we merge the shorter tree under the taller one, which helps prevent our trees from becoming too deep.
# Using a simple union-find setup instead of defining a class
def setup(size: int):
par = [node for node in range(size + 1)]
size = [1] * len(par)
def union(node1: int, node2: int) -> bool:
# Find the root of the given nodes
p1, p2 = find(node1), find(node2)
# Base case: If the roots are same, nodes are already connected
if p1 == p2:
return False
# Union by Rank
if rank[p1] > rank[p2]:
par[p2] = p1
elif rank[p1] < rank[p2]:
par[p1] = p2
else:
par[p2] = p1
rank[p1] += 1
return True
Instead of counting elements, we maintain a rank
array that tracks each tree's height. When we merge trees of different ranks, the shorter tree joins under the taller one without changing any ranks. Only when we combine trees of equal rank does the resulting tree's rank increase by one. This approach still achieves the same efficiency as union by size, but some find it more intuitive.
Union-Find is a great tool for solving problems related to disjoint sets, connectivity, and clustering in coding interviews. To use Union-Find efficiently, here are some great tips and tricks:
find
and union
operations. A solid understanding will enable you to apply it to a wide range of problems.