Union-Find is a data structure for managing groups where each element belongs to exactly one group. Also known as Disjoint Set Union (DSU), it tracks disjoint sets and answers two questions quickly: which group does an element belong to, and are two elements in the same group? Union-Find is the go-to choice for finding connected components in a graph where you need to quickly check whether two elements are connected.
Union-Find has two main operations. find tells us which group an element belongs to, and union combines two separate groups into one. Both work dynamically, so you can call them at any time as your groups grow and change.
| Operations | Time | Space |
|---|---|---|
| Make Set | O(1) | O(1) |
| Find | O(a(n)) | O(1) |
| Union | O(a(n)) | O(1) |
| Connected | O(a(n)) | O(1) |
| Get Size | O(1) | O(1) |
| Get Count | O(1) | O(1) |
| Initialize | O(n) | O(n) |
| Get Components | O(n) | O(n) |
Some Union-Find operations use a mathematical concept called the Inverse Ackermann function, written as α(n). The math itself isn't important here. What matters is that the function grows so slowly that an O(α(n)) algorithm is practically as fast as an instant operationfor any input you'll encounter.
A disjoint set is a way of organizing items into distinct groups where each item must belong to exactly one group. Think of organizing students into different teams: each student can only be on one team at a time, with no overlap. This is the right model for tracking groups that need to stay separate but might be combined later.
To keep these groups efficient, Union-Find uses two main optimizations: path compression and union by size. Path compression flattens the tree to make lookups faster, while union by size keeps the tree balanced when groups are merged. Together, they make every operation practically instant.
To build Union-Find, we need two arrays: one to track each element's parent and another to track each group's size. At the start, every element is its own parent (so every element is its own group). We set each index in the par array to point to itself, and the size array starts with all 1s because each element begins in a group of size 1.
class UnionFind:
def __init__(self, n: int):
# Each node starts as its own parent
self.par = [node for node in range(n)]
# Each group starts with size 1
self.size = [1] * nThat setup is the foundation for every Union-Find operation. The par array lets us trace paths back to the group leader, and the size array helps us keep the trees balanced when we merge groups. As we run operations, these arrays change to reflect how the elements are currently grouped.
Path compression is the optimization that makes Union-Find operations fast in the practical sense. When we look up an element's group with find, we follow a chain of parent pointers up to the root. Without compression, these chains can get long and findslows down. Here's a basic find that runs in O(n) time because it might walk the entire tree:
# Basic find without path compression - O(n)
def find(self, node: int) -> int:
while self.par[node] != node:
node = self.par[node]
return nodePath compression fixes this by restructuring the tree during lookups. As we walk up to find the root, we make every node we visit point directly to the root. This flattening means future lookups for any of these nodes are much faster, since they're now one step away from the root. Here's the implementation:
# Find with path compression - O(a(n))
def find(self, node: int) -> int:
if self.par[node] != node:
# Make every node point directly to root
self.par[node] = self.find(self.par[node])
return self.par[node]Adding path compression dramatically improves Union-Find's performance. What used to take O(n) per find now runs in practically constant time, thanks to the O(α(n)) inverse Ackermann bound. And the speedup comes from one extra line of code in find.
Path halving is a lesser-known alternative to path compression that still gives you great performance. Instead of connecting every node directly to the root, path halving makes each node point to its grandparentas we move up the tree. While we're looking for the root, we're also shortening the path for future operations.
# Find with path halving - O(a(n))
def find(self, node: int) -> int:
while self.par[node] != node:
# Skip to grandparent
self.par[node] = self.par[self.par[node]]
node = self.par[node]
return nodePath halving hits the same O(α(n))time complexity, but in practice it's not as efficient as path compression. Because nodes don't end up connected directly to the root, future operations still take multiple steps to find the group. Path compression is almost always the better choice. The extra memory writes are worth the speedup of having every node sit one step away from its root.
Union by size keeps the tree balanced when we combine groups. When merging two groups, we want to avoid creating tall, lopsided trees that slow down find. The fix is simple: always merge the smaller group into the larger one. It's like pouring a small glass of water into a larger glass instead of the other way around.
# Union by size - O(a(n))
def union(self, node1: int, node2: int) -> bool:
p1, p2 = self.find(node1), self.find(node2)
# Already in the same group
if p1 == p2:
return False
# Merge smaller group into larger group
if self.size[p1] > self.size[p2]:
self.par[p2] = p1
self.size[p1] += self.size[p2]
else:
self.par[p1] = p2
self.size[p2] += self.size[p1]
return TrueTo pull this off, we need to track each group's size. Every element starts in its own group with a size of 1. When we merge groups, we attach the smaller tree to the larger one and add their sizes together. This one rule keeps the trees from ever getting too tall, which keeps every operation fast.
Union by rank is similar to union by size, but it focuses on tracking tree heights instead of group sizes. When combining groups, we want to keep the resulting tree as short as possible to keep operations fast. Like union by size, we merge the shorter tree under the taller one, which prevents the tree from getting too deep.
# Union by rank - O(a(n))
def union(self, node1: int, node2: int) -> bool:
p1, p2 = self.find(node1), self.find(node2)
# Already in the same group
if p1 == p2:
return False
# Merge shorter tree under taller tree
if self.rank[p1] > self.rank[p2]:
self.par[p2] = p1
elif self.rank[p1] < self.rank[p2]:
self.par[p1] = p2
else:
self.par[p1] = p2
self.rank[p2] += 1
return TrueInstead of counting elements, we keep a rankarray that tracks each tree's height. When merging trees of different ranks, the shorter tree joins under the taller one without changing any ranks. Only when we merge two trees of equal rank does the resulting tree's rank go up by one. This gives you the same efficiency as union by size, just with a different mental model that some find more intuitive.
Union-Find is a great fit for problems about disjoint sets, connectivity, and clustering. Here are a few tips for using it well in coding interviews:
find (which group does this element belong to?) and union (merge these two groups). Once that intuition is there, the wide range of problems Union-Find solves becomes a lot more obvious.An algorithm for traversing graphs by exploring as far as possible along each branch.
An algorithm for traversing graphs level by level.
An algorithm for finding the minimum spanning tree of a graph.
An algorithm for finding the minimum spanning tree using edge sorting.
Union-Find feels almost too simple — until you realize how many "connected component" problems collapse into two operations. Many of these problems can also be solved with DFS or BFS; the union-find solution is usually shorter and easier to reason about. Work through them to spot when path compression and union by rank are the right tools:
Completed | Title | Status | Notes | Solution | |
|---|---|---|---|---|---|
Easy | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Hard | |||||
Hard |