A graph traversal can tell you whether two vertices are connected, at a cost of O(V + E) per question. Now imagine the connections keep arriving, friendships forming one at a time, and after every new link someone asks: are these two in the same group yet? Re-running a traversal for every question is brutal. Union-Find answers each one in practically constant time.
Union-Find is a data structure for managing groups where each element belongs to exactly one group. Also known as Disjoint Set Union (DSU), it tracks disjoint sets and answers two questions quickly: which group does an element belong to, and are two elements in the same group? Union-Find is the go-to choice for finding connected components in a graph where you need to quickly check whether two elements are connected.
Union-Find has two main operations. find tells us which group an element belongs to, and union combines two separate groups into one. Watch it work: five elements, numbered 0 through 4, start in five separate groups. union(0, 1) merges two of them, union(2, 3) merges two more, and now find(0) and find(1) return the same leader while find(4) still stands alone. Both operations work dynamically, so you can call them at any time as your groups grow and change.
| Operations | Time | Space |
|---|---|---|
| Make Set | O(1) | O(1) |
| Find | O(a(n)) | O(1) |
| Union | O(a(n)) | O(1) |
| Connected | O(a(n)) | O(1) |
| Get Size | O(1) | O(1) |
| Get Count | O(1) | O(1) |
| Initialize | O(n) | O(n) |
| Get Components | O(n) | O(n) |
Those O(α(n)) entries in the table come from the Inverse Ackermann function. Skip the math and keep the takeaway: treat every α(n) operation as constant time, because no real input will ever make it slower.
A disjoint set is a way of organizing items into distinct groups where each item must belong to exactly one group. Think of organizing students into different teams: each student can only be on one team at a time, with no overlap. The analogy misses the strangest rule, though: there are no transfers. Union-Find can merge teams but never split them, and that one-way street is part of what makes it so fast. This is the right model for tracking groups that need to stay separate but might be combined later.
To keep these groups efficient, Union-Find uses two main optimizations: path compression and union by size. Path compression flattens the tree to make lookups faster, while union by size keeps the tree balanced when groups are merged. Together, they're what earn the α(n) in the table above.
To build Union-Find, we need two arrays: one to track each element's parent and another to track each group's size. At the start, every element is its own parent (so every element is its own group). We set each index in the par array to point to itself, and every entry in the size array starts at 1 because each element begins in a group of size 1.
class UnionFind:
def __init__(self, n: int):
# Each node starts as its own parent
self.par = [node for node in range(n)]
# Each group starts with size 1
self.size = [1] * nThat setup is the foundation for every Union-Find operation. The par array lets us trace paths back to the group leader, and the size array helps us keep the trees balanced when we merge groups. As we run operations, these arrays change to reflect how the elements are currently grouped.
Path compression is the optimization that makes Union-Find operations fast in the practical sense. When we look up an element's group with find, we follow a chain of parent pointers up to the root. Without compression, these chains can get long and find slows down. The worst case is a straight chain, the same degenerate spine that ruined the unbalanced BST. Here's a basic find that runs in O(n) time because it might walk the entire tree:
# Basic find without path compression - O(n)
def find(self, node: int) -> int:
while self.par[node] != node:
node = self.par[node]
return nodePath compression fixes this by restructuring the tree during lookups. As we walk up to find the root, we make every node we visit point directly to the root. This flattening means future lookups for any of these nodes are much faster, since they're now one step away from the root. Here's the implementation:
# Find with path compression - O(a(n))
def find(self, node: int) -> int:
if self.par[node] != node:
# Make every node point directly to root
self.par[node] = self.find(self.par[node])
return self.par[node]Adding path compression dramatically improves Union-Find's performance. What used to take O(n) per find now runs in practically constant time, thanks to the O(α(n)) inverse Ackermann bound. And the speedup comes from one extra line of code in find.
Path halving is a lesser-known alternative to path compression that still gives you great performance. Instead of connecting every node directly to the root, path halving makes each node point to its grandparent as we move up the tree. While we're looking for the root, we're also shortening the path for future operations.
# Find with path halving - O(a(n))
def find(self, node: int) -> int:
while self.par[node] != node:
# Skip to grandparent
self.par[node] = self.par[self.par[node]]
node = self.par[node]
return nodePath halving hits the same O(α(n)) time complexity, but in practice it's not as efficient as path compression. Because nodes don't end up connected directly to the root, future operations still take multiple steps to find the group. Path compression is almost always the better choice. The extra memory writes are worth the speedup of having every node sit one step away from its root.
Union by size keeps the tree balanced when we combine groups. When merging two groups, we want to avoid creating tall, lopsided trees that slow down find. The fix is simple: always merge the smaller group into the larger one. It's like pouring a small glass of water into a larger glass instead of the other way around. Unlike water, though, the smaller group keeps its shape: it just hangs its root under the bigger one.
# Union by size - O(a(n))
def union(self, node1: int, node2: int) -> bool:
p1, p2 = self.find(node1), self.find(node2)
# Already in the same group
if p1 == p2:
return False
# Merge smaller group into larger group
if self.size[p1] > self.size[p2]:
self.par[p2] = p1
self.size[p1] += self.size[p2]
else:
self.par[p1] = p2
self.size[p2] += self.size[p1]
return TrueTo pull this off, we need to track each group's size. Every element starts in its own group with a size of 1. When we merge groups, we attach the smaller tree to the larger one and add their sizes together. This one rule stops the trees from ever getting too tall, which keeps every operation fast.
Union by rank is similar to union by size, but it focuses on tracking tree heights instead of group sizes. When combining groups, we want to keep the resulting tree as short as possible to keep operations fast. Like union by size, we merge the shorter tree under the taller one, which prevents the tree from getting too deep.
# Union by rank - O(a(n))
def union(self, node1: int, node2: int) -> bool:
p1, p2 = self.find(node1), self.find(node2)
# Already in the same group
if p1 == p2:
return False
# Merge shorter tree under taller tree
if self.rank[p1] > self.rank[p2]:
self.par[p2] = p1
elif self.rank[p1] < self.rank[p2]:
self.par[p1] = p2
else:
self.par[p1] = p2
self.rank[p2] += 1
return TrueInstead of counting elements, we keep a rank array that tracks each tree's height. When merging trees of different ranks, the shorter tree joins under the taller one without changing any ranks. Only when we merge two trees of equal rank does the resulting tree's rank go up by one. This gives you the same efficiency as union by size, just with a different mental model that some find more intuitive.
Union-Find is a great fit for problems about disjoint sets, connectivity, and clustering. Here are a few tips for using it well in coding interviews:
find (look up an element's group) and union (merge two groups). Once that intuition is there, the wide range of problems Union-Find solves becomes a lot more obvious.An algorithm for traversing graphs by exploring as far as possible along each branch.
An algorithm for traversing graphs level by level.
An algorithm for finding the minimum spanning tree of a graph.
An algorithm for finding the minimum spanning tree using edge sorting.
First, two checks from this page: what does path compression do to the tree during a find? And why merge the smaller group into the larger one, never the reverse? If both are easy, you're ready. Union-Find feels almost too simple until you realize how many "connected component" problems collapse into two operations. Many of these problems can also be solved with DFS or BFS, but the union-find solution is usually shorter and easier to reason about. Work through them to spot when path compression and union by rank are the right tools:
Completed | Title | Status | Notes | Solution | |
|---|---|---|---|---|---|
Easy | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Medium | |||||
Hard | |||||
Hard |