Data Structures Union-Find

Union-Find

A data structure that keeps track of disconnected sets of elements.

Definition

Union-Find, also known as Disjoint Set Union (DSU), manages a collection of disjoint sets, allowing for efficient queries and updates about whether elements are in the same set or not.

Operations

Union-Find supports two fundamental operations: find, which identifies the set an element belongs to, and union, which merges two sets into one. These operations are dynamic, meaning they can be performed at any time after the graph's initial construction:

OperationsTimeSpace
FindO(α(N))O(N)
UnionO(α(N))O(N)
ConnectedO(α(N))O(N)
MakeSetO(1)O(N)

The Inverse Ackermann function, O(α(N)), is a time complexity that grows extremely slowly, so much so that for all practical programming purposes, it's almost constant. This means algorithms with this complexity are highly efficient, even for very large input sizes.

Disjoint Sets

A disjoint set, also known as a set of sets, is a collection where each node belongs to exactly one subset, ensuring no overlap between these subsets. Union-Find provides an efficient method to manage these disjoint sets.


One of the key challenges Union-Find addresses is the dynamic nature of disjoint sets. As elements can be grouped or regrouped over time, maintaining an efficient and quick access to these sets becomes difficult, especially in graph algorithms where connectivity changes.


Union-Find, with optimizations such as path compression and union by size, significantly improves the efficiency of these operations. Path compression flattens the structure of the sets, making future find operations faster by directly connecting nodes to their root. Union by size ensures that the smaller set is always merged into the larger one, preventing the formation of long chains that can slow down find operations.

Network Connectivity

Network connectivity problems often ask us to find out whether two elements are part of the same subset or if a path exists between them. Union-Find simplifies this process by maintaining information about connected components within a network.


Union-Find can handle large networks by efficiently grouping nodes into sets of connected components. When a new connection between two nodes is introduced, Union-Find can quickly update the corresponding sets using the union operation. Checking if two nodes are connected is as simple as verifying if they belong to the same set, which can be easily done with the find operation.

Path Compression

Path compression is an optimization technique which reduces the depth of Union-Find. By minimizing the number of hops required to reach the root of each node, path compression speeds up the the find operations.


Just out of curiosity, let's take a look at the Union-Find find operation without path compression. The code below will run at O(n) time complexity:


Path compression works by changing the structure of the Union-Find tree during each find operation. When we need a node's parent from the find operation, path compression ensures that each node along the path to the root is directly linked to the root itself. This adjustment is made dynamically, ensuring that future queries for any of these nodes will require far fewer steps to ascertain their set representative.


Path compression is just a simple modification to the find operation. Instead of traversing up the tree to find the root, the find operation reassigns the parent of each traversed node to point directly to the root. This can be done by adding one simple line to the find function, resulting in a tree that is much flatter and more efficient to work with:


The benefits of path compression are noticeable in large numbers of elements and operations. By ensuring that trees remain shallow, path compression guarantees that the amortized time complexity of find approaches O(log n) in the worst case and is closer to O(α(n)), the inverse Ackermann function, for most practical purposes.

Union by Size

Union by Size is an optimization technique that maintains a balanced tree structure during union operations. This ensures that the tree's height is minimized for future find operations. The size in this context represents the tree's height, and during a union operation, the tree with the smaller size is merged into the tree with the larger size.


Union by Size can be adapted to use a size array for tracking each set's size instead of size. Initially, every element forms a set of size one. During a union operation, the smaller set's root is linked to the larger set's root, optimizing the set's structure for future find operations. If sets are equal in size, one becomes the root of the combined set, which then updates its size to the total of both:


This size-based approach, a variant of the traditional size method, ensures the Union-Find's efficiency, returning True for merging separate sets and False if already connected.

Union-Find Implementation

Implement Union-Find by creating an array to track parent relationships between elements. Use an additional array, often referred to as size, to keep track of the number of elements in each set. This size array plays an important role during the union process:

Best Practices for Union-Find

Union-Find is a great tool for solving problems related to disjoint sets, connectivity, and clustering in coding interviews. To use Union-Find efficiently, here are some great tips and tricks:


Grasp the core concepts of Union-Find, including how it manages disjoint sets, and the significance of its find and union operations. A solid understanding will enable you to apply it to a wide range of problems.

Copyright © StudyDSA. All rights reserved.