Prim builds a minimum spanning tree by growing one blob outward. Kruskal gets the identical result from the opposite direction: sort all the edges once, then sweep them cheapest-first, keeping every edge that connects two separate fragments and rejecting every edge that would close a cycle. The work-in-progress is a forest of islands that merge into one tree, and the total cost is O(E log E), almost all of it the sort.
The correctness comes from the same cut property that licenses Prim: each accepted edge is the lightest one crossing the cut between its two fragments and everything else, so the greed is safe at every pick. What's genuinely new here is the machinery for the cycle check.
Run it on the same five-node graph from the Prim page: A-B = 2, B-C = 3, B-E = 5, A-D = 6, C-E = 7, B-D = 8, D-E = 9, already listed in sorted order. The sweep takes A-B, takes B-C, takes B-E, and takes A-D, at which point four edges connect all five nodes. Predict the fate of C-E at 7 before reading on.
Rejected: C and E already live in the same fragment (connected through B), so the edge would close the cycle C-B-E-C and add weight for nothing. The final tree weighs 2 + 3 + 5 + 6 = 16, the exact tree Prim built on this graph, which is no accident: when all edge weights are distinct, the minimum spanning tree is unique, and every correct algorithm must find it.
The sweep asks one question per edge: are these two endpoints already connected through the fragments built so far? Walking the forest to answer would cost O(V) per edge. Union-find answers it in effectively constant time: every node points toward its fragment's root, two nodes share a fragment exactly when they share a root, and merging fragments is one pointer assignment between roots.
The speed depends on keeping those pointer chains short, which is what path compression does: while walking up to find a root, repoint each visited node higher up the chain. With compression in place, the amortized cost per operation involves the inverse Ackermann function, a value below 5 for any input that fits in this universe, and the cycle checks effectively vanish from the complexity budget.
def kruskal(n: int, edges: list[tuple[int, int, int]]) -> int:
# edges hold (weight, node_a, node_b); returns MST weight or -1
parent = list(range(n))
def find(node: int) -> int:
# Path compression: point at the grandparent while walking up
while parent[node] != node:
parent[node] = parent[parent[node]]
node = parent[node]
return node
total = 0
used = 0
# Cheapest bridges first
for weight, a, b in sorted(edges):
root_a, root_b = find(a), find(b)
# Same root means this edge would close a cycle
if root_a == root_b:
continue
parent[root_a] = root_b
total += weight
used += 1
# A spanning tree is complete at n - 1 edges
if used == n - 1:
return total
return -1You might think one of the two must dominate the other, since they solve the identical problem. They split the territory by input shape instead. Kruskal wants a flat edge list and shines on sparse graphs, since its cost follows the edge count. Prim wants adjacency structure and shines on dense graphs, where it skips sorting E ≈ V² edges. With distinct weights they return the same tree, with ties they may differ edge-by-edge, and the total weight always matches to the digit.
One boundary they share: both assume undirected edges. Both also assume connectivity, and Kruskal's failure mode is the more graceful one, since its union counter ends below V - 1 and announces the graph was never connectable.
Sort, sweep, union. The details that make it correct:
O(E log E). Everything after it is a single cheap pass, so resist any urge to re-sort, re-scan, or use a heap.V - 1 edges, so the moment the counter gets there, every remaining edge is a guaranteed reject and the loop can exit early.The MST problem set runs on costumes over the same body. Connecting Cities is the formula plus a -1 disconnection check. Optimize Water Distribution hides the formula behind wells: add a virtual node 0 where building a well in a house becomes an edge from that house to the virtual source, and the choice between wells and pipes resolves itself in the sweep. Find Critical and Pseudo-Critical Edges runs the algorithm repeatedly, once excluding and once force-including each edge, and reads each edge's role off the resulting totals.
Past MSTs, the sort-then-union sweep is its own reusable pattern: any time events should merge groups in cost or time order, Kruskal's skeleton with a different sort key does the job.
Five habits, most of them about the union-find underneath:
O(n). Path compression (pointing nodes at their grandparents while walking up) keeps finds effectively constant, and it costs one line.Two checks first: what does sharing a union-find root prove about two nodes, and why is every edge after union number V - 1 a guaranteed reject? Same problem set as Prim by design, so solve each one with both algorithms and feel where each shines:
Completed | Title | Status | Notes | Solution | |
|---|---|---|---|---|---|
Medium |