Union-Find (DSU)

Senior

Disjoint-set structure with path compression + union by rank for near-O(1) component queries.

When to use

Connected components offline, Kruskal's MST, redundant connection, accounts merge.

Complexity

time O(α(n)) per op (effectively O(1))space O(n)

Path compression and union by rank give an amortized cost of inverse Ackermann α(n), which is at most 4 for any practical input. The parent and rank arrays hold one entry per element.

union(x,y): point one root at the other. find(x) walks up with path compression.

How it works

Union-Find maintains a forest of disjoint sets where each set has a representative root. Two operations: find(x) returns x's root, union(x, y) merges the sets containing x and y. With path compression (find flattens the tree as it walks) and union by rank/size (always attach the smaller tree under the larger), both operations run in O(alpha(n)) — inverse Ackermann, effectively constant for any realistic n.

The pattern is the right tool for offline connectivity: when you process a batch of merges and ask 'are x and y connected?' after each, union-find beats breadth-first search (BFS) or depth-first search (DFS) by orders of magnitude. It's also the engine behind Kruskal's minimum spanning tree (MST) algorithm (sort edges, union as you add).

What union-find can't do efficiently is split — once merged, sets stay merged. For problems that involve removal (like 'minimum cost to disconnect'), you typically have to reverse-process: start from the end state and add edges in reverse, treating deletions as unions.

Key insight

Path compression plus union by rank gives O(alpha(n)) amortized per operation — effectively constant — making union-find the right choice for any batch of merge/connectivity queries on a static or grow-only graph.

Common pitfalls

Implementing find without path compression, degrading to O(log n) or worse per operation and timing out on large inputs.
Implementing union without rank/size and always attaching one fixed direction, building a degenerate linked list.
Comparing parent[x] == y instead of find(x) == find(y) to test connectivity, missing transitive connections.
Trying to use union-find for problems that require split or removal — wrong tool, switch to offline reverse processing or different DS.
In Kruskal's, forgetting to sort edges by weight before iterating — produces a spanning tree, but not minimum.

Variations

Standard disjoint-set union (DSU) with path compression + union by rank.
Kruskal's minimum spanning tree (MST): sort edges, union if endpoints disconnected.
Weighted/relational union-find: track relative offsets between nodes (e.g. accounts merge, equations).
Offline reverse processing: handle edge deletions by reversing the timeline into edge additions.
Number of islands II / dynamic islands: start empty, add cells one by one with unions.

Template

class DSU {
    int[] parent, rank;

    DSU(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    boolean union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank[ra] == rank[rb]) rank[ra]++;
        return true;
    }
}

Problems using this pattern

Easiest first · Blind 75 above NeetCode 150

Practice these problems

Solve mode — formulate the approach, then reveal. 2 problems.