Union Find并查集

并查集

一种用来解决集合查询合并的数据结构

假如A、B、C三人在Microsoft工作，D、E、F、G四人在Linkedin工作，给七个人都分发一个工牌，上面写着自己的公司名字，告诉他们自己的老大是哪家公司，则可以表示成如下形式。

如果A遇到F，看一眼对方的工牌，跟自己是不是一个boss，就知道对方是不是跟自己是同一家公司的人了。

如果有一天M公司把L公司收购了，那么此时，需要对两个公司的员工进行合并操作，给员工分发新的工牌，为了减少重新分配的麻烦，就把L的boss指向M，此时L下面的员工最大的boss是M了，那么A和E就在一个阵营了。

如果在M公司三个员工和L公司四个员工中分别选出一个作为该公司的boss，可以表示成如下形式：

那么合并之后，J的boss设置为B，此时大家都是一个阵营的了。

并查集的精髓

一共包含三个操作

初始化

初始化操作中，每个元素的boss指向自己.

HashMap<Integer,Integer> father = new HashMap<integer,integer>();
void initial(int[] nums){
  for(int i = 0 ; i < nums.length;i++){
    father.put(nums[i],nums[i]);
  }
}

查找

查找元素所在的集合，也就是最大的boss。

如果要判断两个点是否属于同一个集合，就看这两个点的boss是否是同一个节点。
int find(int x){
int parent = x;
while(parent!=father.get(parent)){
parent = fater.get(parent);
}
return parent;
}
时间复杂度： $O(n)$
合并

两个不想交的集合，其中一个的大boss认另一个为boss。

找到两个元素的boss，如果不是同一个，就把一个的boss指向另一个的boss。
void union(int x,int y){
int fa_x = find(x);
int fa_y = find(y);
if(fa_x != f_y){
father.put(fa_X,fa_y);//合并两个boss
}
}
时间复杂度： $O(n)$

并查集的优化

baseline的find流程：

如果有这样一条路径：

A-->B-->C-->D-->E-->F

查找A的boss时，需要遍历整个路径，寻找B、C、D时还需要再遍历一次，这显然是大量重复的工作，所以我们可以把一次遍历途中经过的节点都直接指向boss，下次再查询的时候，时间复杂的就是 $O(1)$ 了，这就是带路径压缩的并查集的查找：

平均时间复杂度降至 $O(1)$

并查集模板(c++版)

hash_map实现：

class UnionFind{
    private:
        unordered_map<int,int> father;
    public:
        //初始化并查集
        void initial(vector<int> elements){
            for(int i = 0;i < elements.size();i++){
                father.insert(make_pair(elements[i],elements[i]));
            }    
        }
        //并查集中插入操作，不支持删除
        void add(int x,int x_fa){
            father.insert(make_pair(x,x_fa));
        }
        //在并查集中查找元素的boss
        int findfather(int x){
            int parent = x;
            while(father[parent] != parent){
                parent = father[parent];
            }
            //带路径压缩的并查集
            while(father.find(x)->second != x){
                x = father[x];
                father[x] = parent;
            }
            return parent;
        }
        //合并两个元素
        void unionset(int x,int y){
            int x_father = findfather(x);
            int y_father = findfather(y);
            if(x_father != y_father){
                father[y_father] = x_father;
            }
        }
        //计算并查集中有多少个不想交的子集合
        int countsets(){
            unordered_set<int> father_set;
            for(unordered_map<int,int>::iterator iter = father.begin();iter != father.end();iter++){
                int parent = findfather(iter->first);
                iter->second = parent;
                father_set.insert(parent);
            }
            return father_set.size();
        }
};

vector实现

//用vector定义并查集，index就是元素值
class UnionFind{
    private:
        vector<int> father;
    public:
        //初始化并查集
        void initial(int n){
            father.resize(n,-1);
        }
        //并查集中更新操作
        void fresh(int x,int x_fa){
            father[x] = x_fa;
        }
        //在并查集中查找元素的boss
        int findfather(int x){
            int parent = x;
            while(father[parent] != parent){
                parent = father[parent];
            }
            //带路径压缩的并查集
            while(father[x] != x){
                father[x] = parent;
                x = father[x];
            }
            return parent;
        }
        //合并两个元素
        void unionset(int x,int y){
            int x_father = findfather(x);
            int y_father = findfather(y);
            if(x_father != y_father){
                father[y_father] = x_father;
            }
        }
        //计算并查集中有多少个不想交的子集合
        int countsets(){
            unordered_set<int> father_set;
            for(int i = 0; i < father.size();i++){
                if(father[i]!=-1){
                    int fathertemp = findfather(father[i]);
                    if(fathertemp!=-1){
                        father_set.insert(fathertemp);
                    }
                }
            }
            return father_set.size();
        }
};

用vector会比hashmap快，但如果数据很稀疏，空间复杂度会比较高。

相关题目

leetcode200. Number of Islands

题目

Given a 2d grid map of '1's (land) and '0's (water), count the number of islands. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.

Example 1:
> 11110
> 11010
> 11000
> 00000
>

Answer: 1

Example 2:
> 11000
> 11000
> 00100
> 00011
>

Answer: 3

分析

题目的意思是说矩阵中的1代表陆地，0代表海洋，如果某个1的上或下或左或右也是1的话，就是属于同一片陆地，要求矩阵中陆地的个数。也就是找出矩阵中连接子图的个数。

有两种思路：

并查集

找出矩阵中子图的个数，可以利用并查集，在每个点附近做查找和合并操作，如果其周围有1，就讲其归为一类，最后返回并查集的集合个数就是所求的。

并查集处理二维矩阵时索引比较麻烦，所以这里需要先将二维坐标转化为一维并查集坐标：

矩阵维度：m*n
二维坐标：(i,j) --> 一维坐标：x*n+j
一维坐标：idx     --> 二维坐标：(idx/m,idx%m)

经过坐标转化，可以使用并查集进行计算了

int numIslands(vector<vector<char>>& grid) {
    int rows = grid.size();
    if(rows==0){
        return 0;
    }
    int cols = grid[0].size();
    if(cols==0){
        return 0;
    }
    UnionFind unionfindset;//声明并查集
    vector<vector<int>> island;//存储是1的元素
    vector<vector<bool> > isseen(rows,vector<bool>(cols,0));//记录是否遍历过
    //初始化,插入元素
    for(int i = 0 ; i < rows;i++){
        for(int j = 0 ;j < cols;j++){
            if(grid[i][j]=='1'){
                //father.insert(make_pair(i*cols+j,i*cols+j));
                unionfindset.add(i*cols+j,i*cols+j);
                island.push_back({i,j});
            }
        }
    }
    //遍历边
    for(int i = 0; i < island.size();i++){
        int x_idx = island[i][0];
        int y_idx = island[i][1];
        int idx = x_idx*cols+y_idx;
        if(isseen[x_idx][y_idx]==0){
            isseen[x_idx][y_idx] = 1;
            if(x_idx-1 >= 0 && grid[x_idx-1][y_idx]=='1'){//上
                isseen[x_idx-1][y_idx] = 1;
                unionfindset.unionset(idx-cols,idx);
            }
            if(y_idx-1 >= 0 && grid[x_idx][y_idx-1]=='1'){//左
                isseen[x_idx][y_idx-1] = 1;
                unionfindset.unionset(idx-1,idx);
            }
        }
    }
    return unionfindset.countsets();
}

BFS/DFS

从矩阵中的一个1开始做深度或广度优先遍历，其周围能够遍历到的1都是跟其属于同片陆地的，把遍历过的陆地标记为0，并把count++；然后继续从下一个出现1的地方开始遍历，跟前面的操作一样，最后就可以得到count就是矩阵中陆地的个数。

void bfs(vector<vector<char>>& grid,int x_idx,int y_idx,int& rows,int& cols){
    if(x_idx<0 || x_idx >=rows || y_idx<0 || y_idx >= cols || grid[x_idx][y_idx]=='0')
        return;
    grid[x_idx][y_idx]='0';
    bfs(grid,x_idx,y_idx-1,rows,cols);
    bfs(grid,x_idx-1,y_idx,rows,cols);
    bfs(grid,x_idx,y_idx+1,rows,cols);
    bfs(grid,x_idx+1,y_idx,rows,cols);
}

int numIslands(vector>& grid) {

int rows = grid.size();
if(rows==0){
    return 0;
}
int cols = grid[0].size();
if(cols==0){
    return 0;
}
//unordered_map<int,int> father;//并查集
//vector<vector<int>> island;//存储是1的元素
int count = 0;
for(int i = 0; i < rows;i++){
    for(int j = 0 ; j < cols;j++){
        if(grid[i][j]=='1'){
            count++;
            bfs(grid,i,j,rows,cols);
        }
    }
}
return count;

}


### leetcode305.Number of Islands II 
#### 题目
> A 2d grid map of `m` rows and `n` columns is initially filled with water. We may perform an *addLand* operation which turns the water at position (row, col) into a land. Given a list of positions to operate, **count the number of islands after each addLand operation**. An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.
>
> **Example:**
>
> Given `m = 3, n = 3`, `positions = [[0,0], [0,1], [1,2], [2,1]]`.
> Initially, the 2d grid `grid` is filled with water. (Assume 0 represents water and 1 represents land).
>
>

0 0 0
0 0 0
0 0 0

>
> Operation #1: addLand(0, 0) turns the water at grid[0][0] into a land.
>
>

1 0 0
0 0 0 Number of islands = 1
0 0 0

>
> Operation #2: addLand(0, 1) turns the water at grid[0][1] into a land.
>
>

1 1 0
0 0 0 Number of islands = 1
0 0 0

>
> Operation #3: addLand(1, 2) turns the water at grid[1][2] into a land.
>
>

1 1 0
0 0 1 Number of islands = 2
0 0 0

>
> Operation #4: addLand(2, 1) turns the water at grid[2][1] into a land.
>
>

1 1 0
0 0 1 Number of islands = 3
0 1 0

>
> We return the result as an array: `[1, 1, 2, 3]`
一个矩阵，元素都是0，代表海洋，每次选择一个元素标记为1，代表陆地，输出每轮矩阵中的island个数。
#### 思路
这道题目跟上面题目不同的地方在于这次是每次选择一个点进行更新，所以如果每次用DFS/BFS遍历的话，会有大量重复的运算，如果用并查集，则只需要每次对于新加入的island对其周围进行检查然后对并查集进行更新即可。
#### 代码
```c++
//用vector定义并查集，index就是元素值
class UnionFind{
    private:
        vector<int> father;
    public:
        //初始化并查集
        void initial(int n){
            father.resize(n,-1);
        }
        //并查集中更新操作
        void fresh(int x,int x_fa){
            father[x] = x_fa;
        }
        //在并查集中查找元素的boss
        int findfather(int x){
            int parent = x;
            while(father[parent] != parent){
                parent = father[parent];
            }
            //带路径压缩的并查集
            while(father[x] != x){
                father[x] = parent;
                x = father[x];
            }
            return parent;
        }
        //合并两个元素
        void unionset(int x,int y){
            int x_father = findfather(x);
            int y_father = findfather(y);
            if(x_father != y_father){
                father[y_father] = x_father;
            }
        }
        //计算并查集中有多少个不想交的子集合
        int countsets(){
            unordered_set<int> father_set;
            for(int i = 0; i < father.size();i++){
                if(father[i]!=-1){
                    int fathertemp = findfather(father[i]);
                    if(fathertemp!=-1){
                        father_set.insert(fathertemp);
                    }
                }
            }
            return father_set.size();
        }
};
public:
vector<int> numIslands2(int m, int n, vector<pair<int, int>>& positions) {
    int count = 0;
    vector<int> results;
    UnionFind UnionFindset;
    vector<int> x_add = {0,0,1,-1};
    vector<int> y_add = {1,-1,0,0};
    //初始化地图
    vector<vector<int> > mapmatrix(m,vector<int>(n,0));
    for(int i = 0; i < positions.size();i++){
        int x = positions[i].first;
        int y = positions[i].second;
        mapmatrix[x][y] = 1;
        int idx_demension_one = x*n+y;
        UnionFindset.add(idx_demension_one,idx_demension_one);//加入并查集
        count++;
        for(int j = 0 ; j < 4;j++){
            //相邻元素的坐标
            int x_neighbor = x+x_add[j];
            int y_neighbor = y+y_add[j];
            int idx_demension_one_neighbor = x_neighbor*n+y_neighbor;
            //如果相邻元素还在地图中，而且是陆地（val=1）
            if(x_neighbor>=0 && x_neighbor<m && y_neighbor>=0 && y_neighbor<n && mapmatrix[x_neighbor][y_neighbor]==1 && UnionFindset.findfather(idx_demension_one) != UnionFindset.findfather(idx_demension_one_neighbor)){
                UnionFindset.unionset(idx_demension_one,idx_demension_one_neighbor);
                count--;
            }
        }
        results.push_back(count);
    }
    return results;
}

leetcode261. Graph Valid Tree

题目

Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to check whether these edges make up a valid tree.

For example:

Given n = 5 and edges = [[0, 1], [0, 2], [0, 3], [1, 4]], return true.

Given n = 5 and edges = [[0, 1], [1, 2], [2, 3], [1, 3], [1, 4]], return false.

Note: you can assume that no duplicate edges will appear in edges. Since all edges are undirected, [0, 1] is the same as [1, 0]and thus will not appear together in edges.

给定n个节点和节点之间的边集，判断由这些节点和边集是否能够构成树。

分析

给定n个几点和边集构成树的条件有两个：

所有的点都在一个并查集中，也就是都属于一个root节点
不能有环

代码

bool validTree(int n, vector<pair<int, int>>& edges) {
    UnionFind UnionFindgraph;
    UnionFindgraph.initial(n);
    for(int i = 0 ; i < n;i++){
        UnionFindgraph.fresh(i,i);
    }
    //遍历边
    for(int i = 0 ; i < edges.size();i++){
        int start = edges[i].first;
        int end = edges[i].second;
        int start_fa = UnionFindgraph.findfather(start);
        int end_fa = UnionFindgraph.findfather(end);
        if(start_fa == end_fa){
            return false;
        }
        else{
            UnionFindgraph.unionset(start_fa,end_fa);
        }
    }
    if(UnionFindgraph.countsets()==1){
        return true;
    }
    else{
        return false;
    }
}

leetcode323. Number of Connected Components in an Undirected Graph

题目

Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to find the number of connected components in an undirected graph.

Example 1:
>      0          3
>      |          |
>      1 --- 2    4
>
>

Given n = 5 and edges = [[0, 1], [1, 2], [3, 4]], return 2.

Example 2:
>      0           4
>      |           |
>      1 --- 2 --- 3
>
>

Given n = 5 and edges = [[0, 1], [1, 2], [2, 3], [3, 4]], return 1.

分析

判断矩阵中联通子图图的个数，遍历边集，将边首尾节点合并，最终的并查集boss数就是联通子图的个数。

代码

int countComponents(int n, vector<pair<int, int>>& edges) {
    UnionFind UnionFindgraph;
    UnionFindgraph.initial(n);
    for(int i = 0 ; i < n;i++){
        UnionFindgraph.fresh(i,i);
    }
    //遍历边
    for(int i = 0 ; i < edges.size();i++){
        int start = edges[i].first;
        int end = edges[i].second;
        UnionFindgraph.unionset(start,end);
    } 
    return UnionFindgraph.countsets();
}

leetcode130. Surrounded Regions

题目

Given a 2D board containing 'X' and 'O' (the letter O), capture all regions surrounded by 'X'.

A region is captured by flipping all 'O's into 'X's in that surrounded region.

For example,
> X X X X
> X O O X
> X X O X
> X O X X
>
>

After running your function, the board should be:
> X X X X
> X X X X
> X X X X
> X O X X
>
>

将被X围住的O标记为X。

分析

通过分析可知，就是要将所有以O组成、但没有连通到网格边缘的区域变为X。

BFS/DFS：沿着四个边向内找O，找到每一个O就把相连的都变成N，因为他们都是要保留的，最后遍历二维数组，遇到O变成X，遇到N变回O
并查集：将区域内的O合并，组成集合，如果有元素在边界，就将该集合的father设为N，最后遍历所有的0，如果其father为N，就标记为O，否则标记为X。

显然，1会比较快，下面使用BFS实现的.

代码

void bfs(vector<vector<char>>& board,int x,int y,int rows,int cols){
    // if(x<0 || x >= rows || y<0 || y >= cols || board[x][y]=='X'){
    //     return;
    // }
    if(x>=0 && x < rows && y>=0 && y < cols && board[x][y]=='O'){
        board[x][y]='N';
        dfs(board,x-1,y,rows,cols);
        dfs(board,x+1,y,rows,cols);
        dfs(board,x,y-1,rows,cols);
        dfs(board,x,y+1,rows,cols);
    }
}
void solve(vector<vector<char>>& board) { 
    int rows = board.size();
    if(rows==0){
        return;
    }
    int cols = board[0].size();
    if(cols==0){
        return;
    }
    //上下边框
    for(int i = 0 ; i < cols;i++){
        dfs(board,0,i,rows,cols);
        dfs(board,rows-1,i,rows,cols);
    }
    //左右边框
    for(int i = 0 ; i < rows;i++){
        dfs(board,i,0,rows,cols);
        dfs(board,i,cols-1,rows,cols);
    }
    for(int i = 0 ; i < rows;i++){
        for (int j = 0; j < cols;j++){
            if(board[i][j] == 'N'){
                board[i][j] = 'O';
            }
            else if(board[i][j] == 'O'){
                board[i][j] = 'X';
            }
        }
    }
}

总结

什么时候用并查集？

集合合并
判断两个点是否在同一个集合内

trie字典树

Trie字典树

源自单词：retrieve

假设有[b，abc，abd，bcd，abcd，efg，hii ]这6个单词 , 查找abc 在不在字典里面

hash和trie的比较

	hash_table	TIRE树
查找时间复杂度	O(1)	O(1)
空间复杂度		优于hash_table

对于a,aa,aaa,aaaa的情况

	hash	trie
存储	10个a	5个a节点
可用操作	有/无/查询	有/无/前缀查询
	1行	75~100行

所以选择hash原因是代码量小, 但是涉及到前缀查询的时候, 考虑trie树

什么时候更适合用trie树

一个一个字符串遍历的时候。

需要节约空间

查找前缀

Trie模板

例题：

Word search II

扫描线

参考资料

stomachache007的blog