# Union Find Java

It is Monday morning. Lots of work associated things for the day so this post will be direct to the point.

As I have previously mentioned, I am in the process of reading Algorithms in C Third Edition Parts 1 – 4 by Robert Sedgewick. I finished the first chapter and wanted to experiment with the content in the book. Yesterday I got VSCode on my machine to compile and execute. I thought I was ready to use the IDE in C. It seems that when I created a new project something did not work. I am currently using the MinGW compiler with VSCode. The issue might be associated with it. I decided that over the next weekend I will remove MinGW and will switch to the Microsoft C/C++ compiler. I do a considerable amount of work using Visual Studio Enterprise 2019 IDE. Have been using Visual Studio for a couple decades and I like it. VSCode is quite similar but yet different.

The idea is to experiment with the Union-Find algorithm. The algorithm provides a nice and simple approach to determine if two vertices in a map are or are not connected in a simple and efficient algorithm.

```5
0 1
1 2
2 3
3 4
1 4
main <<< N: 5
main <<< 0 -> 1 not connected
main <<< 1 -> 2 not connected
main <<< 2 -> 3 not connected
main <<< 3 -> 4 not connected
main <<< 1 -> 4 connected

10
3 4
4 3
3 4
4 9
8 0
2 3
5 6
2 9
5 9
7 3
4 8
5 6
0 2
6 1
10 1
11 0
-1 5
5 -1
main <<< N: 10
main <<< 3 -> 4 not connected
main <<< 4 -> 3 connected
main <<< 3 -> 4 connected
main <<< 4 -> 9 not connected
main <<< 8 -> 0 not connected
main <<< 2 -> 3 not connected
main <<< 5 -> 6 not connected
main <<< 2 -> 9 connected
main <<< 5 -> 9 not connected
main <<< 7 -> 3 not connected
main <<< 4 -> 8 not connected
main <<< 5 -> 6 connected
main <<< 0 -> 2 connected
main <<< 6 -> 1 not connected
main <<< 10 -> 1 not connected
main <<< 11 -> 0 not connected
main <<< -1 -> 5 not connected
main <<< 5 -> -1 not connected
```

Let’s start by providing a couple examples. I will cover the first and will leave the second for you to make sure it works as expected.

Our test code reads the maximum number of nodes in our graph. In this example we have five nodes. Nodes are labeled with integers starting with 0. In this case nodes are in the range [0 : 4].

After the number of nodes we are given a set of pairs. For each pair we will determine if the nodes are connected. If not, we will connect them. So when we see a line with “0 1” we first check if the node 0 is connected with node 1. It should not so we display that the nodes are not connected and we then connect them. The process repeats for the rest of the input lines. At some point we are provided with the input line “1 4”. We check if node 1 and 4 are connected. They are connected so we display that the nodes are connected.

```    /**
* Test scaffold.
*
* @throws IOException
*/
public static void main(String[] args) throws IOException {

// **** open buffered reader ****

// ???? ????
System.out.println("main <<< N: " + N);

// **** initialize object ****
UnionFindJava obj = new UnionFindJava(N);

// **** loop reading p and q ****
boolean nextLine = true;
while (nextLine) {

// **** read next input line ****
.mapToInt(Integer::parseInt)
.toArray();

// **** extract p and q from the int[] ****
int p = pq[0];
int q = pq[1];

// **** determine if p & q are connected ****
boolean connected = obj.areConnected(p, q);

// ???? ????
System.out.println("main <<< " + p + " -> " + q +
(connected ? " connected" : " not connected"));

// **** connect p -> q (if needed) ****
if (!connected) {
obj.connect(p, q);
}

// **** check if br has a next line to read ****
}

// **** close buffered reader ****
br.close();
}
```

Let’s take a look at the test code.

We open a buffered reader and read the number of nodes in our graph. The value is displayed. We initialize an object which we will use to determine if a pair of nodes is or is not connected returning true or false respectively.

If the nodes are not connected then we connect them.

We then check if we have additional input data or if we are done.

Before leaving the code, we close the buffered reader. In this example it is not required, but in production it is always a good practice to close data streams to make sure we do not induce side effects.

```public class UnionFindJava {

// **** class members ****
public int N;
public int[] id;

// .... methods follow
}
```

Our class UnionFindJava contains two class members. One indicates the number of nodes in our graph. The other is an array of ids for our nodes. This implementation does not require a two dimensional adjacency matrix. This makes this algorithm quire efficient and does not require a large two dimensional array.

```    /**
* Constructor.
*/
public UnionFindJava(int N) {
this.N = N;
this.id = new int[N];
for (int i = 0; i < N; i++)
this.id[i] = i;
}
```

The constructor declares and initializes an int[] array with monotonically ascending integers starting at 0.

```    /**
* Determine if p and q are or are not connected.
*/
public boolean areConnected(int p, int q) {

// **** sanity check(s) ****
if (p < 0 || p >= N || q < 0 || q >= N) {
return false;
}

// **** check if p & q are connected ****
if (this.id[p] == this.id[q])
return true;

// **** p & q are NOT connected ****
return false;
}
```

This method is used to determine if the specified nodes are nor are not connected. We just check if the values in the id array associated with p and q are equal. If so the nodes are connected; otherwise they are not. Please take a few moments to think how the check would work.

```    /**
* Find if there is a connection between nodes p & q.
* This call will make a new connection between p & q if needed.
* Worse case O(n);
*/
public void connect(int p, int q) {

// **** sanity check(s) ****
if (p < 0 || p >= N || q < 0 || q >= N) {
return;
}

// **** check if p & q are already connected ****
if (this.id[p] == this.id[q]) {
return;
}

// **** make a new connection between p & q ****
int i = 0;
for (int t = this.id[p]; i < this.N; i++) {
if (this.id[i] == t)
this.id[i] = this.id[q];
}
}
```

This method is used to connect the specified two nodes.

We start by performing some sanity checks. If the two nodes are connected we just return since there is nothing else to do.

The core and cleverness of the algorithm relies in the following loop. We start with the first node and traverse all nodes in the array. Note that the variable t holds the index of the p node in the array. When an entry in the id array is set to t, we update the entry to the value of the id for node q.

Use a piece of paper and pen to follow the algorithm. You can also display the contents of the array after each connection.

This is a very simple and powerful algorithm that is quite simple to implement and runs in O(n) worse time.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to help out with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private e-mail message. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer toolset.

One last thing, many thanks to all 7,118 subscribers to this blog!!!

Keep safe during the COVID-19 pandemic and help restart the world economy. I believe we can all see the light at the end of the tunnel.

Regards;

John

john.canessa@gmail.com

This site uses Akismet to reduce spam. Learn how your comment data is processed.