Earlier today I finished reading chapter 1.5 Case Study: Union-Find in the Algorithms Fourth Edition book by Robert Sedgewick and Kevin Wayne published by Addison-Wesley. Not sure if the authors recommend reading the chapters in order because they tend to use previous material. Last week I was browsing the table of contents and decided to read chapter 4.1. It mentions contents of chapter 1.5. From now on, **I will read the book in order** ;o)

It is always good to work on a single topic. I am breaking that rule to some extent. I will cover two very similar implementations of the Union-Find algorithm from chapter 1.5. It does give an introductory flavor for what is to come in chapter 4.1 Undirected Graphs.

This chapter covers the data structure and associated algorithm to determine if a set of components are connected due to a union operation. Always start with an API and test code. The API for this algorithm is defined in page 219 and summarized as follows:

Method |
Description |

UF(int N) | Initialize N sites with integer names (0 to N – 1) |

void union(int p, int q) | Add connection between p and q |

int find(int p) | Find component identifier for p (0 to N – 1) |

boolean connected(int p, int q) | Return true if p and q are in the same component |

int count() | Number of components |

When solving a problem always think on the algorithm and the data structure to use. They tend to go hand in hand as is illustrated in this chapter.

The chapter covers in incremental order, how the algorithm is refined and the data structure updated to support it. In this blog I skip the first implementation (quick-find) and jump to the second (quick-union) and third (weighted quick-union) ones. If interested, I strongly recommend getting and reading the book.

As I mentioned, I typically like to adhere to a single concept per blog post. What motivated me to divert from this approach was that after implementing quick-find I reused the code and implemented quick-union. At that point I decided to copy and paste it in the test code and implement the weighted quick-union. To compare them I decided to use the Adapter Design Pattern. Sorry about this. I will avoid it in the future ;o) I took a quick refresher at the Adapter design pattern on-line and read the associated topic (Adapter page 139 in chapter 4 Structural Patterns in Design Patterns – Elements of Reusable Object-Oriented Software by Erich Gamma et al). I like to keep handy this Design Patterns book.

Following are the results of the two last implementations of Union-Find as displayed in the console of the Eclipse IDE:

main <<< N: 625

main <<< elapsedTime: **31 ms**

main <<< count: 3 components

main <<< N: 625

main <<< elapsedTime: **10 ms**

main <<< count: 3 components

The input file found in the associated Algorithms book web site contains 625 sites (numbers from 0 to 624). The first algorithm with the same data takes 31 ms to execute. The second algorithm takes 10 ms to process the same data. I understand that a single run should never be used, but in this case it seems to run about three times faster. In general, at a minimum run five passes. Drop the fastest and the slowest. Average the remaining three.

The following output is when a larger sample was used:

main <<< N: 1000000

main <<< elapsedTime: **136074 ms**

main <<< count: 780805 components

main <<< N: 1000000

main <<< elapsedTime: **688 ms**

main <<< count: 780805 components

The first algorithm takes over two minutes while the second one takes less than one second!!! That is a huge improvement. The book talks about the results and how to interpret them. It is worthwhile reading it to better understand the implications of algorithm development and associated data structures. Algorithms like this took many computer scientists many years to define and refine.

The Java code for the test follows:

**package** edu.princeton.cs.algs4;

**import** java.io.File;

**import** java.io.FileNotFoundException;

**import** java.io.IOException;

**import** java.util.Scanner;

**public** **class** Solution {

/*

* test module

*/

**public** **static** **void** main(String[] args) **throws** IOException {

// **** set the name for the file ****

// final String fileName = “c:\\__temp__\\tinyUF.txt”;

**final** String fileName = “c:\\temp\\mediumUF.txt”;

// final String fileName = “c:\\__temp__\\largeUF.txt”;

// **** check if the file is present and is not a directory ****

File file = **new** File(fileName);

**if** (!file.exists() || file.isDirectory()) {

**throw** **new** FileNotFoundException(“main <<< fileName ==>” + fileName + “<==”);

}

**// **** open the file [1] ******

Scanner sc = **new** Scanner(file);

// **** read the number of components ****

**int** N = sc.nextInt();

System.** out**.println(“main <<< N: ” + N);

// **** ****

**long** startTime = System.*currentTimeMillis*();

UnionFindQuick uf = **new** UnionFindQuick(N);

**while** (sc.hasNext()) {

**int** p = sc.nextInt();

**int** q = sc.nextInt();

// **** if connected (skip) ****

**if** (uf.connectedQ(p, q)) {

**continue**;

}

// **** combine components ****

uf.unionQ(p, q);

}

**long** endTime = System.*currentTimeMillis*();

// **** close the file ****

sc.close();

// **** ****

System.** out**.println(“main <<< elapsedTime: ” + (endTime – startTime) + ” ms”);

System.** out**.println(“main <<< count: ” + uf.countQ() + ” components”);

**// **** open the file [2] ******

sc = **new** Scanner(file);

// **** read the number of components ****

N = sc.nextInt();

System.** out**.println(“main <<< N: ” + N);

// **** ****

startTime = System.*currentTimeMillis*();

UnionFindWeight ufw = **new** UnionFindWeight(N);

UnionFindAdapter ufa = **new** UnionFindAdapter(ufw);

**while** (sc.hasNext()) {

**int** p = sc.nextInt();

**int** q = sc.nextInt();

// **** if connected (skip) ****

**if** (ufa.connectedQ(p, q)) {

**continue**;

}

// **** combine components ****

ufa.unionQ(p, q);

}

endTime = System.*currentTimeMillis*();

// **** close the file ****

sc.close();

// **** ****

System.** out**.println(“main <<< elapsedTime: ” + (endTime – startTime) + ” ms”);

System.** out**.println(“main <<< count: ” + ufa.countQ() + ” components”);

}

}

The data file is opened and close twice. In between, the same set of steps is performed.

The following file implements the Union-Find Quick algorithm and data structure:

**package** edu.princeton.cs.algs4;

**public** **class** UnionFindQuick {

/*

* members

*/

**private** **int** id[];

**private** **int** count;

/*

* constructor

*/

**public** UnionFindQuick(**int** N) {

**this**.count = N;

**this**.id = **new** **int**[N];

**for** (**int** i = 0; i < N; i++) {

**this**.id[i] = i;

}

}

/*

* add connection between p and q

*/

**public** **void** unionQ(**int** p, **int** q) {

**int** pID = findQ(p);

**int** qID = findQ(q);

// **** check if p and q are in the same component ****

**if** (pID == qID) {

**return**;

}

// **** change value of component (arbitrary) ****

**for** (**int** i = 0; i < id.length; i++) {

**if** (id[i] == pID) {

id[i] = qID;

}

}

// **** decrement count of components ****

count–;

}

/*

* component identifier for p (0 to N – 1)

*/

**public** **int** findQ(**int** p) {

**return** **this**.id[p];

}

/*

* return true if p and q are in the same component

*/

**public** **boolean** connectedQ(**int** p, **int** q) {

**return** findQ(p) == findQ(q);

}

/*

* number of components

*/

**public** **int** countQ() {

**return** **this**.count;

}

}

The following source code implements the Weighted Union-Find algorithm:

**package** edu.princeton.cs.algs4;

**public** **class** UnionFindWeight {

/*

* members

*/

**private** **int** id[];

**private** **int** sz[];

**private** **int** count;

/*

* constructor

*/

**public** UnionFindWeight(**int** N) {

**this**.count = N;

**this**.id = **new** **int**[N];

**for** (**int** i = 0; i < N; i++) {

**this**.id[i] = i;

}

**this**.sz = **new** **int**[N];

**for** (**int** i = 0; i < N; i++) {

sz[i] = 1;

}

}

/*

* add connection between p and q

*/

**public** **void** unionW(**int** p, **int** q) {

**int** i = findW(p);

**int** j = findW(q);

// **** check if p and q are in the same component ****

**if** (i == j) {

**return**;

}

// **** make smaller root point to larger one ****

**if** (sz[i] < sz[j]) {

id[i] = j;

sz[j] += sz[i];

} **else** {

id[j] = i;

sz[i] += sz[j];

}

// **** decrement count of components ****

count–;

}

/*

* component identifier for p (0 to N – 1)

*/

**public** **int** findW(**int** p) {

// **** go up the tree to the root ****

**while** (p != id[p]) {

p = id[p];

}

// **** return the root ****

**return** p;

}

/*

* return true if p and q are in the same component

*/

**public** **boolean** connectedW(**int** p, **int** q) {

**return** findW(p) == findW(q);

}

/*

* number of components

*/

**public** **int** countW() {

**return** **this**.count;

}

}

Last but not least, here is the implementation for the UnionFindAdapter class:

**package** edu.princeton.cs.algs4;

**public** **class** UnionFindAdapter {

// **** ****

UnionFindWeight ufw;

/*

* constructor

*/

**public** UnionFindAdapter(UnionFindWeight ufw) {

**this**.ufw = ufw;

}

/*

*

*/

**public** **void** unionQ(**int** p, **int** q) {

ufw.unionW(p, q);

}

/*

*

*/

**public** **int** findQ(**int** p) {

**return** ufw.findW(p);

}

/*

*

*/

**public** **boolean** connectedQ(**int** p, **int** q) {

**return** ufw.connectedW(p, q);

}

/*

*

*/

**public** **int** countQ() {

**return** ufw.countW();

}

}

I also put together the following diagram that illustrates the use and implementation of the Adapter Design Pattern:

If you have comments or questions regarding this or any other post in this web site, please do not hesitate and send me an email message.

John

john.canessa@gmail.com