Radix Sort

Earlier this week I ran into a description of Radix Sort. This sorting algorithm has been around for a few centuries (yes; that is not a typo). The algorithm dates back to 1887 to the work of Herman Hollerith (and yes; he was the inventor of the Hollerith Card Code for punched cards used in the past century).

This sorting algorithm is not the fastest, it requires additional space, but has been around for a long time. When you read about it, seems like it should not work; but it does.

A good test is to take a set of cards and separate them by suit. Then you take a suit and put the cards in the proper ascending order. When done the entire deck is properly sorted.

Let’s take a look at the following screen capture from the console in the Java Eclipse Oxygen.3a IDE:

8 41 83 72 87 54 99 26 51 30 
20 6 1 39 83 19 84 11 84 35 
9 78 81 95 70 87 93 2 17 72 
47 26 1 92 48 44 82 35 45 86 
45 63 17 68 30 67 82 20 64 54 

The screen capture shows a set of fifty random numbers in no particular order. This illustrates the starting array. The top left position shows the value of the first element (position 0) in the array and the bottom right shows the last value (position 49).

The radix sort is implemented using just two methods. The first takes the elements in the array an puts it in ten queues. Note that the numbers ending in zero (0) are inserted in the first queue, the numbers ending in one (1) in the next queue. The progression continues until the last queue. This is illustrated with the following screen capture:

0 : 30 20 70 30 20 
1 : 41 51 1 11 81 1 
2 : 72 2 72 92 82 82 
3 : 83 83 93 63 
4 : 54 84 84 44 64 54 
5 : 35 95 35 45 45 
6 : 26 6 26 86 
7 : 87 87 17 47 17 67 
8 : 8 78 48 68 
9 : 99 39 19 9 

The first number indicates the queue (e.g., “0: “) followed by the entries. As you can see, all numbers in the array that end in zero (0) are inserted into the first queue. The second queue one holds all numbers in the array that end with a one (1). The process repeats.

The next step is to take the contents of the queues in ascending order (queue 0, queue 1, …) puting them one after the other in the initial array.  This is illustrated by the following screen capture:

30 20 70 30 20 41 51 1 11 81 
1 72 2 72 92 82 82 83 83 93 
63 54 84 84 44 64 54 35 95 35 
45 45 26 6 26 86 87 87 17 47 
17 67 8 78 48 68 99 39 19 9 

As you can see, the numbers ending in zero (0) are followed by the numbers ending in one (1). The process repeats until the numbers ending in nine (9) are inserted back into the array.

Now the process repeats with the next digit to the left. This is the positions of the tenths. This is illustrated by the following screen capture:

0 : 1 1 2 6 8 9 
1 : 11 17 17 19 
2 : 20 20 26 26 
3 : 30 30 35 35 39 
4 : 41 44 45 45 47 48 
5 : 51 54 54 
6 : 63 64 67 68 
7 : 70 72 72 78 
8 : 81 82 82 83 83 84 84 86 87 87 
9 : 92 93 95 99 

As you can see the numbers in each queue are sorted by ascending order. The last step is to put them together back in the array. This is illustrated by the following screen capture:

1 1 2 6 8 9 11 17 17 19 
20 20 26 26 30 30 35 35 39 41 
44 45 45 47 48 51 54 54 63 64 
67 68 70 72 72 78 81 82 82 83 
83 84 84 86 87 87 92 93 95 99 

Since the random numbers are in the range [0:99] the algorithm ends (we are using using two-digit numbers). All the random numbers in the array are now sorted in ascending order. This algorithm follows the KISS (Keep It Simple and Short) rule.

Let’s take a look at the Java source code for the radix sort algorithm:

import java.util.LinkedList;

/*
 * Each queue holds numbers in the range [0:9], [10:19], ... , [90, 99]
 */
public class Solution {
	
	/*
	 * Short keys come before longer keys (<== or LSD)
	 */
	static void distribute(int base, int digit, int[] arr, LinkedList<Integer>[] queues) {
		int n	= 0;

		// **** traverse the array ****
		for (int i = 0; i < arr.length; i++) {

			// **** compute the queue number ****
			n = arr[i];
			for (int j = 0; j < digit; j++) {
				n /= base;
			}
			n %= base;
			
			// **** add number from array into the proper queue ****
			queues[n].addLast(arr[i]);
		}
	}
	
	/*
	 * Keys of the same length are sorted lexicographically.
	 */
	static void collect(int[] arr, LinkedList<Integer>[] queues) {
		
		int j = 0;
		
		// **** traverse all queues ****
		for (int i = 0; i < queues.length; i++) {
			
			// **** traverse this queue ****
			while (!queues[i].isEmpty()) {
				arr[j++] = queues[i].removeFirst();
			}
		}
	}
	
	/*
	 * 
	 */
	static void displayArray(int[] arr) {
		int i	= 0;
		while (i < arr.length) {
			System.out.print(arr[i] + " ");
			if ((++i % 10) == 0) {
				System.out.println();
			}
		}
		System.out.println();
	}
	
	/*
	 * display the contents of the queues
	 */
	static void displayQueues(LinkedList<Integer>[] queues) {
		for (int i = 0; i < queues.length; i++) {
			System.out.print(i + " : ");
			for (int j = 0; j < queues[i].size(); j++) {
				System.out.print(queues[i].get(j) + " ");
			}
			System.out.println();
		}
		System.out.println();
	}
	
	/*
	 * Radix sort is a non-comparative integer sorting algorithm 
	 * that sorts data with integer keys by grouping keys by the 
	 * individual digits which share the same significant position 
	 * and value.
	 * 
	 * Radix sorts can be implemented to start at either the most 
	 * significant digit (MSD) or least significant digit (LSD).
	 * For example, when sorting the number 1234 into a list, one
	 * could start with the 1 or the 4.
	 */
	public static void main(String[] args) {

		// **** constants ****
		final int BASE		= 10;
		final int DIGITS	= 2;
		final int MAX_VAL	= (int)Math.pow(BASE,  DIGITS);
		final int N			= 50;

		// **** initialize the array of integer queues ****
		LinkedList<Integer>[] queues = new LinkedList[BASE];
		for (int i = 0; i < BASE; i++) {
			queues[i] = new LinkedList<Integer>();
		}
		
		// **** allocate, populate and display array of random numbers ****
		int[] numbers = new int[N];
		for (int i = 0; i < N; i++) {
			numbers[i] = (int)(Math.random() * MAX_VAL);
		}
		displayArray(numbers);

		// **** radix sort ****
		for (int digit = 0; digit < DIGITS; digit ++) {
			distribute(BASE, digit, numbers, queues);
			displayQueues(queues);
			collect(numbers, queues);
			displayArray(numbers);
		}
	
	}

}

The last few lines in the main() method indicates how the sort algorithm is implemented with only two (2) methods; distribute() and collect(). The displayQueues() and displayArray() are just convenience methods to display the contents of the array and queues during the sorting process.

The first pass of the algorithm addresses the least significant digits. The second pass the tenths. The process repeats as long as there are more digits to sort. In our code this behaviour is controlled by the constants in the main() method.

You can experiment by sorting more or less numbers with larger or smaller values. For example, let’s change the settings to:

// **** constants ****
final int BASE		= 10;
final int DIGITS	= 4;
final int MAX_VAL	= (int)Math.pow(BASE,  DIGITS);
final int N			= 20;

After compiling with the updated constants, the entire console  output is illustrated by the following screen capture:

9394 1088 394 849 1145 247 6087 2064 1390 6210 
6389 8853 3915 633 8530 5391 5336 1843 8470 9935 

0 : 1390 6210 8530 8470 
1 : 5391 
2 : 
3 : 8853 633 1843 
4 : 9394 394 2064 
5 : 1145 3915 9935 
6 : 5336 
7 : 247 6087 
8 : 1088 
9 : 849 6389 

1390 6210 8530 8470 5391 8853 633 1843 9394 394 
2064 1145 3915 9935 5336 247 6087 1088 849 6389 

0 : 
1 : 6210 3915 
2 : 
3 : 8530 633 9935 5336 
4 : 1843 1145 247 849 
5 : 8853 
6 : 2064 
7 : 8470 
8 : 6087 1088 6389 
9 : 1390 5391 9394 394 

6210 3915 8530 633 9935 5336 1843 1145 247 849 
8853 2064 8470 6087 1088 6389 1390 5391 9394 394 

0 : 2064 6087 1088 
1 : 1145 
2 : 6210 247 
3 : 5336 6389 1390 5391 9394 394 
4 : 8470 
5 : 8530 
6 : 633 
7 : 
8 : 1843 849 8853 
9 : 3915 9935 

2064 6087 1088 1145 6210 247 5336 6389 1390 5391 
9394 394 8470 8530 633 1843 849 8853 3915 9935 

0 : 247 394 633 849 
1 : 1088 1145 1390 1843 
2 : 2064 
3 : 3915 
4 : 
5 : 5336 5391 
6 : 6087 6210 6389 
7 : 
8 : 8470 8530 8853 
9 : 9394 9935 

247 394 633 849 1088 1145 1390 1843 2064 3915 
5336 5391 6087 6210 6389 8470 8530 8853 9394 9935 

Note that not all queues are used. The final line illustrates the output of the sorted array.

Take the time to understand what goes on. Use different values. Experiment with this algorithm. Remember that If you can’t explain something in simple terms, you don’t understand it (Richard Feynman).

In the following posts I will be addressing the NoSQL document database mongoDB. Currently using it at work to store large amounts of data of different kinds.

If you have comments or questions regarding this or any other post in this blog, please leave me a message below. I will reply and soon as possible and would keep the comments in private by sending an email, if you so desired.

John

john.canessa@gmail.com

@john_canessa

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.