Merge K Sorted Arrays

Good morning! It is once again Saturday morning during the COVID-19 pandemic. Today is March 13, 2021 and the sun is shining in the Twin Cities of Minneapolis and St. Paul. The forecast calls for the temperature to get up to 58F. For this time of the year it is welcomed. Tomorrow it will be a little cooler and daylight savings time starts at 02:00 AM when the clocks spring forward to 03:00 AM. Tomorrow we lose one hour.

A week or so ago I was talking with an engineer about merging a set of sorted arrays. I also generated the post Merge Sorted Array which probably motivated the exchange. If you take a look at the last post, the idea was to merge two sorted arrays. The obvious next step is to consider multiple sorted arrays of the same or different lengths. In that post we considered two approaches. In the first one we add all the elements from both arrays into a priority queue. We then populate the result array with the elements being removed from the priority queue.

In the second approach we pull the elements from the two sorted arrays taking into consideration the values and inserting them into the result array.

We did not consider the approach of copying the elements of the two arrays to the results array and then sorting the resulting array. Note that all three approaches provide a solution but the executions times are somewhat different.

In this post we will explore sorting a set of K arrays with the same or different sizes. In my opinion the ability of sorting arrays with different sizes seems a better choice because the same code would be able to sort all arrays being of the same length.

We will use the Java programming language on a Windows 10 computer and the VSCode IDE. The nice thing of Java is that your code should be able to run on most (avoid generalizing) platform unchanged.

Given k sorted arrays, merge them into a single sorted array.

The arrays may or may not be of the size.

The requirements are simple. We are given a set of K sorted arrays and we are asked to generate a new array holding the same values but sorted. Note that the arrays may or may not be of the same size. AS we have discussed multiple times, the purpose of this blog is to read and learn. Memorizing solutions does not seem to help much when addressing similar problems.

3
1, 4, 7
2, 5, 8
3, 6, 9
main <<<       n: 3
main <<< minSize: 3
main <<< maxSize: 3
main <<< difSize: false
main <<<    arrs:
[1, 4, 7]
[2, 5, 8]
[3, 6, 9]
main <<< mergeSameLength: [1, 2, 3, 4, 5, 6, 7, 8, 9]
main <<<        copySort: [1, 2, 3, 4, 5, 6, 7, 8, 9]
main <<<          merge1: [1, 2, 3, 4, 5, 6, 7, 8, 9]
main <<<          merge2: [1, 2, 3, 4, 5, 6, 7, 8, 9]


3
5, 7, 15, 18
1, 8, 9, 17
1, 4, 7, 7
main <<<       n: 3
main <<< minSize: 4
main <<< maxSize: 4
main <<< difSize: false
main <<<    arrs:
[5, 7, 15, 18]
[1, 8, 9, 17]
[1, 4, 7, 7]
main <<< mergeSameLength: [1, 1, 4, 5, 7, 7, 7, 8, 9, 15, 17, 18]
main <<<        copySort: [1, 1, 4, 5, 7, 7, 7, 8, 9, 15, 17, 18]
main <<<          merge1: [1, 1, 4, 5, 7, 7, 7, 8, 9, 15, 17, 18]


2
1, 2, 3, 4
4, 5, 6, 7
main <<<       n: 2
main <<< minSize: 4
main <<< maxSize: 4
main <<< difSize: false
main <<<    arrs:
[1, 2, 3, 4]
[4, 5, 6, 7]
main <<< mergeSameLength: [1, 2, 3, 4, 4, 5, 6, 7]
main <<<        copySort: [1, 2, 3, 4, 4, 5, 6, 7]
main <<<          merge1: [1, 2, 3, 4, 4, 5, 6, 7]
				   

3
1, 2, 3, 4
4, 5, 6, 7
7, 8, 9, 10
main <<<       n: 3
main <<< minSize: 4
main <<< maxSize: 4
main <<< difSize: false
main <<<    arrs:
[1, 2, 3, 4]
[4, 5, 6, 7]
[7, 8, 9, 10]
main <<< mergeSameLength: [1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10]
main <<<        copySort: [1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10]
main <<<          merge1: [1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10]


4
1, 3, 5, 7, 9, 17
2, 5, 8, 10, 11, 15
3, 6, 8, 9, 12, 18
11, 15, 17, 20, 30, 40
main <<<       n: 4
main <<< minSize: 6
main <<< maxSize: 6
main <<< difSize: false
main <<<    arrs:
[1, 3, 5, 7, 9, 17]
[2, 5, 8, 10, 11, 15]
[3, 6, 8, 9, 12, 18]
[11, 15, 17, 20, 30, 40]
main <<< mergeSameLength: [1, 2, 3, 3, 5, 5, 6, 7, 8, 8, 9, 9, 10, 11, 11, 12, 15, 15, 17, 17, 18, 20, 30, 40]       
main <<<        copySort: [1, 2, 3, 3, 5, 5, 6, 7, 8, 8, 9, 9, 10, 11, 11, 12, 15, 15, 17, 17, 18, 20, 30, 40]       
main <<<          merge1: [1, 2, 3, 3, 5, 5, 6, 7, 8, 8, 9, 9, 10, 11, 11, 12, 15, 15, 17, 17, 18, 20, 30, 40]       
main <<<          merge2: [1, 2, 3, 3, 5, 5, 6, 7, 8, 8, 9, 9, 10, 11, 11, 12, 15, 15, 17, 17, 18, 20, 30, 40]   


5
1, 3, 5, 7
2, 5, 8, 10
3, 6, 8, 9
11, 15, 17, 20
10, 15, 20, 25
main <<<       n: 5
main <<< minSize: 4
main <<< maxSize: 4
main <<< difSize: false
main <<<    arrs:
[1, 3, 5, 7]
[2, 5, 8, 10]
[3, 6, 8, 9]
[11, 15, 17, 20]
[10, 15, 20, 25]
main <<< mergeSameLength: [1, 2, 3, 3, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 15, 15, 17, 20, 20, 25]
main <<<        copySort: [1, 2, 3, 3, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 15, 15, 17, 20, 20, 25]
main <<<          merge1: [1, 2, 3, 3, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 15, 15, 17, 20, 20, 25]
main <<<          merge2: [1, 2, 3, 3, 5, 5, 6, 7, 8, 8, 9, 10, 10, 11, 15, 15, 17, 20, 20, 25]


3
1, 3
2, 4, 5
6
main <<<       n: 3
main <<< minSize: 1
main <<< maxSize: 3
main <<< difSize: true
main <<<    arrs:
[1, 3]
[2, 4, 5]
[6]
main <<<        copySort: [1, 2, 3, 4, 5, 6]
main <<<          merge1: [1, 2, 3, 4, 5, 6]
main <<<          merge2: [1, 2, 3, 4, 5, 6]


3
1, 4, 7, 9
2, 5, 8
1, 2, 3, 6, 9
main <<<       n: 3
main <<< minSize: 3
main <<< maxSize: 5
main <<< difSize: true
main <<<    arrs:
[1, 4, 7, 9]
[2, 5, 8]
[1, 2, 3, 6, 9]
main <<<        copySort: [1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
main <<<          merge1: [1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
main <<<          merge2: [1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 9]


0
main <<<       n: 0
main <<< minSize: 0
main <<< maxSize: 0
main <<< difSize: false
main <<<    arrs:


2
1, 3, 5
2, 4
main <<<       n: 2
main <<< minSize: 2
main <<< maxSize: 3
main <<< difSize: true
main <<<    arrs: 
[1, 3, 5]
[2, 4]
main <<<        copySort: [1, 2, 3, 4, 5]
main <<<          merge1: [1, 2, 3, 4, 5]
main <<<          merge2: [1, 2, 3, 4, 5]

Sorry about the number of examples. Like I said, we are trying to learn by experimenting. In addition we are not able to submit the code to an on-line service (i.e., HackerRank or LeetCode) and have our code be accepted and being able to determine how it performed against other submissions.

Our test code is presented with the number of arrays (K) followed by lines with the values of each array. The values of each array will be sorted. After reading each line we and populating an int[] we could sort the arrays to make them comply in case there is a typo. As we will see in the actual code, such check is not implemented. If you feel is needed then just add a call to the Arrays.sort() method after the int[] arrays are created and populated.

The first thing that our test code does is to display the number of arrays. As I am writing this post I noticed that the number of arrays should be referred as K and not n. Sorry about that.

We then compute and display the minimum and maximum sizes of the set of arrays. The reason is that most of the implementations support sorted arrays of different sizes as input, but one does not. The difSize variable indicates if the arrays are of different sizes (true) or not (false).

The test code displays the contents of the int[][] which we will pass to all our implementations.

It seems we have four different implementations. They all seem to generate the same solution. All solutions must contain all the values and the values must be in ascending order. This makes it easy to verify.

Note that when we present arrays of different sizes in a test case, the difSize variable is set to true and we only get three solutions. As we mentioned the first approach only works when all input arrays are of the same length.

   /**
     * Test scaffolding
     * 
     * !!!! NOT PART OF SOLUTION !!!!
     * 
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        
        // **** initialization ****
        int minSize     = 0;
        int maxSize     = 0;
        int[] output    = null;
        boolean difSize = false;
        int[][] arrs    = null;

        // **** open buffered reader ****
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // **** read number of arrays ****
        int n = Integer.parseInt(br.readLine().trim());

        // ???? ????
        System.out.println("main <<<       n: " + n);

        // **** loop reading sorted arrays 
        //      checking if they are of different sizes (if needed) ****
        if (n != 0) {

            // **** declare int[][] array ****
            arrs = new int[n][];

            // **** ****
            minSize = Integer.MAX_VALUE;
            maxSize = Integer.MIN_VALUE;

            // **** read and populate int[][] ****
            for (int i = 0; i < n; i++) {
                arrs[i] = Arrays.stream(br.readLine().trim().split(", "))
                                .mapToInt(Integer::parseInt)
                                .toArray();

                // **** update min size ****
                if (arrs[i].length < minSize)
                    minSize = arrs[i].length;

                // **** update max size ****
                if (arrs[i].length > maxSize)
                    maxSize = arrs[i].length;
            }
        }

        // **** close buffered reader ****
        br.close();

        // **** determine if arrays have different sizes ****
        difSize = (minSize != maxSize);

        // ???? ????
        System.out.println("main <<< minSize: " + minSize);
        System.out.println("main <<< maxSize: " + maxSize);
        System.out.println("main <<< difSize: " + difSize);
        System.out.println("main <<<    arrs: ");
        for (int i = 0; i < n; i++)
            System.out.println(Arrays.toString(arrs[i]));

        // **** arrays must be of the same length ****
        if (!difSize) {

            // **** ****
            output = mergeSameLength(arrs);

            // ???? ????
            if (output != null)
                System.out.println("main <<< mergeSameLength: " + Arrays.toString(output));
        }
    
        // **** simple approach ****
        output = copySort(arrs);

        // ???? ????
        if (output != null)
            System.out.println("main <<<        copySort: " + Arrays.toString(output));

        // **** arrays may be any length ****
        output = merge1(arrs);

        // ???? ????
        if (output != null)
            System.out.println("main <<<          merge1: " + Arrays.toString(output));

        // **** arrays may be any length ****
        output = merge2(arrs);

        // ???? ????
        if (output != null)
            System.out.println("main <<<          merge2: " + Arrays.toString(output));
    }

Our test scaffold performs the tasks of parsing and collecting the input data which is always an int[][] array names arrs.

Our test code then calls three or four methods depending on the sizes of the sorted arrays. The results are displayed if the input set of arrays is not null. I just added the condition of possibly having a null input to be in line with most problems found on-line.

    // **** sorted array size (used by merge & divide) ****
    static int N = 4;
	
	
    /**
     * Merge sorted arrays of the same length.
     * Recursive entry point.
     * 
     * Execution:  O(nk log(k))
     */
    static int[] mergeSameLength(int[][] arrs) {

        // **** sanity checks(s) ****
        if (arrs == null)
            return null;

        // **** initialization ****
        int k           = arrs.length;
        N               = arrs[0].length;
        int[] output    = new int[N * k];

        // **** divide and merge ****
        divide(0, k - 1, output, arrs);

        // **** return merged array ****
        return output;
    }

If you are familiar with the Merge Sort algorithm then the first approach should look somewhat familiar. It is an attempt to apply it to pick up from a point in the middle of the process and end up with the merged array.

We start by setting the N variable which holds the length of all the arrays. After the initialization we make a call to the divide method that continues to divide the arrays and then merge them into the solution.

    /**
     * Build recursion tree.
     */
    static void divide(int left, int right, int[] output, int[][] arrs) {

        // **** base case ****
        if (left == right) {

            // **** copy an input to the output array ****
            for (int i = 0; i < N; i++)
                output[left * N + i] = arrs[left][i];

            // **** nothing else to do ****
            return;
        }

        // **** sort left half ****
        divide(left, (left + right) / 2, output, arrs);

        // **** sort right half ****
        divide((left + right) / 2 + 1, right, output, arrs);

        // **** merge left and right half ****
        merge(left, right, output);
    }

The divide method is used to divide the left and the right part of an array and implement the merge sort. After the sorted parts are returned, we proceed to merge the resulting two sorted arrays into one.

    /**
     * Perform merge operation.
     */
    static void merge(int left, int right, int[] output) {

        // **** store starting point of left and right array ****
        int leftIn  = left * N;
        int rightIn = ((left + right) / 2 + 1) * N;

        // **** store the size of the left and right array ****
        int leftSize    = ((left + right) / 2 - left + 1) * N;
        int rightSize   = (right - (left + right) / 2) * N;

        // **** arrays to temporarily store left and right arrays ****
        int leftArr[]   = new int[leftSize];
        int rightArr[]  = new int[rightSize];
 
        // **** store data in left array ****
        for (int i = 0; i < leftSize; i++)
            leftArr[i] = output[leftIn + i];

        // **** store data in right array ****
        for (int i = 0; i < rightSize; i++)
            rightArr[i] = output[rightIn + i];

        // **** store the current index of temporary left and right array ****
        int leftCurr    = 0;
        int rightCurr   = 0;

        // **** store the current index for the output array
        int in  = leftIn;

        // **** two pointer merge for two sorted arrays ****
        while (leftCurr + rightCurr < leftSize + rightSize) {
            if (rightCurr == rightSize || (leftCurr != leftSize && leftArr[leftCurr] < rightArr[rightCurr])) {
                output[in] = leftArr[leftCurr];
                leftCurr++;
                in++;
            } else {
                output[in] = rightArr[rightCurr];
                rightCurr++;
                in++;
            }
        }
    }

This method (perhaps somewhat convoluted) is used to merge two arrays into one.

The steps before the while loop are part of the initialization. The while loop just picks the next elelemnt from the proper array in ascending order.

    /**
     * Copy all arrays and sort the new array.
     * 
     * Arrays.sort() is based on the TimSort algorithm, 
     * giving us a time complexity of O(n log(n)).
     * TimSort makes use of the Insertion sort and the MergeSort algorithms.
     * 
     * Runtime: O(n*k log(n*k))
     */
    static int[] copySort(int[][] arrs) {

        // **** sanity checks ****
        if (arrs == null)
            return null;

        // **** initialization O(n) ****
        int len = 0;
        for (int i = 0; i < arrs.length; i++)
            len += arrs[i].length;

        int[] output = new int[len];

        // **** copy arrays O(n * k) ****
        for (int i = 0, destPos = 0; i < arrs.length; i++) {
            System.arraycopy(arrs[i], 0, output, destPos, arrs[i].length);
            destPos += arrs[i].length;
        }

        // **** sort output array O(n*k log(n*k)) ****
        Arrays.sort(output);

        // **** return sorted array ****
        return output;
    }

This method implements what I would call a MVP (Minimum Viable Product) implementation. We determine the length of the final sorted array and create the int[] array.

We copy the contents of the input arrays into the output array.

We then sort the output array.

This method is straightforward and with some changes in the syntax, you should be able to implement it in most (never generalize) programming languages.

    /**
     * Merges a set of n sorted arrays of different lengths.
     * 
     * Runtime: O(nk * k)
     */
    static int[] merge1(int[][] arrs) {

        // **** sanity check(s) ****
        if (arrs == null)
            return null;

        // **** initialization ****
        int sortLen = 0;
        for (int i = 0; i < arrs.length; i++)
            sortLen += arrs[i].length;
        int[] output    = new int[sortLen];

        int[] ndex      = new int[arrs.length];

        // **** loop until we processed all entries O(nk * k) ****
        for (int i = 0; i < sortLen; i++) {

            // **** select array and index O(k) ****
            int[] ai = nextElement1(ndex, arrs);

            // **** insert into output array O(1) ****
            output[i] = arrs[ai[0]][ai[1]];
        }

        // **** return sorted array ****
        return output;
    }

In this method we will use a similar but extended approach as we did in the previous post.

Note that we declare an int[] array named ndex with a size that matches the number of sorted arrays.

We then enter a loop. The loop is used to populate every entry in the sorted array.

The first step is to make a call to an auxiliary method named nextElement1(). The arrays seems to return a two entries int[] array. The first value represents the input array and the second value the element index in the input array.

Once we get the ai array we can use it to set the current entry in output array.

    /**
     * Utility method to get the next row and index of
     * smallest element in int[][] arrs.
     * 
     * Runtime:  O(k)
     */
    static int[] nextElement1(int[] ndex, int[][] arrs) {

        // **** initialization ****
        int val     = Integer.MAX_VALUE;
        int[] ai    = new int[2];

        // **** traverse ndex looking for smallest value O(k) ****
        for (int i = 0; i < ndex.length; i++) {

            // **** skip this array (if needed) ****
            if (ndex[i] < 0)
                continue;

            // **** check if this value is smaller ****
            if (arrs[i][ndex[i]] < val) {
                val = arrs[i][ndex[i]];
                ai[0] = i;
                ai[1] = ndex[i];
            }
        }

        // **** update the ndex array ****
        ndex[ai[0]] = ai[1] + 1;
        if (ndex[ai[0]] >= arrs[ai[0]].length)
            ndex[ai[0]] = -1;

        // **** return the pair of values ****
        return ai;
    }

This is the auxiliary method we used in the merge1() method. The method takes two arguments. The first is the int[] array of indices. The second is the matrix int[][] representing the sorted input arrays.

After the initialization steps, we traverse the ndex int[] array to determine from which sorted array we will pick the next value to insert into the output int[] array.

If we reached the end of the current sorted array, we skip it; otherwise we determine if this is the minimum value we have encountered so far from the next element in the current sorted array. If a new minimum is sound, the ai int[] array is updated.

Just before exiting the method, we update the ndex int[] array. We do so because we have consumed an entry from one of the input sorted arrays. If we have reached the end of the array with the selected element, we flag it with a -1 to skip it in the next call to this method.

The ai int[] array is then returned.

    /**
     * Merges a set of n sorted arrays of different lengths.
     * 
     * Runtime: O(n*k log(n*k))
     */
    static int[] merge2(int[][] arrs) {

        // **** sanity check(s) ****
        if (arrs == null)
            return null;

        // **** initialization ****
        PriorityQueue<PQNode> pq = new PriorityQueue<PQNode>();

        int len = 0;
        for (int i = 0; i < arrs.length; i++) {

            // **** skip empty array ****
            if (arrs[i].length == 0)
                continue;

            // **** increment len ****
            len += arrs[i].length;

            // **** add node to pq ****
            pq.add(new PQNode(i, 0, arrs[i][0]));
        }

        int[] output = new int[len];

        // **** loop until we process all elements O(n*k log(n*k)) ****
        for (int i = 0; !pq.isEmpty(); i++) {

            // **** remove element from the pq ****
            PQNode node = pq.remove();

            // **** add value to the sorted array ****
            output[i] = node.val; 

            // **** add element to the pq (if possible) ****
            if (node.ei < arrs[node.ai].length - 1)
                pq.add(new PQNode(node.ai, node.ei + 1, arrs[node.ai][node.ei + 1]));
        }

        // **** return sorted array ****
        return output;
    }

This is the last of the implementations. It is like and extension / refinement of the previous method.

 

We start by initializing a priority queue. Instead of using an array ai[] with three elements, we will use the auxiliary class PQNode which we will review shortly.

 

We start by initializing the priority queue with the first elements of all the input sorted int[] arrays.

 

We then create the output int[] array of the proper size / length.

 

We then enter a loop populating one output entry per pass. We do so by removing the smallest element from the priority queue. We set the value in the output array. We then add the next element from the input array that we just used if all the elements have not been processed yet.

When done we return the sorted array.

Note that the main loop is very similar to what we did in the previous method but for simplicity we used a priority queue and replaced the ai int[] array with the PQNode.

/**
 * Private Queue Node
 */
class PQNode implements Comparable<PQNode> {

    // **** members ****
    int ai;                 // array index
    int ei;                 // element index in array
    int val;                // element value

    /**
     * Constructor
     */
    public PQNode(int ai, int ei, int val) {
        this.val    = val;
        this.ai     = ai;
        this.ei     = ei;
    }

    /**
     * compareTo() method
     */
    @Override
    public int compareTo(PQNode node) {
        return this.val - node.val;
    }

    /**
     * toString() method
     */
    @Override
    public String toString() {
        return "" + val + " [" + ai + "," + ei + "]";
    }
}

We start by defining the members of the PQNode class. The first two members represent the contents of the ai int[] in the previous method. The val represents the value of the actual element.

Not much to add to the constructor.

Note that because we stated that the PQNode implements Comparable<> we need to add the compareTo() method. This method is used to order the elements in the priority queue.

If you are not too familiar with the Comparable<> on an integer we need to return a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object. We could have used a set of if-then-else statements but a subtraction returns the same results. If needed please take a few moments to implement the code with a set of if-then-else statements with a couple values. I will wait …

…OK, I believe we are done.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to help out with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private e-mail message. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer toolset.

One last thing, many thanks to all subscribers to this blog!!!

Keep safe during the COVID-19 pandemic and help restart the world economy. I believe we can all see the light at the end of the tunnel.

Regards;

John

john.canessa@gmail.com

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.