LeetCode 295. Find Median from Data Stream in Java

In this post we will try to solve the LeetCode 295. Find Median from Data Stream problem using the Java programming language and the VSCode IDE on a Windows computer.

The median is the middle value in an ordered integer list. 
If the size of the list is even, 
there is no middle value and the median is the mean of the two middle values.

For example, for arr = [2,3,4], the median is 3.
For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

o MedianFinder()
	initializes the MedianFinder object.
o void addNum(int num)
	adds the integer num from the data stream to the data structure.
o double findMedian()
	returns the median of all elements so far. 
	Answers within 10-5 of the actual answer will be accepted.
	
Constraints:

o -10^5 <= num <= 10^5
o There will be at least one element in the data structure before calling findMedian.
o At most 5 * 10^4 calls will be made to addNum and findMedian.

Related Topics:

o Two Pointers
o Design
o Sorting
* Heap (Priority Queue)
o Data Stream

In a nutshell the class needs to process integers in the specified range and at different points we can be asked to return the current Median. As a brute force approach, I tried sorting. I did not submit such an approach because it is quite expensive (slow) and the problem is rated Hard by LeetCode.

In the next approach I tried to use a priority queue and streams. We will take a look at the first implementation which times out so it was not accepted by LeetCode. The reason for including it in the post is to understand what needs to be done and the need to optimize it.

I will be solving the problem on my computer. My suggestion, unless you have some specific needs, is to solve the problem using the online IDE provided by LeetCode. Since I will solve the problem on my computer, I needed to develop a simple test scaffold to read the input data, populate the necessary data structures and variables, call the different methods in the class of interest, collect the results and display them. Please note that the test scaffold IS NOT PART OF THE SOLUTION!

class MedianFinder {

    public MedianFinder() {
        
    }
    
    public void addNum(int num) {
        
    }
    
    public double findMedian() {
        
    }
}

/**
 * Your MedianFinder object will be instantiated and called as such:
 * MedianFinder obj = new MedianFinder();
 * obj.addNum(num);
 * double param_2 = obj.findMedian();
 */

We need to populate the MedianFinder class with the class elements and the three required methods. The first one is the constructor and initializes the data structures we will use. The second method adds an integer to the class. The `findMedian` method is used to compute and return the median.

Note that LeetCode describes in the comments section of the class signature how our solution will be tested.

MedianFinder, addNum, addNum, findMedian, addNum, findMedian
null,         1,      2,      null,       3,      null
main <<<      n: 6
main <<< cmdArr: [MedianFinder, addNum, addNum, findMedian, addNum, findMedian]
main <<< argArr: [null, 1, 2, null, 3, null]
main <<<           m : 6
main <<<   pq2 (head): [3, 1, 2]
main <<<   pq1 (head): [4, 5, 6]
main <<< obj.pq2.peek: 3
main <<< obj.pq1.peek: 4
main <<<       median: 3.5
main <<<           m : 7
main <<<   pq2 (head): [3, 1, 2]
main <<<   pq1 (head): [4, 5, 6, 7]
main <<< obj.pq2.peek: 3
main <<< obj.pq1.peek: 4
main <<<       median: 4
main <<< output: [null, null, null, 1.5, null, 2.0]


MedianFinder, addNum, addNum, addNum, addNum, addNum, addNum, findMedian, addNum, findMedian
null,         1,      2,      3,      4,      5,      6,      null,       7,      null
main <<<      n: 10
main <<< cmdArr: [MedianFinder, addNum, addNum, addNum, addNum, addNum, addNum, findMedian, addNum, findMedian]       
main <<< argArr: [null, 1, 2, 3, 4, 5, 6, null, 7, null]
main <<<           m : 6
main <<<   pq2 (head): [3, 1, 2]
main <<<   pq1 (head): [4, 5, 6]
main <<< obj.pq2.peek: 3
main <<< obj.pq1.peek: 4
main <<<       median: 3.5
main <<<           m : 7
main <<<   pq2 (head): [3, 1, 2]
main <<<   pq1 (head): [4, 5, 6, 7]
main <<< obj.pq2.peek: 3
main <<< obj.pq1.peek: 4
main <<<       median: 4
main <<< output: [null, null, null, null, null, null, null, 3.5, null, 4.0]

Each test case is separated by two blank lines.

The first two lines in each test case represent the input for the test case.

The first input line contains the names of the methods that will be called by our test scaffold. The second line contains the associated argument that will be passed to the method in questions. A `null` indicates that there is no input argument for the specified method.

It seems that our test scaffold which IS NOT PART OF THE SOLUTION, reads the two input lines, populates some variables, and displays them to allow us to verify that all is well so far.

The next few lines before calling the methods of interest will make sense as we take a look at the test scaffold.

The test scaffold then calls the different methods passing the associated arguments as needed. After each method completes, the returned value is placed in the `output` Integer[] array. A `null` indicates the associated method did not return a value.

I spent time in the first implementation of the solution, which we should label as a brute force approach. The implementation passed the test case used as an example, and as expected timed out, so it was not accepted by LeetCode. We will take a brief look at the first implementation because it provided me with additional insights as to what might work.

Our test scaffold will not call the first implementation of the `MedianFinder` class.

    /**
     * Test scaffold.
     * !!! NOT PART OF THE SOLUTION !!!
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {

        // **** open buffered reader ****
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // **** read commands ****
        String[] cmdArr = Arrays.stream(br.readLine().trim().split(","))
                                .map(s -> s.trim())
                                .toArray(size -> new String[size]);

        // **** read arguments ****
        Integer[] argArr = Arrays.stream(br.readLine().trim().split(","))
                                    .map(s -> s.trim())
                                    .map(s -> s.equals("null") ? null : Integer.parseInt(s) )
                                    .toArray(size -> new Integer[size]);

        // **** close buffered reader ****
        br.close();

        // **** get number of commands ****
        var n = cmdArr.length;

        // ???? ????
        System.out.println("main <<<      n: " + n);
        System.out.println("main <<< cmdArr: " + Arrays.toString(cmdArr));
        System.out.println("main <<< argArr: " + Arrays.toString(argArr));

        // **** holds output from method calls ****
        Double[] output = new Double[n];

        // **** median finder object ****
        // MedianFinder0 obj = null;
        MedianFinder obj = null;


        // ???? create object ????
        obj = new MedianFinder();

        // ???? populate priority queues ????
        obj.addToPqs(6);

        // ???? display the number of elements in the priority queues ????
        System.out.println("main <<<           m : " + (obj.pq2.size() + obj.pq1.size()));

        // ???? display priority queues contents ????
        System.out.println("main <<<   pq2 (head): " + obj.pq2.toString());
        System.out.println("main <<<   pq1 (head): " + obj.pq1.toString());

        // ???? display head nodes ????
        System.out.println("main <<< obj.pq2.peek: " + obj.pq2.peek());
        System.out.println("main <<< obj.pq1.peek: " + obj.pq1.peek());

        // ???? compute median ????
        System.out.println("main <<<       median: " + (obj.pq2.peek() + obj.pq1.peek()) / 2.0);

        // ???? populate priority queues ????
        obj.addToPqs(7);

        // ???? display the number of elements in the priority queues ????
        System.out.println("main <<<           m : " + (obj.pq2.size() + obj.pq1.size()));

        // ???? display priority queues contents ????
        System.out.println("main <<<   pq2 (head): " + obj.pq2.toString());
        System.out.println("main <<<   pq1 (head): " + obj.pq1.toString());

        // ???? display head nodes ????
        System.out.println("main <<< obj.pq2.peek: " + obj.pq2.peek());
        System.out.println("main <<< obj.pq1.peek: " + obj.pq1.peek());

        // ???? compute median ????
        System.out.println("main <<<       median: " + (obj.pq2.size() >= obj.pq1.size() ? obj.pq2.peek() : obj.pq1.peek()));


        // **** loop calling methods ****
        for (var i = 0; i < n; i++) {
            switch (cmdArr[i]) {
                case "MedianFinder":
                    // obj = new MedianFinder0();
                    obj = new MedianFinder();
                break;

                case "addNum":
                    obj.addNum(argArr[i]);
                break;

                case "findMedian":
                    output[i] = obj.findMedian();
                break;

                default:
                    System.err.println("main <<< UNEXPECTED cmdArr[" + i + "] ===>" + cmdArr[i] + "<==");
                break;
            }
        }

        // **** display output ****
        System.out.println("main <<< output: " + Arrays.toString(output));
    }

Our test code populates two arrays with the names of the methods to be called and the second with the Integer[] arguments to be passed to each method if needed. The contents of the arrays are displayed.

We then declare the `output` Double[] array which will hold the results of each method we call. A `null` is associated with methods that do not return a value.

Next we declare the `obj` object which will hold the object associated with the `MedianFinder` class after we call the constructor.

The following set of lines with `// ???? ????` comments were generated after the first implementation of the class of interest. We will return to them and the rest of the code after we take a look at the failed implementation.

    /**
     * 
     */
    static class MedianFinder0 {

        /**
         * Class members. 
         */
        public int n;
        public PriorityQueue<Integer> pq = null;
        // public List<Integer> pq = null;

	// **** more to follow ...
	}

The `MedianFinder0` class uses a priority queue `pq` and an integer `n` to hold the number of elements.  You can also see a commented out `pq` List<Integer> which I experimented with but did not pay off.

        /**
         * Initializes the MedianFinder object.
         */
        public MedianFinder0() {
            // pq = new PriorityQueue&lt;Integer&gt;((a, b) -&gt; {
            //     if (a &lt; b) return -1;
            //     else if (a == b) return 0;
            //     else return 1;
            // });

            // pq = new ArrayList&lt;&gt;();

            pq = new PriorityQueue&lt;&gt;();
        }

As you can see in the constructor, I tried ascending and descending implementations of the `pq`.

        /**
         * Adds the integer num from the data stream to the data structure.
         * @param num
         */
        public void addNum(int num) {
            pq.add(num);
            n++;
        }

Adding a number to the list is done by just adding the `num` to the priority queue. The `n` is actually not needed because we may use the `pq.size()` method to get the number of elements in the priority queue.

        /**
         * Returns the median of all elements so far. 
         * @return
         */
        public double findMedian() {
            
            // **** size of pq ****
            // var n = pq.size();

            // **** sanity checks ****
            if (n == 1) return pq.peek();
            // if (n == 1) return pq.get(0);

            // **** ****
            Integer[] mid = pq.stream()
                                // .sorted()
                                .skip(n / 2 - 1)
                                .limit(2)
                                .toArray(size -&gt; new Integer[size]);

            // **** return element at n / 2 ****
            if (n % 2 == 1)
                return mid[1];

            // **** sum elements at n / 2 - 1 and n / 2 and divide them by 2 ****
            else return (mid[0] + mid[1]) / 2.0;
        }

The `findMedian` method returns the median of the numbers in the priority queue. After several attempts, I decided to call it quits after using the stream approach depicted in this implementation.

Note that the execution time is O(n / 2) which is quite slow depending on the size of the priority queue.

The way we generate the median is correct. We just need to find a better (more efficient method) to get to the midpoint of the data structure(s) holding the integer elements we have been provided so far.

My notes ****

o PriorityQueue pq2
o PriorityQueue pq1
* All elements in the pq2 &lt;= all elements in the pq1
* PriorityQueue sizes `equal` OR `off by one`

    pq2    pq1
  1,2,3    4,5,6
      =    =
median: (3 + 4) / 2 = 3.5


Performance ****

o          Add to PriorityQueue: O(log(n))
o     Remove from PriorityQueue: O(log(n))
o Find element in PriorityQueue: O(1)


Median computation ****

o If pq2.size() == pq1.size()
	median: (pq2.peek() + pq1.peek()) / 2
	
o If pq2.size() &gt; pq1.size()
	median: pq2.peek()
	
o If pq2.size() &lt; pq1.size()
	median: pq1.peek()

After failed attempts and additional reading in which I expressed and collected notes I moved on to the next implementation of the class of interest.

We will use two priority queues.

The `pq2` will hold all the integers that are <= than the ones in `pq1`. The reason for this is to keep the elements whose values are less than the median in one priority queue, and the elements greater than the median on a second one.

The other condition is to keep the sizes of both priority queues about the same. Remember that the median in a set of odd numbers of values is found in the midpoint of a list and the median in an even set is found using the two elements in the midpoint of the list.

Now we should be able to add elements in O(log(n)) as we did in the previous implementation, but also compute the median in order O(1) which is much more efficient than the O(n) we had in the previous class implementation.

// ???? create object ????
obj = new MedianFinder();

// ???? populate priority queues ????
obj.addToPqs(6);

// ???? display the number of elements in the priority queues ????
System.out.println("main &lt;&lt;&lt;           m : " + (obj.pq2.size() + obj.pq1.size()));

// ???? display priority queues contents ????
System.out.println("main &lt;&lt;&lt;   pq2 (head): " + obj.pq2.toString());
System.out.println("main &lt;&lt;&lt;   pq1 (head): " + obj.pq1.toString());

// ???? display head nodes ????
System.out.println("main &lt;&lt;&lt; obj.pq2.peek: " + obj.pq2.peek());
System.out.println("main &lt;&lt;&lt; obj.pq1.peek: " + obj.pq1.peek());

// ???? compute median ????
System.out.println("main &lt;&lt;&lt;       median: " + (obj.pq2.peek() + obj.pq1.peek()) / 2.0);

// ???? populate priority queues ????
obj.addToPqs(7);

// ???? display the number of elements in the priority queues ????
System.out.println("main &lt;&lt;&lt;           m : " + (obj.pq2.size() + obj.pq1.size()));

// ???? display priority queues contents ????
System.out.println("main &lt;&lt;&lt;   pq2 (head): " + obj.pq2.toString());
System.out.println("main &lt;&lt;&lt;   pq1 (head): " + obj.pq1.toString());

// ???? display head nodes ????
System.out.println("main &lt;&lt;&lt; obj.pq2.peek: " + obj.pq2.peek());
System.out.println("main &lt;&lt;&lt; obj.pq1.peek: " + obj.pq1.peek());

// ???? compute median ????
System.out.println("main &lt;&lt;&lt;       median: " + (obj.pq2.size() &gt;= obj.pq1.size() ? obj.pq2.peek() : obj.pq1.peek()));

This is the segment of code from the test scaffold that we were discussing earlier.

We create a MedianFinder object and point to it in the `obj` variable.

We make a call to the `addToPqs` method which fills two priority queues. This is being done to show how two priority queues could be used to help us get to the O(1) execution in computing the median.

In this case we just populate an even and then odd number of elements in the priority queues.

Please go back to the screen capture of the execution of the test cases to follow what is happening when we fill the priority queues.

/**
 * Runtime: 144 ms, faster than 57.14% of Java online submissions.
 * Memory Usage: 125.1 MB, less than 7.60% of Java online submissions.
 * 
 * 21 / 21 test cases passed.
 * Status: Accepted
 * Runtime: 144 ms
 * Memory Usage: 125.1 MB
 */
static class MedianFinder {

	/**
	 * Class members. 
	 */
	public PriorityQueue&lt;Integer&gt; pq1   = null;
	public PriorityQueue&lt;Integer&gt; pq2   = null;

	// **** more to follow ...
}

We have two priority queues as the class members in `MedianFinder`.

/**
 * Initialize the MedianFinder object.
 */
public MedianFinder() {

	// **** descending ****
	pq2  = new PriorityQueue&lt;&gt;((a, b) -&gt; {
		if (a &gt; b) return -1;
		if (a == b) return 0;
		return 1;
	});

	// **** ascending order ****
	pq1  = new PriorityQueue&lt;&gt;();
}

Note that `pq2` keeps its elements in descending order, while `pq1` keeps them in ascending order. This will facilitate accessing the `head` elements from both queues in the needed \ required order. In a priority queue we can access the `head` element in O(1) time.

/**
 * Add the integer num from the data stream to the data structure.
 * 
 * Execution: O(log(n)) - Space: O(1) 
 */
public void addNum(int num) {

	// **** add num to pq2 (by default) ****
	pq2.add(num);

	// **** pq2 size &gt;= pq1 size ****
	if (pq2.size() &gt;= pq1.size()) {

		// **** sizes different by no more than one OR
		//      pq2 &gt; pq1 head ****
		if ((pq2.size() &gt; pq1.size() + 1) || 
			(!pq2.isEmpty() &amp;&amp; !pq1.isEmpty() &amp;&amp; pq2.peek() &gt; pq1.peek())) {

			// **** remove from pq2 ****
			var e = pq2.poll();

			// **** add to pq1 ****
			if (e != null) pq1.add(e);

			// **** move from pq1 head to pq2 ****
			if (pq2.size() + 1 &lt; pq1.size()) {

				// **** remove from pq1 ****
				e = pq1.poll();

				// **** add to pq2 ****
				if (e != null) pq2.add(e);
			}
		}
	}

	// **** add to low heap ****
	else
		pq1.add(num);
}

We start by adding the `num` into `pq2`.

We then check the sizes of the priority queues. We need to keep the lengths the same or no more than one off. If `pq2` is longer, we move the head element from `pq2` to the `pq1` priority queue.

Once we move the element we need to check that the largest element in `pq1` (happens to be at the head of the queue) be moved to `pq2`.

By the way, this code could be modified to eliminate the use of the `e` variable, and eliminate the check before inserting `e` into `pq1`.

If the size of `pq1` is smaller than the size of `pq2` we would insert `num` directly into `pq1`.

/**
 * Returns the median of all elements so far.
 * 
 * Execution: O(1) - Space: O(1)
 */
public double findMedian() {
	
	// **** median in pq2 ****
	if (pq2.size() &gt; pq1.size())
		return pq2.peek();
	
	// **** median in pq1 ****
	else if (pq2.size() &lt; pq1.size())
		return pq1.peek();

	// **** compute median ****
	else
		return (pq2.peek() + pq1.peek()) / 2.0;
}

This is the implementation of the `findMedian` method.

There are three cases each of which can be implemented in O(1) time.

/**
 * Experiment populating the priority queues.
 * 
 * !!! NOT PART OF THE SOLUTION !!!
 */
public void addToPqs (int n) {

	// **** initialization ****
	pq1.clear();
	pq2.clear();
	var i = 0;

	// **** populate priority queue 2 ****
	for ( ; i &lt; n / 2; i++)
		pq2.add(i + 1);

	// **** populate priority queue 1 ****
	for ( ; i &lt; n; i++)
		pq1.add(i + 1);
}

The `addToPqs` method is only used to experiment with the priority queues. This method IS NOT PART OF THE SOLUTION.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository named FindMedianFromDataStream.

Please note that the code here presented might not be the best possible solution. After solving a problem, you should take a look at the best accepted solutions provided by the different websites (i.e., HackerRank, LeetCode to name a few). The idea is to learn concepts, not to memorize solutions.

If you have comments or questions regarding this, or any other post in this blog, please do not hesitate and leave me a note below. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer / engineering toolset.

Thanks for reading, feel free to connect with me John Canessa at LinkedIn.

Enjoy;

John

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.