Binary Tree – Heap

I read an article or two from Medium every day. A few days ago I read “Binary Trees: The Heap” by David Pynes. The idea behind a binary tree or heap is to be able to associate values with associated priorities. For example, assume you are in line at an emergency room in a hospital. When you arrive and register the facility may use a plain queue (FIFO) to wait for a physician. What happens if a patient in worse condition that you arrives later. The logical thing would be to allow them to see a physician before patients that are less ill.

I can give you a different example in which I implemented a priority queue in a similar fashion. Each node in the queue holds a value for priority. When the node is inserted the software checks the priority. The software then traverses the queue from the head to the tail or vice versa depending on the priority of the element being inserted. It then inserts the new node at the proper place. The queue is implemented using double links (e.g., next and previous). Another implementation was based on a hash map with buckets represented by queues. The element would be inserted in the queue with the proper priority. I can continue describing different implementations which server the same purpose but with different requirements and performance. When possible I spend some time thinking on possible approaches, and then select the one that looks best. After the selected approach is implemented and tested, I try to consider others in order to check if an alternate approach would perform better. No matter what one thinks, it is always better for the product or service to spend time in design, implementation and testing than releasing something and then have it come back for rework $$$.

So what we need to implement is a binary heap using an array. The array is quite efficient because the way it is used. In a general binary tree one can expect insertions and deletions on a node at any position. In most cases one also needs to rebalance the tree. This typically happens after each insertion and deletion. Failure to do so may produce unbalanced trees with reduced performance.

As usual I tend to use the Test Driven Development (TDD). You would be surprised how the quality of the software dramatically increases with this approach. The reason is that you end spending additional time thinking how the software should perform and in most cases should behave. This gives an opportunity to improve on the first implementation. If you want to read more about TDD please read here.

The code in this post is written in Java using the Eclipse IDE. The main program follows:

	/*
	 * 
	 */
	public static void main(String[] args) {

		final int 	HEAP_SIZE 	= 7;
		final int	ELEMENTS	= HEAP_SIZE + 1;
		final long	RANDOM_SEED	= 1;

		// **** seed the pseudo random number generator ****
		Random rand = new Random(RANDOM_SEED);
		
		// **** create a heap ****
		Heap heap = new Heap(HEAP_SIZE);
		
		// **** remove from an empty heap ****
		System.out.println("main <<< root: " + heap.remove());
		System.out.println();
		
		// **** populate the array ****
		int[] arr = new int[ELEMENTS];
		for (int i = 0; i < arr.length; i++)
			arr[i] = i + 1;
		
		// **** display array ****
		System.out.println("main <<< arr: " + arrayToString(arr));
		
		// **** shuffle the array ****
		shuffle(rand, arr);
		
		// **** display array ****
		System.out.println("main <<< arr: " + arrayToString(arr));
		System.out.println();
		
		// **** loop inserting the pseudo random values ****
		for (int i = 0; i < ELEMENTS; i++) {
			System.out.println("main <<< arr[" + i + "]: " + arr[i]);
			heap.insert(arr[i]);
		}
		System.out.println();
		
		// **** display the heap ****
		System.out.println("main <<< heap: " + heap.toString());
		System.out.println();
		
		// **** loop removing the root from the heap ****
		for (int i = 0; i < ELEMENTS; i++) {
			System.out.println("main <<< root: " + heap.remove());
		}
		System.out.println();
		
		// **** display the heap ****
		System.out.println("main <<< heap: " + heap.toString());
	}

I like to develop in steps and test as I go. For this reason I use some constants to help me proceed in an incremental fashion. The HEAP_SIZE started with 1 and then went up and down during development. The ELEMENTS constant is used to set the number of nodes in the heap / tree. Finally, the RANDOM_SEED is used to make sure that I am able to repeat the cases if I ran into an issue or wish to address in a different way some part of the class under development. When all is set and done, I tend to remove seeding the pseudo random number generator and let the code run free of that constraint.

Nest we create a heap. In this implementation the heap has a fixed size. We could easily remove this constraint by extending the heap when we fill it up. We could also implement a mechanism to free part of the heap if the number of nodes decreases. These approaches are typically needed in production code and do take time and effort to implement and test. Time permitting I might enhance this software to support such features.

We know that the heap is empty, so what better time that attempting to remove a node from an empty heap.

The next step will be to generate a set of monotonically ascending integers and place them into an array. I did this for my convenience. I started by generating random integers in a range but had some duplicates. Tested with them and then was about to implement a method to print the array as a tree. For that I preferred not to have duplicates and have them is a know range.

Given that we would like to start with pseudo random numbers, decided to shuffle the array.

Now it is time to insert all the values into the heap. Note that we will attempt to insert one more value than the total capacity. We will be able to check that our class is properly handling such condition.

Next we will remove the values form the heap. The values must come out in monotonically descending order if all is working well. To complete the test we will display the heap one last time. It should be empty.

Following is the output from the console:

remove <<< the heap is EMPTY!!!
main <<< root: -1

main <<< arr: [ 1 2 3 4 5 6 7 8 ]
main <<< arr: [ 3 7 8 1 4 2 5 6 ]

main <<< arr[0]: 3
main <<< arr[1]: 7
main <<< arr[2]: 8
main <<< arr[3]: 1
main <<< arr[4]: 4
main <<< arr[5]: 2
main <<< arr[6]: 5
main <<< arr[7]: 6
insert <<< the heap is FULL!!!

main <<< heap: size: 7 tree: 8 4 7 1 3 2 5 

main <<< root: 8
main <<< root: 7
main <<< root: 5
main <<< root: 4
main <<< root: 3
main <<< root: 2
main <<< root: 1
remove <<< the heap is EMPTY!!!
main <<< root: -1

main <<< heap: size: 0 tree: 

Seems all is working as expected.

Let’s take a look at the methods we implemented to shuffle and display the contents of the array.

remove <<< the heap is EMPTY!!! /* * Algorithm P (Shuffling) page: 145-146 * Donald E. Knuth * The Art of Computer Programming * VOlume 2 * Seminumerical Algorithms * Third Edition */ static void shuffle(Random rand, int[] arr) { for (int i = arr.length - 1; i > 0; i--) {
	        int j 		= rand.nextInt(i + 1);
	        int temp	= arr[i];
	        arr[i] 		= arr[j];
	        arr[j] 		= temp;
	    }
	}
	
	/*
	 * array to string
	 */
	static String arrayToString(int[] arr) {
		String str;
		
		// **** ****
		str = "[ ";
		for (int i = 0; i < arr.length; i++)
			str += arr[i] + " ";
		str += "]";
		
		// **** ****
		return str;
	}

Now let’s look at the implementation of the Heap class. There are only two main methods. One for inserting new nodes which is always done at the end of the tree, and the other is to remove nodes, which is always done at the root (highest priority) of the tree. After inserting a node, we need to bubble up (or in our case upHeap) the node until it finds a location in which the value of the parent is larger than the value of the node, or the root has been reached (the new node becomes the new root).

When we remove a node we always remove the root. After removing the root we replace it with the last node in the tree. Once that is done we need to bubble down (in our case downHeap) the node until it finds its new location. This is achieved comparing the value of the node with the value of its children. We swap the node with the larger node level. This has the effect to move up the highest node to the root. The process repeats until we reach a leaf node.

The code for the Heap class follows:

package heap.canessa;


public class Heap {

	// **** members ****
	private int tree[];
	private int maxSize;
	private int size;
	
	// **** constructor ****
	public Heap(int size) {
		this.maxSize	= size;
		this.tree		= new int[size];
		this.size 		= 0;
	}
	
	// **** insert the element into the heap ****
	public void insert(int val) {
		
		// **** check if the heap is full ****
		if (this.size == maxSize) {
			System.out.println("insert <<< the heap is FULL!!!");
			return;
		}
		
		// **** insert the value at the end of the heap ****
		tree[this.size] = val;

		// **** up heap ****
		upHeap(this.size++);
	}
	
	// **** up heap the specified node (recursive method) ****
	private void upHeap(int child) {
		
//		System.out.println("upHeap <<< i: " + child + " tree[" + child + "]: " + tree[child] + " tree[" + parent(child) + "]: " + tree[parent(child)]); // **** for ease of use **** final int root = 0; // **** base condition (reached the root node) **** if (child == root) return; // **** check if child node value is greater than it's parent **** if (tree[child] > tree[parent(child)]) {
//			System.out.println("upHeap <<< before swap: " + toString());
			swap(child, parent(child));
//			System.out.println("upHeap <<<  after swap: " + toString());
			upHeap(parent(child));
		}
	}
	
	// **** remove the root from the tree ****
	public int remove() {
		
		// **** for ease of use ****
		final int root = 0;
		
		// **** check if the heap is empty ****
		if (this.size == 0) {
			System.out.println("remove <<< the heap is EMPTY!!!"); return -1; } // **** return the root value **** int rootVal = tree[root]; // **** swap the last node with the root node **** swap(root, --this.size); // **** down heap the root node **** downHeap(root); // **** return the top of the heap **** return rootVal; } // **** down heap the specified node (recursive call) **** private void downHeap(int parent) { // **** base condition **** if (rightChild(parent) >= this.size)
			return;
		
		// **** determine which direction to down heap ****
		int temp = parent;
		if (tree[temp] < tree[leftChild(parent)])
			temp = leftChild(parent);
		if (tree[temp] < tree[rightChild(parent)])
			temp = rightChild(parent);
//		System.out.println("downHeap <<< parent: " + parent + " temp: " + temp);
		
		// **** ****
		if (temp != parent) {
//			System.out.println("downHeap <<< before swap: " + toString());
			swap(parent, temp);
//			System.out.println("downHeap <<<  after swap: " + toString());
			downHeap(temp);	
		}
	}
	
	// **** swap - convenience method ****
	void swap(int i, int j) {
		int temp 	= this.tree[i];
		this.tree[i] = this.tree[j];
		this.tree[j] = temp;
	}
	
	// **** parent - convenience method ****
	private int parent(int child) {
		return (child - 1) / 2;
	}
	
	// **** left child - convenience method ****
	private int leftChild(int parent) {
		return parent * 2 + 1;
	}
	
	// **** right child - convenience method ****
	private int rightChild(int parent) {
		return parent * 2 + 2;
	}
	
	// **** to string - convenience method ****
	public String toString() {
		String str = "";
		
		str = "size: " + this.size + " tree: ";
		for (int i = 0; i < this.size; i++) 
			str += this.tree[i] + " ";
		
		// **** ****
		return str;
	}
}

The insert() method checks if the heap is full. At that time we could possibly grow the heap but as previously mentioned, we will just return an error. We could also throw an exception. Growing the heap is a feature that could be developed after the current software is up and running. Many libraries of different classes (e.g., dynamic arrays also known as ArrayList) exhibit such property.

After the element is inserted as the last node in the tree, the upHeap() method is called to bubble up the new value to the proper location. Node that this and the downHeap() methods are recursive. This is a good situation to perform recursion because the limits are well controlled.

The remove() method is similar to the insert() one. We first check if the heap is empty. We remove the root node and in its pace we put the last node in the heap. In most situations we would need to bubble down the node. For that we use the downHeap() method which is also implemented in a recursive mode. The root value is returned.

That is it. We could implement the priority heap these four methods and a constructor. In this case we will use some convenience methods which make the code easier to develop and maintain. The swap() method swaps the values of two nodes. The parent(), rightChild() and leftChild() methods are simple and based on the fact that the root is at location [0], and for a node at location i, the left child is always at i * 2 + 1 and the right child is at i * 2 + 2. Cool.

The toString() method is used to verify the operations during development. As you can see I use several print statement most of them are commented out in the final version. In production I prefer to use a runtime flag to enable or disable messages. I also prefer to send them to a log file. I will try to cover such mechanism in a future post.

If you are curious about the priority heap, it is not a new invention / creation. It was developed in 1964. If you are interested you may read more here.

If you have comments or questions regarding this or any other post, or if you would like to see a specific subject, please leave me a note bellow.

Keep on learning and developing better code;

John

Please follow me on Twitter:  @john_canessa

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.