Yesterday I was looking at a problem on the HackerRank web site. The title is “Self Balancing Tree”. The challenge is to write the **insert()** function / method so that it inserts new elements while keeping the binary search tree balanced. As usual, no matter how familiar the subject might be, I always **research** the subject before planning a solution. By doing so I refresh my knowledge and in many cases learn one or more new things. To research I start with a **Google search** and go for Wikipedia articles. Based on what I find I tend to move on to different on-line articles or books.

A **binary search tree (BST)**, sometimes called an ordered or sorted binary tree, is a particular type of container: a data structure that stores “items” (such as numbers, names, etc.) in memory. It allows fast lookup, addition and removal of items, and can be used to implement either dynamic sets of items or lookup tables that allow finding an item by its key (e.g., finding the phone number of a person by name).

Binary search trees keep their keys in sorted order, so that lookup and other operations can use the principle of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, based on each comparison, whether to continue searching in the left or right subtree. On average, each comparison allows the operations to skip about half of the tree, so each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash tables.
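As a quick illustration of the lookup just described, here is a minimal sketch (the BstNode class and contains() method are just for this example; they are not part of the HackerRank problem or my solution):

```java
// **** minimal BST node (illustration only) ****
static class BstNode {
    int val;
    BstNode left;
    BstNode right;

    BstNode(int val) {
        this.val = val;
    }
}

// **** walk from root toward a leaf, skipping about half the tree at each step ****
static boolean contains(BstNode node, int key) {
    while (node != null) {
        if (key == node.val) {
            return true;
        }
        node = (key < node.val) ? node.left : node.right;
    }
    return false;
}
```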

A **self-balancing (or height-balanced) binary search tree** is any node-based binary search tree that automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.

An AVL tree is a **self-balancing binary search tree**. It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

The AVL tree is named after its two Soviet inventors, Georgy Adelson-Velsky and Evgenii Landis, who published it in their 1962 paper “An algorithm for the organization of information”.

AVL trees are often compared with red–black trees because both support the same set of operations and take O(log n) time for the basic operations. For lookup-intensive applications, AVL trees are faster than red–black trees because they are more rigidly balanced. Similar to red–black trees, AVL trees are height-balanced. Both are in general not weight-balanced or μ-balanced for any μ ≤ 1/2; that is, sibling nodes can have hugely differing numbers of descendants. I will take a look at red–black binary search trees in a future post.

Now let’s get into the actual problem as described in the HackerRank web site (https://www.hackerrank.com/challenges/self-balancing-tree).

Typically the balance factor is defined as follows:

```java
static int balanceFactor(Node node) {
    if (node == null) {
        return 0;
    } else {
        return height(node.left) - height(node.right);
    }
}
```

If you take a look at the following tree:

```
9
```

The balance factor is the height of the left tree (none in this case) minus the height of the right tree (also none in this case). The balance factor is zero (0).

In the following tree:

```
  9
 /
4
```

The left side has one entry (4), so the height of the left subtree is one (1) and the height of the right subtree is zero (0), resulting in a balance factor of one (1).

In the following tree:

```
9
 \
  12
```

The left subtree is non-existent so its height is zero (0). The right subtree has a height of one (1). The balance factor is minus one (-1). In other words, for a tree to be balanced, the balance factor of every node must be in the range **[-1, 1]**.

Take a look at the following tree:

```
    9
   /
  4
 /
2
```

The height of the left subtree at the root is two (2). The height of the right subtree is zero (0). The balance factor would be two (2). That indicates that the tree needs to be balanced.

The HackerRank web site provides a nice graphical representation of how to balance the four (4) cases one may encounter. In the previous tree we encountered a **Left-Left Case**. The algorithm should produce the following balanced tree:

```
  4
 / \
2   9
```

Well, after implementing my solution I was not able to pass the initial tests at HackerRank. I decided to take a look at the discussions to see if the comments could help me determine the issue. Of interest I found the following:

*“Just want to let everyone know that in this question, the height of a “null” node is -1 INSTEAD OF 0, and the height of the leaf node is 0 INSTEAD OF 1. The latter will let you pass several cases, while the former will let you pass every case.”*

With two (2) simple constant changes my solution was accepted. I then went to a reference book to get a better understanding of the suggestion. More on this after I briefly discuss my approach.
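The two constants live in the height() helper called by my code. To illustrate the adjusted convention, here is a self-contained sketch that computes heights recursively (my solution caches the height in the **ht** field instead of recomputing it; the HtNode class below is just for this example):

```java
// **** minimal node for illustration only ****
static class HtNode {
    int val;
    HtNode left;
    HtNode right;

    HtNode(int val) {
        this.val = val;
    }
}

// **** height of a null node is -1; height of a leaf node is 0 (HackerRank convention) ****
static int height(HtNode node) {
    if (node == null) {
        return -1;
    }
    return 1 + Math.max(height(node.left), height(node.right));
}
```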

My main() code, which was not required but I had to implement for unit testing, follows:

```java
public static void main(String[] args) {
    Node root = null;

    // **** open scanner ****
    Scanner sc = new Scanner(System.in);

    // **** read node values and build the AVL tree ****
    int N = sc.nextInt();
    for (int n = 0; n < N; n++) {
        int val = sc.nextInt();
        root = insert(root, val);
    }

    // **** display the tree breadth first ****
    breadthFirst(root);

    // **** close scanner ****
    sc.close();
}
```

The requirements of the problem just called for the implementation of the **insert()** function. My implementation follows:

```java
static Node insert(Node root, int val) {
    // **** perform standard BST node insert operation ****
    if (root == null) {
        root = new Node();
        root.val = val;
        root.ht = 0;
        root.bf = 0;
        return root;
    }

    if (val < root.val) {
        root.left = insert(root.left, val);
    } else {
        root.right = insert(root.right, val);
    }

    // **** update height of this node (a leaf has height 0) ****
    root.ht = max(height(root.left), height(root.right)) + 1;

    // **** compute and set the balance factor for this node ****
    int balance = balanceFactor(root);
    root.bf = balance;

    // **** left-left case ****
    if ((balance > 1) && (val < root.left.val)) {
        return rightRotate(root);
    }

    // **** right-right case ****
    if ((balance < -1) && (val > root.right.val)) {
        return leftRotate(root);
    }

    // **** left-right case ****
    if ((balance > 1) && (val > root.left.val)) {
        root.left = leftRotate(root.left);
        return rightRotate(root);
    }

    // **** right-left case ****
    if ((balance < -1) && (val < root.right.val)) {
        root.right = rightRotate(root.right);
        return leftRotate(root);
    }

    // **** no rotation needed ****
    return root;
}
```

As you can tell, the first few lines implement a normal insertion into a binary search tree. They are followed by the computation of the height and the balance factor. At that point the determination is made whether or not to rotate based on the values. Only two (2) rotations are needed. They are applied alone or in sequence.

```java
static Node rightRotate(Node u) {
    Node p = u.left;
    Node rt = p.right;

    // **** rotate on pivot ****
    p.right = u;
    u.left = rt;

    // **** update heights ****
    u.ht = max(height(u.left), height(u.right)) + 1;
    p.ht = max(height(p.left), height(p.right)) + 1;

    // **** update balance factors ****
    u.bf = balanceFactor(u);
    p.bf = balanceFactor(p);

    // **** return new root ****
    return p;
}
```

```java
static Node leftRotate(Node u) {
    Node p = u.right;
    Node lt = p.left;

    // **** rotate ****
    p.left = u;
    u.right = lt;

    // **** update heights ****
    u.ht = max(height(u.left), height(u.right)) + 1;
    p.ht = max(height(p.left), height(p.right)) + 1;

    // **** update balance factors ****
    u.bf = balanceFactor(u);
    p.bf = balanceFactor(p);

    // **** return new root ****
    return p;
}
```

Note that I added a **couple** of items to the Node structure to be able to test and debug:

```java
static class Node {
    // **** fields ****
    int val;
    int ht;
    Node left;
    Node right;
    int bf;        // balance factor

    // **** constructor(s) ****
    public Node() {
    }
}
```
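To sanity-check the Left-Left example from earlier in this post (inserting 9, 4 and 2), the insert and rotation logic above can be condensed into one self-contained sketch. This is an illustration, not the submitted code: the **bf** bookkeeping, Scanner I/O and breadthFirst() are omitted, and the names AvlNode and heightOf are mine:

```java
// **** condensed AVL node (bf field omitted for brevity) ****
static class AvlNode {
    int val;
    int ht;
    AvlNode left;
    AvlNode right;
}

// **** height of a null node is -1 per the HackerRank convention ****
static int heightOf(AvlNode n) {
    return (n == null) ? -1 : n.ht;
}

static AvlNode rightRotate(AvlNode u) {
    AvlNode p = u.left;
    u.left = p.right;
    p.right = u;
    u.ht = Math.max(heightOf(u.left), heightOf(u.right)) + 1;
    p.ht = Math.max(heightOf(p.left), heightOf(p.right)) + 1;
    return p;
}

static AvlNode leftRotate(AvlNode u) {
    AvlNode p = u.right;
    u.right = p.left;
    p.left = u;
    u.ht = Math.max(heightOf(u.left), heightOf(u.right)) + 1;
    p.ht = Math.max(heightOf(p.left), heightOf(p.right)) + 1;
    return p;
}

static AvlNode insert(AvlNode root, int val) {
    // **** standard BST insert ****
    if (root == null) {
        AvlNode n = new AvlNode();
        n.val = val;
        n.ht = 0;
        return n;
    }
    if (val < root.val) {
        root.left = insert(root.left, val);
    } else {
        root.right = insert(root.right, val);
    }

    // **** update height and compute balance factor ****
    root.ht = Math.max(heightOf(root.left), heightOf(root.right)) + 1;
    int balance = heightOf(root.left) - heightOf(root.right);

    // **** four rebalancing cases ****
    if (balance > 1 && val < root.left.val) {
        return rightRotate(root);
    }
    if (balance < -1 && val > root.right.val) {
        return leftRotate(root);
    }
    if (balance > 1 && val > root.left.val) {
        root.left = leftRotate(root.left);
        return rightRotate(root);
    }
    if (balance < -1 && val < root.right.val) {
        root.right = rightRotate(root.right);
        return leftRotate(root);
    }
    return root;
}
```

Inserting 9, 4 and 2 in that order yields a root of 4 with children 2 and 9, matching the balanced tree shown earlier.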

I referred to “**The Art of Computer Programming**” Volume 3 Sorting and Searching, second edition, by Donald E. Knuth. I went to page 458, section 6.2.3 Balanced Trees. Interestingly, the contents of that section were already **highlighted, underlined and annotated**. Through the years I have purchased two (2) copies of “The Art of Computer Programming”. My first copy was lost in a work move a decade or so ago. I decided to purchase a second copy in **2010**. I always write my name and the year in which I purchased the book on the first page. In other words, I have reviewed this subject at least once in the past six years or so.

On page 459 it states: “The height of a tree is defined to be its maximum level, the length of the longest path from the root to an external node. … Figure 20 shows a balanced tree with 17 internal nodes and height 5.” If you take a look at Figure 20 on page 460, it illustrates a tree with **six levels**. This might tend to imply that the comment on the web site is correct in assuming that the last level should be labeled zero (0). The issue is that Figure 20 shows all the values of the binary tree at the leaf nodes. If one puts the values in the nodes, as required in the problem, the last level is non-existent. That leaves the tree with only **five levels**. This tends to indicate that the suggestion made is valid due to the implementation used to test the answers, but it does not seem to be the most common implementation, and one may argue about the proper answer.

One way or the other, I am sure this problem made people think :o)

John Canessa

john.canessa@gmail.com