Managing Binary Trees – Linux

This post deals with an API for binary trees in Linux. The API consists of the following functions:

Function Brief Description
tdelete() Deletes an item from a tree.
tdestroy() Deletes the entire tree.
tfind() Finds an item and if not found returns NULL.
tsearch() Search a binary tree for an item.
twalk() Performs a depth-first, left to right tree traversal.

In the past I have used and implemented, using different programming languages, several classes, methods and functions to deal with binary trees. Binary trees are used to keep data in sorted order. This allows for quicker search times. This particular implementation comes with the Linux operating system. You can read more about it by typing on a Linux console:  man twalk

In the man description it states that the implementation is a generalization from Knuth (6.2.2) Algorithm T. I happen to own the set of books that make up “The Art of Computer Programming” by Donald E. Knuth. This particular reference is found in volume 3.

At this time, I am interested in walking (traversing) a binary tree. Let’s take a look at the sample code (which I have added comments and made a modification):

#define _GNU_SOURCE     /* Expose declaration of tdestroy() */
#include <search.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

// **** global variable(s) ****

void *root = NULL;

void *
xmalloc(unsigned n)
    void *p;
    p = malloc(n);
    if (p)
        return p;
    fprintf(stderr, "insufficient memory\n");

compare(const void *pa, const void *pb)
    if (*(int *) pa < *(int *) pb) return -1; if (*(int *) pa > *(int *) pb)
        return 1;
    return 0;

action(const void *nodep, const VISIT which, const int depth)
    int *datap;
    switch (which) {
    case preorder:
		printf("%s <<< preorder *nodep: %d depth: %d\n",
				__func__, **(**((int **)nodep);
    case postorder:
        datap = *(int **) nodep;
        printf("%s <<< postorder %32d depth: %d\n",
		__func__, *datap, depth);
    case endorder:
		printf("%s <<< endorder *nodep: %d depth: %d\n",
		__func__, **(**((int **)nodep, depth);
    case leaf:
        datap = *(int **) nodep;
        printf("%s <<<       leaf %32d depth: %d\n",
		__func__, *datap, depth);

    int 	i,
    void 	*val;
    // **** initialize the random number generator ****
	// **** generate and insert integer values [0:255] ****
    for (i = 0; i < 12; i++) {
		// **** allocate element ****
        ptr = xmalloc(sizeof(int));
		// **** set random value [0:255] ****
		*ptr = rand() & 0xff;
		printf("%s <<< i: %2d *ptr: %32d\n", __func__, i, *prt);
		// **** insert new value ****
        val = tsearch((void *) ptr, &root, compare);
		// **** something went wrong ****
        if (val == NULL) {
			fprintf(stderr, "%s <<< UNEXPECTED val == NULL\n", __func__);
		// **** attempted to insert duplicate value ****
        else if ((*(int **)val) != ptr) {
			fprintf(stderr, "%s <<< DUPLICATE i: %d *val: %d == *ptr: %d\n",
					__func__, i, *((int *)val), *ptr);
			// **** free memory, indicate this, and adjust index ****
			ptr = NULL;
    twalk(root, action);
    tdestroy(root, free);

The code is self explanatory. The idea is to create a binary tree with 12 values in the range [0:255] without duplicates. The issue / omission / bug (call it whatever you wish) is that for each duplicate, the count of entries is reduced by one. Depending on the load on the computer, I encountered two or three duplicates in a run. I corrected this issue by decrementing the index on each duplicate. I also flagged that the memory allocated for the tree was deleted. I fully understand not checking for this in order not to distract from what the author is trying to describe.

Following is the output generated by a run of the code:

main <<< i:  0 *ptr:                              153
main <<< i:  1 *ptr:                              237
main <<< i:  2 *ptr:                              119
main <<< i:  3 *ptr:                              124
main <<< i:  4 *ptr:                              184
main <<< i:  5 *ptr:                               84
main <<< i:  6 *ptr:                              199
main <<< i:  7 *ptr:                               97
main <<< i:  8 *ptr:                               29
main <<< i:  9 *ptr:                              212
main <<< i: 10 *ptr:                              228
main <<< i: 11 *ptr:                              197
action <<<  preorder *nodep: 153 depth: 0
action <<<  preorder *nodep: 119 depth: 1
action <<<  preorder *nodep: 84 depth: 2
action <<<      leaf                               29 depth: 3
action <<< postorder                               84 depth: 2
action <<<      leaf                               97 depth: 3
action <<< endorder *nodep: 84 depth: 2
action <<< postorder                              119 depth: 1
action <<<      leaf                              124 depth: 2
action <<< endorder *nodep: 119 depth: 1
action <<< postorder                              153 depth: 0
action <<<  preorder *nodep: 199 depth: 1
action <<<  preorder *nodep: 184 depth: 2
action <<< postorder                              184 depth: 2
action <<<      leaf                              197 depth: 3	<====
action <<< endorder *nodep: 184 depth: 2
action <<< postorder                              199 depth: 1
action <<<  preorder *nodep: 228 depth: 2
action <<<      leaf                              212 depth: 3
action <<< postorder                              228 depth: 2
action <<<      leaf                              237 depth: 3
action <<< endorder *nodep: 228 depth: 2
action <<< endorder *nodep: 199 depth: 1
action <<< endorder *nodep: 153 depth: 0

Now let’s take a look at the action() function / method. This is a callback invoked by the twalk() API each time a node in the binary tree is visited.

The following figure illustrates the binary tree generated by this pass of the code:

The diagram illustrates building the binary tree in the order in which the nodes were generated. The tree does not support duplicate values. The first node is placed at the root. Smaller values are inserted to the left. Larger values are inserted into the right. The process continues until a suitable place is found. The sequence of insertions is illustrated by the blue labels in order (started at 0 (zero) and counted in binary to make sure they fitted in the small round circle). So far, all looks good.

The horizontal lines illustrate the depth of the nodes in the tree. It shows the number of links, levels one must traverse from the root to the node of interest.

Back to the screen capture of the run of the code, we verify that node value 29 is at depth 3. All looks fine until we compare node 197 which should be in depth 4 according to our diagram and logic, but the output indicates it should be at depth 3. We must have done something wrong. If you are versed in the subject, you already know the answer. If you are not, take a look at the left sub tree starting at the root. It looks balanced. Now, take a look at the one on the right. Starting at node 237, the tree is not balanced.

Trees may degenerate if they are not balanced. For example, try inserting sorted values into a tree and you will end up with a linked list. This implies that searches will take longer. Ideally after inserting a new node into the binary tree; which will always be at a leaf node; one needs to check if the tree needs to be rebalanced.

In a future post, might use a different programming languages, I will illustrate an algorithm to address this issue.

If you have comments / questions regarding this or any other post in this blog, please leave me a comment in the section below. I will respond as soon as possible.

Keep on refreshing and learning;


Follow me on Twitter:  @john_canessa

Leave a Reply

Your email address will not be published. Required fields are marked *