AVL trees and balancing operations

Recommended reading

"Data Structures and Algorithms in C", Mark Allen Weiss, Addison Wesley Ch 4.

Introduction

There is no reason why a Knuth binary tree should balance. Average performance for random input should not be greatly below best i.e. balanced performance, but in the worst case, if the tree is made from sorted input, it will degrade to a linked list with access time of O(N) where there are N nodes. Access time for a node within a balanced tree will be O(log2N).

Operations which change the balance of a tree are insertion and deletion. Insertion might increase the maximum depth of a tree, but deletion can't do this. A number of deletion operations can't increase the absolute average search time within remaining nodes, but will increase the time taken as a function of the diminishing number remaining.

Therefore, if effort is going to be expended in balancing an unbalanced tree, by rearranging nodes, this can initially be most usefully applied following insertion of nodes.

A perfectly balanced tree can only exist if N was 2x-1 where x is an integer, i.e. if N is a value the series 1, 3, 7, 15, 31, 63, 127, 255 ... Practical trees must allow for any value of N which can fit into memory. The need for perfect balance could be relaxed by allowing the bottom level of the tree to be partially filled. This is difficult to achieve, as the entire tree might need rearranging after every insertion, and also deletion if we care about the potential unbalancing effect of deletion.

The imbalance problem only arises within the path through the tree on which a node is inserted. A recursive insertion function will return upwards through the insertion path. If information about the maximum depths of nodes is maintained as part of each nodes structure, it becomes possible to correct imbalances through localised tree rearrangements following insertion on the recursive return path upwards through the tree to the root.

The following data structure for a node allows for level information to be stored:

typedef char DATA; /* this could be for any record */

typedef struct node { /* structure of AVL tree node */
        DATA d;
        int level; /* -1 indicates a null tree, level stores max
                     no. of occupied levels below current node */
        struct node *left;
        struct node *right;
} NODE;

typedef NODE* TOP; /* enables easier pointers to pointers */

AVL tree balance condition

A particular approach to performing the tree rearrangements needed to keep a tree balanced was designed by Adelson-Velskii and Landis and named after their initials as the AVL tree.

An AVL tree has a balance condition such that each node stores the maximum depth of nodes below it. A NULL tree has a level of -1 and a leaf node has a level of 0. A node having 1 or 2 subtrees will have as its own level the maximum of left or right subtree levels +1. Within an AVL tree, the maximum difference allowed between left and right subtree levels is 1. When an insertion or deletion operation results in this difference increasing to 2, rearrangements called rotations are performed to reduce the imbalance for an AVL tree to retain the AVL balance condition.

AVL node level

Following an insertion or deletion which may influence the levels of subtrees, the level of nodes affected will need to be recalculated. This can be achieved using a function operating on the above data structure as follows:

int set_level(TOP t){ /* sets level in t based on levels in subtrees */
        int left_lev=-1, right_lev=-1;
        if(t->left)
                left_lev=t->left->level;
        if(t->right)
                right_lev=t->right->level;
        /* set level to highest of left or right level + 1 */
        if(left_lev>=right_lev)
                t->level=left_lev+1;
        else
                t->level=right_lev+1;
        return t->level;
}

This function returns the level of the current node, as well as setting it.

AVL rotations

A difference of levels between left and right subtrees can be increased, through insertion of a new node into the tree, in 4 possible ways:

  1. Insertion into left subtree of left branch.
  2. Insertion into right subtree of left branch.
  3. Insertion into left subtree of right branch.
  4. Insertion into right subtree of right branch.

Rotations which reduce or correct the resulting imbalances are diagrammed for these 4 cases below.

Functions which rearrange a tree below a particular top address must change the address of the top node of a tree or subtree, to the address of another node within the same tree. The address of this address has to be passed as a call by reference parameter, i.e. as a pointer to a pointer, so that the function can change the address within the calling program.

The function prototype for a rotation will be of the form:

void rotN(TOP *t); /* N indicates type of rotation 1-4 */

Case 1

/*AVL tree rotation, type 1 */
void rot1(TOP *t){
        NODE *k1,*k2=*t,*x,*y,*z;
        /* assign pointers to rotation object k1,
         * and subtrees x,y and z*/
        k1=k2->left;
        x=k1->left;
        y=k1->right;
        z=k2->right;
        /* perform rotation */
        k1->right=k2;
        k2->left=y;
        /* adjust k1 and k2 levels */
        set_level(k2);
        set_level(k1);
        /* replace t parameter in caller to reflect new subtree top */
        *t=k1;
}

Case 2

/*AVL rotation, type 2 */
void rot2(TOP *t){
        NODE *k1,*k2,*k3=*t,*b,*c;
        /* assign pointers to rotation object k1,
         * and subtrees b and c */
        k1=k3->left;
        k2=k1->right;
        b=k2->left;
        c=k2->right;
        /* perform rotation */
        k1->right=b;
        k3->left=c;
        k2->left=k1;
        k2->right=k3;
        /* adjust k1,k3 and k2 levels */
        set_level(k1);
        set_level(k3);
        set_level(k2);
        /* replace t parameter in caller to reflect new subtree top */
        *t=k2;
}

Case 3

This case is symmetric with case 2.

/*AVL rotation, type 3 */
void rot3(TOP *t){
        NODE *k1=*t,*k2,*k3,*b,*c;
        /* assign pointers to rotation object k1,
         * and subtrees b and c */
        k3=k1->right;
        k2=k3->left;
        b=k2->left;
        c=k2->right;
        /* perform rotation */
        k1->right=b;
        k3->left=c;
        k2->left=k1;
        k2->right=k3;
        /* adjust k1,k3 and k2 levels */
        set_level(k1);
        set_level(k3);
        set_level(k2);
        /* replace t parameter in caller to reflect new subtree top */
        *t=k2;
}

Case 4

This case is symmetric with case 1.

/*AVL rotation, type 4 */
void rot4(TOP *t){
        NODE *k1=*t,*k2,*x,*y,*z;
        /* assign pointers to rotation object k1,
         * and subtrees x,y and z*/
        k2=k1->right;
        x=k1->left;
        y=k2->left;
        z=k2->right;
        /* perform rotation */
        k1->right=y;
        k2->left=k1;
        /* adjust k1 and k2 levels */
        set_level(k1);
        set_level(k2);
        /* replace t parameter in caller to reflect new subtree top */
        *t=k2;
}

Choosing the rotation

The rotation type 1 - 4 can be determined from whether the insertion is on the left or right child of the parent, and whether the data inserted was on the left or right subtree of the child. This involves comparing 3 keys: the key inserted and the child and parent key. It is useful to have a comparison function which can return -1 0 or +1 depending upon whether the first record is less than, equal to or greater than the second (as with the strcmp() function in string.h).
int cmp(DATA d1, DATA d2){ /* compares d1, d2 returns -1, 0 or 1 */
        if(d1<d2)
                return -1;
        else if(d1>d2)
                return 1;
        else
                return 0;
}

The avl_type() function returns a value from 1 - 4 to indicate the type of rotation following insertion, or 0 is returned if the new node is a leaf.

int avl_type(DATA di,DATA parent,DATA child){
        /* returns int 1 - 4 to indicate AVL rotation
         * type based on class of insertion.
         *
         * Parameters:  di data inserted
         *              parent data at parent
         *              child data at child
         * Returns:     1 insertion on left tree of left child
         *              2 insertion on right subtree of left child
         *              3 insertion on left subtree of right child
         *              4 insertion on right subtree of right child
         *              0 if di == child, i.e. child is leaf node
         */
        if(cmp(di,parent)<0){ /* insert on left */
                if(cmp(di,child)<0) /* ins on left ST of left */
                        return 1;
                else if(cmp(di,child) == 0) /* child is leaf */
                        return 0;
                else /* ins on right ST of left */
                        return 2;
        } else {
                if(cmp(di,child)>0) /* ins on right ST of right */
                        return 4;
                else if(cmp(di,child) == 0) /* child is leaf */
                        return 0;
                else /* ins on left ST of right */
                        return 3;
        }
}
void avlrotate(int avlrot,TOP *t){
  /* performs avlrotation based on avlrot type 1 - 4 */
        if(avlrot==1)
                rot1(t);
        else if(avlrot==2)
                rot2(t);
        else if(avlrot==3)
                rot3(t);
        else if(avlrot==4)
                rot4(t);
        else
                printf("avlrotate: illegal rotation\n");
}

Performing an AVL balanced insertion

Having coded the above functions, it is now possible to insert a record into the AVL tree and maintain the AVL balance condition.

int insert(DATA di,TOP *t){ /* recursively inserts di into tree
      with TOP t, returns depth of tree t after insertion */
        NODE *ins,*top=*t;
        int left_lev=-1,right_lev=-1,diff,avlrot;

        if(!top){  /* empty tree, create new tree */
                /* make node */
                ins=(NODE*)malloc(sizeof(NODE));
                ins->d=di;
                ins->level=0;
                ins->left=NULL;
                ins->right=NULL;
                *t=ins;
                return 0;
        }
        if(top->left)
                left_lev=top->left->level;
        if(top->right)
                right_lev=top->right->level;


        if(cmp(di,top->d)<0){ /* insert on left */
                if((left_lev=insert(di,&(top->left))) == -1)
                        return -1; /* report insert failure */
                avlrot=avl_type(di,top->d,top->left->d);
        } else if(cmp(di,top->d)>0){ /* insert on right */
                if((right_lev=insert(di,&(top->right))) == -1)
                        return -1; /* report insert failure */
                avlrot=avl_type(di,top->d,top->right->d);
        } else { /* duplicate key */
                printdata(di);
                printf("insert: duplicate key error\n");
                return -1;
        }
        /* check if AVL rebalance required (i.e. difference in
         * left and right subtree heights >= 2 */
        diff=left_lev - right_lev;
        if(diff >= 2 || diff <= -2){
                printf("AVL imbalance : %d AVL type: %d\n",diff,avlrot);
                /* correct imbalance by performing AVL rotation */
                avlrotate(avlrot,t);
                top=*t;
        }
        /* set and return level of current node */
        return set_level(top);
}

Testing AVL insertion

First we need to be able to print the entire tree. This can be done recursively in left top right order. It is useful when debugging to be able to identify the immediate left and right child nodes for each record printed.

#define DEBUG 1 /* or 0 for less noisy output */

void printall(TOP t){ /* recursively prints data in left top right order */
        static int recsop=0; /* no of recs printed */
        if(!t){
                printf("printall: can't print NULL tree\n");
                return;
        }
        /* print left subtree */
        if(t->left)
                printall(t->left);
        /* print data for current node */
        if(DEBUG && t->left) printdata(t->left->d); else printf("  ");
        printdata(t->d);
        printf("%d ",t->level);
        recsop++;
        if(DEBUG && t->right) printdata(t->right->d); else printf("  ");
        if(recsop%4==0) printf("\n"); else printf("    ");
        /* print right subtree */
        if(t->right)
                printall(t->right);
}

void printdata(DATA d){ /* prints DATA item d */
        printf("%c ",d);
}

We now need to be able to insert a number of DATA nodes into a tree, and print it out.

int main(void){ /* test code for AVL binary tree construction */
        char str[]="a quick brown fox jumps over the lazy dog";
        int i;
        TOP t=NULL;
        for(i=0;i<strlen(str);i++)
                if(str[i]!=' ')
                        insert((DATA)str[i],&t);
        printall(t);
        printf("\n");
        return 0;
}

Test results

AVL imbalance : -2 AVL type: 4
AVL imbalance : -2 AVL type: 3
AVL imbalance : 2 AVL type: 2
AVL imbalance : 2 AVL type: 2
AVL imbalance : -2 AVL type: 3
o insert: duplicate key error
u insert: duplicate key error
o insert: duplicate key error
AVL imbalance : -2 AVL type: 3
r insert: duplicate key error
AVL imbalance : -2 AVL type: 4
AVL imbalance : -2 AVL type: 4
e insert: duplicate key error
AVL imbalance : -2 AVL type: 3
a insert: duplicate key error
AVL imbalance : -2 AVL type: 4
AVL imbalance : -2 AVL type: 3
o insert: duplicate key error
AVL imbalance : -2 AVL type: 3
  a 0       a b 2 c       c 1 d       d 0
b e 3 g       f 0       f g 1 h       h 0
e i 4 k       j 0       j k 2 m       l 0
l m 1       i n 5 u       o 1 p       p 0
o q 2 s       r 0       r s 1 t       t 0
q u 3 w       v 0       v w 2 y       x 0
x y 1 z       z 0

The results above show in 4 columns per node:

  1. The left child key or blank.
  2. The node key.
  3. The node level.
  4. The right key or blank.

We can see that the tree was maintained through a set of AVL rotations in key sorted order, and that node n (level 5) ended up at the top.