Binary Search Tree

Review of the basic data structure Binary Search Tree (BST)

2017-02-14

programming

Overview

Binary Search Tree is a recursive data structure that is useful for quick searching, insertion, and deletion. It averages a time complexity of log(n) for these operations. Here is how it compares to some other basic data structures:

Operation	Array	Linked List	Array (sorted)	BST
search	`O(n)`	`O(n)`	`O(log n)`	`O(log n)`
insert	`O(1)`	`O(1)`	`O(n)`	`O(log n)`
remove	`O(n)`	`O(n)`	`O(n)`	`O(log n)`

In this post, I will go over these three basic operations.

Properties

Each node stores some kind of value (number, string, ect.) and also links to its left and right child nodes (also known as subtrees.)
For each node, n in the tree, all of the values of its left subtree must be less than (or equal to) n‘s value.
For each node, n in the tree, all ofthe values of its right subtree must be greater than n‘s value.

A node can be implemented like this:


/* C++ Binary Search Tree node */
struct Node {
    int value;
    Node *left;
    Node *right;
}

Search

To check if a value exists within a binary search tree, you can make use of the fact that for any node, it’s left subtree has values less than (or equal to) the node. Same idea for the right subtree – it only contains values greater than node’s value. Because of this property, searching can be done in a fraction of the time of an array or linked list as long as the tree is balanced. In fact, the number of nodes to search is cut in half for each comparison. Searching can be implemented as follows:


/* C++ recursive search */
Node* SearchRecursive(Node* root, int value) {
   if (root) {
       if (root->value == value) return root;
       else if (value <= root->value) return SearchRecursive(root->left, value);
       else return SearchRecursive(root->right, value);
   }
   return NULL;
}

And if you prefer to avoid recursion:


/* C++ iterative search */
Node* SearchIterative(Node* root, int value) {
    if (root) {
        vector<Node*> to_visit;
        to_visit.push_back(root);
        while (to_visit.size() > 0) {
            Node* next = to_visit.back();
            to_visit.pop_back();

            if (next->value == value) return next;
            else if (next->left && value <= next->value) to_visit.push_back(next->left);
            else if (next->right && value > next->value) to_visit.push_back(next->right);
        }
    }
    return NULL;
}

Insert

When inserting a new value into a binary search tree, the basic idea is as follows: Start at the root of the tree. If the root does not exist then create it, and insert the new value. If the root does exist, check if the value being inserted is either less than (or equal to) or greater than the left and right nodes, respectively. Continue down the tree, and the value will be inserted as a new node at the correct spot.


/* C++ recursive insert implementation */
Node* Insert(Node *root, int value) {
    if (!root) {
        root = new Node();
        root->value = value;
    }
    else if (value <= root->value) root->left = Insert(root->left, value);
    else root->right = Insert(root-right, value);
    return root;
}

The same can be done iteratively:


/* C++ iterative insert implementation */
Node *Insert(Node *root, int value) {
    if (!root) {
        root = new Node();
        root->value = value;
        return root;
    }

    vector<Node*> to_visit;
    to_visit.push_back(root);
    Node *next = root;

    while (to_visit.size() > 0) {
        next = to_visit.back();
        to_visit.pop_back();

        if (value <= next->value) {
            if (next->left) to_visit.push_back(next->left);
            else {
                next->left = new Node();
                next->left->value = value;
            }
        }
        else {
            if (next->right) to_visit.push_back(next->right);
            else {
                next->right = new Node();
                next->right->value = value;
            }
        }
    }

    return root;
}

Remove

Another common operation is removal of items. This is also an efficient operation due to the properties of a binary search tree.

Before removing an element, it must first be found. This is done the same way as implemented above. Once the element is found, there are a couple different scenarios that can happen when removing it.

It’s simple to remove a leaf node from the tree because there are no subtrees to worry about.
It’s also simple to remove a node that has only one subtree, because we can just move the pointer from the parent down to the root of the subtree.
It’s a little more tricky to remove an element that has two subtrees because the properties of a BST can become messed up if it’s not done right. To ensure that the BST remains valid, either use the minimum value from the right subtree of the node to be removed, or the maximum value in the left subtree of the node to be removed.


/* recursive removal */
Node* Remove(Node* root, int value) {
    // step 1: find the element to be removed
    if (root) {
        if (value < root->value) root->left = Remove(root->left, value);
        else if (value > root->value) root->right = Remove(root->right, value);
        else {
            // here we found the element to remove.

            // Easy case - element is a leaf node.
            if (!root->left && !root->right) {
                delete root;
                root = NULL;
            }

            // Easy case - element to delete has a subtree on the right.
            else if (!root->left) {
                Node* temp = root;
                root = root->right;
                delete temp;
            }

            // Easy case - element to delete has a subtree on the left.
            else if (!root->right) {
                Node* temp = root;
                root = root->left;
                delete temp;
            }

            // Tricky case - element to delete has two subtrees.
            else {
                /* Find either the minimum element in the right subtree
                   or the maximum element in the left subtree. */
                Node* min = MinNode(root->right);
                root->value = min->value;
                // The problem has now been reduced to an easy case.
                root->right = Remove(root->right, min->value);
            }
        }
    }
    return root;
}

In the tricky case and after the minimum element is found, it is copied into the node that’s being removed, and then the problem is reduced down to one of the simpler cases.

I’ve gone over search, insert, and delete operations for a binary search tree. Code from this post can be found here. There are many more operations and things to learn about this data structure, but I don’t want this post to get too lengthy. Have a look at my other article, in which I go over traversal.