Height and depth of a tree and determining if it's balanced.

keywords

programming2017-02-16

I have been reviewing and brushing up on some basic data structures in Computer Science. To go along with these other posts, I am now writing about some more properties of binary trees: Height, depth, and balancing.

**Height** of a tree is the number of edges in the longest path from the root node to a leaf node. A tree with no children would have a height of 0. A tree with one child, or a tree with just one left and one right child node would have height 1 at the root. A tree with no nodes (`NULL`

) will have a height defined as -1.

```
F <---- height 2 (F)
/ \
D H <---- height 1 (D, H)
/ \ / \
B E G J <---- height 0 (B, E, G J)
```

Finding the height of a tree involves visiting every node and recursively adding 1 to the maximum value in the left and right subtrees. Here is a Python example:

```
def max_height(node):
"""recursively find the max height of a binary tree"""
if not node:
return -1 # height of a NULL node is -1
return max(max_height(node.left),
max_height(node.right)) + 1
```

So, that’s height of a tree. **Depth** of a tree is the number of edges from the root node to another node. I think of it sort of as the opposite of height. Where height is from the perspective of the top down, depth is the same, but from the bottom up.

```
F <---- depth of F: 0
/ \
D H <---- depth of D, H: 1
/ \ / \
B E G J <---- depth of B, E, G J: 2
```

For a tree to be **balanced**, the left and right subtrees should never differ in height by more than one. A balanced tree will yield optimal performance, but a tree that is not balanced won’t be much better than a linked-list data structure, so that’s why it’s an important property. A tree can be checked to see if it’s balanced or not with the help of the `max_height`

function implemented above. Here’s a Python example:

```
import math
def is_balanced(node):
"""determine if tree rooted at `node` is balanced or not"""
if not node:
return True
left_height = max_height(node.left)
right_height = max_height(node.right)
return (math.abs(left_height - right_height) <= 1 and
is_balanced(node.left) and is_balanced(node.right))
```

This works to determine if the tree is balanced, but has a time complexity of `O(n^2)`

, which is not very efficient. Nodes of the subtrees are visited multiple times. The algorithm can be optimized to run more efficiently by cutting out some of the calls to `max_height`

. To fix it, I am going to make a new helper function to get the height, called `get_height`

.

```
import math
def get_height(node):
"""get height of given `node` in tree, duck out early if non-balanced"""
if not node:
return -1 # height of NULL node is -1
left_height = get_height(node.left)
if left_height == -2:
return -2
right_height = get_height(node.right)
if right_height == -1:
return -2
if math.abs(left_height - right_height) <= 1:
return max(left_height, right_height) + 1
return -2 # -2 is used to indicate that it's not balanced anymore
def is_balanced(node):
"""check if tree rooted at `node` is balanced"""
return get_height(node) != -2
```

This algorithm runs in `O(n)`

time because the height and balance condition are being checked at essentially the same time. `get_height`

is now more of a helper function to `is_balanced`

. It returns -2 to indicate that the balance condition is no longer satisfied for a given subtree, and returns early without computing every value.