Height is O(lgN)\Omicron(\lg N)

  • Show the height of BST that maintains balance property is O(lgN)\Omicron(\lg N).

This section is an optional reading!

In a BBST, unlike a regular BST, it is guaranteed that the tree's height is O(lgN)\Omicron(\lg N). This claim can be easily observed for a perfect BST, where each internal node has two children and all the leaves are at the same level.

Let's observe the perfect BSTs above

HeightNodesBinary Log Calculation
0011log21=0\log_2 1 = 0
1133log23=1\log_2 3 = 1
2277log27=2\log_2 7 = 2
331515log215=3\log_2 15 = 3

Given these perfect BSTs, one can intuit how this holds more generally.

First, convince yourself that the worst height of a BST is when it has the minimum number of nodes concerning its height. Then, let's call N(h)N(h) the minimum number of nodes given height.

The worst case hO(n)h \in \Omicron(n) is when N(h)=h+1N(h) = h+1.

If we find the upper bound of N(h)N(h) for BBST, then we've bounded the height of all BBSTs.

Let's consider a BBST of height h3h \geq 3 and the minimum number of nodes n=N(h)n=N(h). This tree is composed of a root and two subtrees. Since the whole tree has the minimum number of nodes for its height, so do the subtrees. For the big tree to be of height hh, one of the subtrees must be of height h1h-1. To get the minimum number of nodes, the other subtree is of height h2h-2.

Why can't the other subtree be of height h3h-3 or h4h-4?

Because the balance property requires the height of siblings to differ by at most 11.

So far, we established for h3h \geq 3,

N(h)=1+N(h1)+N(h2) N(h) = 1 + N(h-1) + N(h-2)

We know

N(h)>N(h1)​ and N(h1)>N(h2) N(h) > N (h-1)​ \space \text{and} \space N(h-1) > N(h-2)​

therefore

N(h)>N(h1)+N(h2)>2×N(h2) N(h) > N(h-1) + N(h-2) > 2\times N(h-2)

If N(h)>2×N(h2)N(h) > 2\times N(h-2) then N(h2)>2×N(h4)N(h-2) > 2\times N(h-4), therefore

N(h)>2×N(h2)>4×N(h4) N(h) > 2\times N(h-2) > 4 \times N(h-4)

We can keep doing this

N(h)>>4×N(h4)>8×N(h8)> N(h) > \dots > 4\times N(h-4) > 8\times N(h-8) > \dots

What does the above expression demonstrate? Why is that significant?

This expression demonstrates that the number of nodes at least doubles each time the height increases by a factor of 22. This observation is important because N(h)N(h) is exponential in hh. Therefore hh is logarithmic in the number of nodes.

Continuing the proof, we see that this can be written as:

N(h)>2i×N(h2i) N(h) > 2^i \times N(h - 2i)

The largest ii we can put in there is i=h21i = \frac{h}{2} - 1:

Why is this value of ii chosen as h21\frac{h}{2} - 1?

This allows the expression to be equal to either 11 or 22, depending on if hh is even or odd.

  • Even hh: h2i=h(h2)=2h - 2i = h - (h - 2) = 2
  • Odd hh: h2i=h(2(h2)2)=h((h+1)2)=1h - 2i = h - (2(\frac{h}{2}) - 2) = h - ((h + 1)-2) = 1

So we have:

N(h)>2i×(h2i)>2h2×N(hh)>2h2 N(h) > 2^i \times (h-2i) > 2^{\frac{h}{2}} \times N(h - h) > 2^{\frac{h}{2}}

So we showed the minimum number of nodes, n=N(h)n=N(h) in a BBST of height hh, is at least 2h22^{\frac{h}{2}}:

n>2h2 n > 2^{\frac{h}{2}}

What does this say about height, hh? First, let's take the logarithm of both sides:

lgn>lg2h2h<2lg(n) \lg n > \lg 2^{\frac{h}{2}} \Rightarrow h < 2\lg(n)

So, hO(lgn)h \in \Omicron(\lg n).