Heapify: A Linear-Time Operation

  • Show Floyd's heapify is a linear-time operation.

Each node, in Floyd's algorithm, in the worst case, sinks to become a leaf. The sink operation involves repeatedly (recursively) swapping the value with one of the children. Let's make the following observation:

Each node at height hh makes at most hh swaps (down).

So if you are a leaf, at height h=0h=0, you make no swaps down. If you are one level higher, at height h=1h=1, you make at most one swaps down. And so on. Let's make the following observation:

In a complete binary tree with nn nodes, there are n2\left \lceil \frac{n}{2} \right \rceil nodes at height h=0h=0 (leaves), n4\left \lceil \frac{n}{4} \right \rceil nodes at height h=1h=1, n8\left \lceil \frac{n}{8} \right \rceil nodes at height h=2h=2, \dots

In general, there are at most n2h+1\left \lceil \frac{n}{2^{h+1}} \right \rceil nodes of height hh, so the cost of building a heap is

h=0lgnn2h+1O(h) \sum_{h=0}^{\left \lfloor \lg n \right \rfloor} \left \lceil \frac{n}{2^{h+1}} \right \rceil \Omicron(h)

We can rewrite the above expression as

O(h=0lgnn2h+1h) O \left ( \sum_{h=0}^{\left \lfloor \lg n \right \rfloor} \left \lceil \frac{n}{2^{h+1}} \right \rceil h \right )

Since nn does not depend on the value of hh, it can be moved out of the fraction

O(nh=0lgnh2h+1) O \left ( n \sum_{h=0}^{\left \lfloor \lg n \right \rfloor} \left \lceil \frac{h}{2^{h+1}} \right \rceil \right )

Further observe that h2h×2<h2h\left \lceil \frac{h}{2^{h}\times 2} \right \rceil < \frac{h}{2^{h}} and we can use the upper-bound within big-O notation.

O(nh=0lgnh2h) O \left ( n \sum_{h=0}^{\left \lfloor \lg n \right \rfloor} \frac{h}{2^{h}} \right )

Moreover,

h=0lgnh2h<h=0h2h \sum_{h=0}^{\left \lfloor \lg n \right \rfloor} \frac{h}{2^{h}} < \sum_{h=0}^{\infty} \frac{h}{2^{h}}

And, we can use the upper-bound within big-O notation: ​​ O(nh=0h2h) O \left ( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right ) ​ It turns out the summation converges to 22: (It is beyond the scope of this course to prove this, but you can check it out on Wolfram Alpha.)

h=0h2h=2 \sum_{h=0}^{\infty} \frac{h}{2^{h}} = 2

Therefore, we get the following which means the heapify operation is linear time.

O(n×2)=O(n) O \left ( n \times 2 \right ) = O \left ( n \right )

A more intuitive analysis

Let's consider a perfect binary tree; here is a less accurate but more intuitive analysis:

Height#nodesworst-case #swapsTotal work
00n/2n/2n/2×0=0n/2 \times 0 = 000
11n/4n/4n/4×1=n/4n/4 \times 1 = n/4n/4n/4
22n/8n/8n/8×2=n/4n/8 \times 2 = n/4n/2n/2
33n/16n/16n/16×3<n/8n/16 \times 3 < n/8n/2+n/8n/2 + n/8
44n/32n/32n/32×4=n/8n/32 \times 4 = n/8n/2+n/4n/2 + n/4
55n/64n/64n/64×5<n/16n/64 \times 5 < n/16n/2+n/4+n/16n/2 + n/4 + n/16
66n/128n/128n/128×6<n/16n/128 \times 6 < n/16n/2+n/4+n/8n/2 + n/4 + n/8
\vdots\dots\dotsn/2+n/4+n/8+n/2 + n/4 + n/8 + \dots

The summation for total work is in O(n)\Omicron(n).

Resources

For a more detailed analysis of Floyd's algorithm, refer to Page 157 (Chapter 6, Section 3) of CLRS.