Min-Heap as a sorting algorithm

I finished the min-heap module with little sense of what it really was, so I went back for another look. A min-heap is an efficient method for creating a list in such a way that the minimum element is always instantly accessible without a search, and in popping that minimum, the next minimum rises (also efficiently) to the top.

It occurred to me that this could be used in a sorting algorithm: from the list you want sorted, append each item to the heap, then pop each item into a new list (or the same one, if you want to sort in place.)

Since the heap is based on a binary tree, for a list of n elements, if n is equal to 2^k for some k, the tree is k elements high. Thus the maximum number of swaps to go up or down the tree is k or log(n).

Since you need to go through the list twice (once for append and once for pop), and since each step can involve a maximum of log(n) additional steps, the time for my sort should increase as approximately 2nlog(n).

So … I decided to compare with an implementation of merge sort (known to sort in n*log(n) ) from a different course, and compare the times.

The heap_sort() function (sort-in-place version) is:

def heap_sort(lst):
    min_heap = MinHeap()
    while lst:  
      min_heap.add(lst.pop())
           # remove minimum until min_heap is empty
    while min_heap.count != 0:
      lst.append(min_heap.retrieve_min())

… and when run alongside merge_sort() using the timeit module for timing, we see:

You can see that heap_sort follows n*log(n) closely, running linearly slower (up to 3x) than merge_sort. Interestingly, it looks as though somewhere between a list length of 10^5 and 10^6, heap_sort becomes faster than merge_sort, probably (I say with absolutely no authority!) due to inefficiencies inherent in recursion.

7 Likes