[Challenge] Build and Test Balance of a Binary Search Tree

#Code Challenge #5: May 10, 2017

Every week, we feature the type of brain-teasing question that might be asked in a full-stack developer’s job interview at places such as Google and Facebook.

This week’s challenge was reportedly asked in interviews at Google (and even in some other contexts that you may not expect):


###The Challenge

You are given a list of numbers stored in a list, A. Your challenge is to build a [Binary Search Tree] (Binary search tree - Wikipedia) to store these numbers. You will need to:

  1. Represent each node of the tree (including its data, left child and right child).
  1. Print the tree out in an understandable form.
  2. Make a function to insert a list of numbers (A) into the tree.

Check if you can insert all the numbers in A into your tree, and that it prints out as expected.

Use the language of your choice to solve this challenge, but only submissions in languages taught by Codecademy are eligible to “win”.


###Intermediate difficulty

Write a function to check if the Binary Search Tree that you’ve created is balanced.

A tree is considered balanced when the difference between the min depth and max depth does not exceed 1, i.e. if the list had n elements in it the height of the tree would be log(n) (base 2).

#####Find out more about intermediate challenges.


###Hard Difficulty

  1. Adapt your function to insert a list of n numbers so that it runs in O(n log n) time. Bear in mind that this is just the average case for a random sequence of numbers. If the list A was sorted it would take O(n^2).
  1. Adapt your function to check if the tree is balanced so that it runs in O(n) time.

If your BST is balanced then the insert function would have only taken O(n log n) time, see if you can figure out why.

#####Find out more about hard challenges and Big O

###Winner

@tagrockstar79958 was the first to submit a complete working solution at the hard difficulty level: see it here.


###How to Participate

If you want to add your own submission to this challenge, you are more than welcome to do so – it will help you to practice and help everyone else to learn! Simply scroll down and reply to this thread with your code to participate! Don’t just submit your code, remember to explain your solution, too! If you want your entry to be considered under our “best code” assessment, you must include a link to your code running on repl.it and abide by our other simple rules.


The fine print:

Click the links to find out more about:

2 Likes

Some clues on how to start …
With example Python code. Check back for more spoilers that will be released soon.

Spoiler 1:

First, we will define a class called Node. This node needs to reference three things: number, left-child and right-child. So we will make a class to define it:

class Node:
    def __init__(self, num):
        self.left = None                   
        self.right = None                  
        self.num = num
5 Likes

Hard
The input is an array of random length with random ints(0-99)

https://repl.it/Hrq4/0

5 Likes

@tagrockstar79958 nice :smiley:
That’s a great solution that came in really quickly!

Just a couple of points you might want to refine:

  1. Your balance check relies on a sorting algorithm which is O(n log n), you can check the height in just O(n) so there is room for improvment here.
  2. The other thing is, you rely on your input A being sorted. This is a nice adaption but it means you can’t actually build a BST that isn’t sorted, it would be cool if you changed your solution to just insert the numbers one by one in order, so that it doesn’t try to create a balanced tree. If you did want to ensure a balanced tree your solution doesn’t qutie work: for example if I create a BST and then add an extra number after it is already created, your BST will no longer be balanced. There are some cool ways to ensure it is balanced called self-balancing trees if you wanted to implement that feature properly.
3 Likes

Thanks for the great feedback :slight_smile: Yes . . I was thinking about how to add in random numbers and keep it balanced. I figured it would take some sort of update of the tree or at least of more than one node in the tree to keep it balanced. Was too much at that time of night :smiley: I will look into it for sure. And the depth check . . yes I could almost feel that something could be done in a better way there, instead of running through all permutations. Something that would be cool to investigate as well.

A very cool challenge btw!

2 Likes

###Spoiler 2:

Your Node class will have to implement an insert method, to put a new Node as either it’s left or right child.
This method will have to check a few things (assume the current node is A and the new node is B):

  1. Does the B have a number greater than or smaller than A’s number?
  2. If B has a smaller number, but there is already a Node as the left child of A what should happen? (Hint: this is a great time to use recursion.)
  • Would like more “spoilers”.
  • Don’t want any more.

0 voters

1 Like

Spoiler 3:

[spoiler]You should now have a working node class, and the basics of an insert function, if you have been following along with the spoilers, and you should have something like this:

class Node:
    
   def __init__(self, num):
        self.left = None                    # initially nothing in left subtree
        self.right = None                   # and nothing in right subtree
        self.num = num                      # but it's data is the number passed to it

    def insertNum(self, num):
        if num < self.num:                  # check if number is bigger or smaller  
            if self.left is None:           # if nothing in there
                # do something
            else:
                # do something else
        else:
                # handle the case the number is larger  

Reviewing this code, it should be clear that one node represents a full tree (or sub-tree). So we can begin creating it:

A = [8,4,12,2,6,10,14,1,3,5,7,9,11,13,15]    # input A (fully balanced)
root = Node(8)
root.insert(4)
root.insert(12)
root.insert(2)
etc.

Above I have just taken the numbers out of the Array manually. Your next task is to create a function that will output a tree if you pass it A as a variable e.g. tree = generateTree(A) this function should return a Node, with all its children.[/spoiler]

@tagrockstar79958 or anyone made any more progress?
People should be able to do the basic challenge now with the hints given.

I’m working on a Javascript solution, hope to be ready soon!

2 Likes

Intermediate difficulty

The input is given manually as a list in the script. I did not try to tackle hard challenge since I did not have time to read about performance and Big O notation so far.

https://repl.it/HwHp/21

2 Likes

Hard
Github version - please star! :slight_smile:

https://repl.it/Hw7u/3

Here on repl.it i commented out print() method. It uses console.group() and repl.it has this feature blocked. You can still copy paste code into console and uncomment it :). I think this form is understandable enough - not really state of art though :).

Both add() and isBalanced() methods run with given time complexity. (actually isn’t inserting O(log n) ? ) - still having some trouble working with Big O.

I tried to cover my thought process in comments in code - thanks for reading!

2 Likes

In the BST, inserting of a single value is on average O(log n). But we asked you guys to write a function that inserts the whole list of values, and this is on average O(n log n) :slight_smile: I hope it makes some sense.

2 Likes

This is rreally, REALLY out of my reach. I will just sit back and enjoy others solutions.
But I have some stupid questions:

  1. What is the relation between this node-storage and the the real storage in the computer?Or do we have to care. There used to be something called defragmentation that you had to do now and then.

  2. The obvious way of storage is in SQL. But something tells me this has nothing to do with this. Why?

2 Likes

In computing, binary trees are seldom used solely for their structure. Much more typical is to define a labeling function on the nodes, which associates some value to each node. Binary trees labelled this way are used to implement binary search trees and binary heaps, and are used for efficient searching and sorting. - WIkipedia

More info: What are the applications of binary trees? - Stack Overflow


A practical real word example would be a 3-D video Game , 3-D games use what is known as Binary Space Partition in order to determine what objects need to be rendered. Cool yea?

3 Likes

@alexcraig, please help take a look.
A little bit late as I am on a business trip to US.

I Implemented a self-balancing tree, with 8 transformation rules.

Every node has:

  • Left Child - pointer to left child node with smaller value
  • Right Child - pointer to right child node with bigger value
  • Parent - pointer to parent node
  • Height - the max height of current node
  • ID - uniq ID for each node, based on simple binary rules from top to bottom, root = 1, root.left.id = root.id 2, root.right.id = root.id2 + 1. Could used a more simpler method to print out the tree.

Transformation Rules upon adding a new node:

  • Line: 3 >R> 4 >R> 5
  • Line: 3 >R> 5 >L> 4
  • Line: 5 >L> 4 >L> 3
  • Line: 5 >L> 3 >R> 4
  • 2 Rules on Left child node heights bigger
  • 2 Rules on Right child node heights bigger

https://repl.it/HzFf/16

1 Like

Hello @betamaster97156, that’s a lot of code! Thank you very much for your submission :slight_smile: I would like to give you few remarks regarding your code.


In the challenge, we asked you to build a tree from an array of values and create a function that inserts a whole array of values.

Self-balancing trees are great, they can save us a lot of time. The difference between log n and n is huge. But sometimes auto-balancing is not the best idea. Let’s say that we use our insert function to add few thousand elements. In production, this is sometimes handled by the single call to the rebalance function.

So we add elements to the tree normally, without auto-balancing. In result, we have (probably) unbalanced BST tree. rebalance function will take this tree and will construct a balanced version of this tree in linear time.

We know that building BST from an array is an O(n log n) operation. So how can we construct the balanced tree in O(n)? This is possible because BST gives us direct access to all elements in a sorted manner (by a simple in-order traversal). And from sorted array, we can build balanced BST in O(n).

In production, these two methods are often combined. Single element insert? Rotations. Insertion of many (in regards to the current size of the tree) elements? Insertion without rotations and rebalancing after the last insert.


Your code uses a single class, Node. In the case of AVL trees, this might be a bit confusing. For example, what is the height of the single node? This question does not make any sense, right? And yet, your code does answer it.

It was our suggestion to use a single class. But in BST you don’t have to store root or height internally. From a single node - root (global variable) you can access every bit of information related to the tree.

In most implementations of the AVL (and of more complex tree structures) you will find two classes - Node (TreeMember) (which represents a single node, has parent, key, left and right properties) and Tree (AVL) (with height, balance and root properties).

Your code is technically valid, but it does not make much sense in regard to OOP.


There is a hmap variable in your code which is used only for printing out the tree. This is not the best use of memory. In trees, you only need a reference to the root element to be able to access all nodes level by level. There is a well-known graph algorithm named breadth-first search that traverses trees and graphs level by level.

On the other hand, hmap might be useful for some sort of debugging. You can introduce a single global boolean variable called debug. If debug is True -> create hmap and populate it.


Your code implements 8 rotation rules. You only need two of them - left and right. Please note that AVL tree is always balanced, this reduces the number of the possible situation that requires rotation. Sometimes you need to make two rotations, one after another. But there should be only two functions to handle rotations.

There are many resources that explain why only two types of rotations are needed, but I think that the best idea is to take pencil and simply try to crack this problem :slight_smile:

7 Likes


This challenge is now officially over, everyone who submitted gave an excellent piece of code!

Well done to @tagrockstar79958 @tomegz @rd0122 and @betamaster97156
Everyone’s code worked well and gave a good implementation of the problem.

@tagrockstar79958 won, for being the first to submit a complete working solution.

Good luck in the next one!

2 Likes

Wow, thansk for the long comment. Can’t agree more on your comment. I am not not work in oop nor SW field, so i am ready weak in many of the concept such as BST, OOP, btreadth-first search, etc. Tanks for the feedback & teaching. :smile:

2 Likes

Clean up the code, stop using id for printing
https://repl.it/HzFf/24

This challenge posed a lot of challenges - pun intended - and raised some questions, that I would like to ask here.
First, talking O. The hard challenge posits:

Adapt your function to insert a list of n numbers so that it runs in O(n log n) time.
Bear in mind that this is just the average case for a random sequence of numbers.
If the list A was sorted it would take O(n^2).

Isn’t big O worst-case-scenario run-time? Not average? Thus, if inserting a sorted list, runs in O(n^2), inserting any list runs in O(n^2)!?. Only if we’re talking Omega or Theta can run-times differ between lists, right?

Next, the hard challenge asks:

Adapt your function to check if the tree is balanced so that it runs in O(n) time.

What is supposed to run in O(n)? The balance check? The insertion of the list? How would the check affect the insertion? Unless we would also modify the insertion method, to take the balance check into account…?

Finally, it states:

If your BST is balanced then the insert function would have only taken O(n log n) time, see if you can figure out why.

Does the insert solution below not run in in O(n log n)?

# Create a node
class TreeNode(object):

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def __repr__(self):
        return "{}".format(self.value)

# Create a binary search tree
class BinarySearchTree(object):
        
    def __init__(self, values):    
        self.root = None
        self.height = 0
        for value in values:
            self.insert_node(value)
           
    # Insert a node
    def insert_node(self, value):
        node = self.root
        if self.root is None:
            self.root = TreeNode(value)
            return
        while node:
            if value < node.value:
                if node.left is None:
                    node.left = TreeNode(value)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = TreeNode(value)
                    return
                node = node.right
        return

I had desired to make height and is_balanced methods of the BinarySearchTree class without using arguments, but I couldn’t figure out how to make that work, I kept circling back to desiring definitions as

def height(self, node=self.root):

This is not possible, though. Can someone explain to me why?