Codecademy Forums

FAQ: Decision Trees - Classifying New Data

This community-built FAQ covers the “Classifying New Data” exercise from the lesson “Decision Trees”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Machine Learning

FAQs on the exercise Classifying New Data

There are currently no frequently asked questions associated with this exercise, and that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Would you please explain the meaning of this code: “if isinstance(tree, Leaf):
return max(tree.labels.items()”?
To understand the function better, it is important to know the contents of the data. But I can’t read either Leaf or tree.labels.items() via the standard print function. How can I inspect them?

Thank you.

isinstance(tree, Leaf)

The above tests if the tree object is an instance of the Leaf class.

class Leaf:
    def __init__(self):
        pass

tree = Leaf()

print(isinstance(tree, Leaf))    # True

To be able to read the data, we need to know all the attributes. If the class defines a __repr__ or __str__ method, print() shows whatever that method returns.

    # Added inside the Leaf class; assumes the instance has an `attribute` attribute.
    def __repr__(self):
        return "Attribute: {value}".format(value=self.attribute)

print(tree)    # Attribute: <value of tree.attribute>

Please show us your Leaf class, as well as the instantiation of tree.
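In the meantime, here is a minimal sketch of how the contents could be inspected. The Leaf class below is a hypothetical stand-in, not the exercise's actual class; it only assumes that labels is a Counter of label counts, which matches the Counter output quoted later in this thread.

```python
from collections import Counter


class Leaf:
    """Hypothetical stand-in for the exercise's Leaf class."""

    def __init__(self, labels):
        # Count how many times each label occurs in this leaf.
        self.labels = Counter(labels)

    def __repr__(self):
        return "Leaf(labels={})".format(dict(self.labels))


tree = Leaf(["unacc", "unacc", "acc"])

print(tree)                       # uses __repr__
print(vars(tree))                 # all instance attributes, as a dict
print(list(tree.labels.items()))  # the (label, count) pairs that max() compares
```

vars(tree) is handy when the class has no __repr__ of its own, since it lists every instance attribute without needing to know their names in advance.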

Would you please tell me why we use the argument key=operator.itemgetter(1) in the max function below?

I ran the code with and without this argument, and both times it returned the same string.

Thank you.

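import operator

def classify(datapoint, tree):
  # A leaf stores label counts; return the label with the highest count.
  if isinstance(tree, Leaf):
    return max(tree.labels.items(), key=operator.itemgetter(1))[0]
  # Otherwise, follow the branch whose value matches the datapoint's feature.
  value = datapoint[tree.feature]
  for branch in tree.branches:
    if branch.value == value:
      return classify(datapoint, branch)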

operator.itemgetter(1) returns a function that retrieves the second item (index 1) of a tuple or list.
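A quick sketch of itemgetter(1) on its own, using made-up values, in case it helps to see it outside of max():

```python
import operator

# itemgetter(1) builds a callable that returns operand[1].
get_second = operator.itemgetter(1)

pair = ("unacc", 174)          # illustrative (label, count) tuple
numbers = [10, 20, 30]         # works on lists as well

print(get_second(pair))        # 174
print(get_second(numbers))     # 20
```

So when it is passed as key=, max() compares the counts rather than the whole (label, count) tuples.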

I don’t know anything about the example you cite, but as you probably know, the max() function returns the largest item of an iterable (what “largest” means depends on the type of the items).

print(max([4,8,2]))
# 8

If the items are themselves sequences, you may want to find the max based on one particular element of each item.

Say we have a list of tuples of students and their test scores:
scores = [('Martin', 88), ('Paul', 76), ('Germaine', 92)]

If you just call max(scores), you will get ('Paul', 76), since by default tuples are compared element by element, starting with the first, and alphabetically ‘P’ is greater than ‘M’ or ‘G’.
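A small sketch of that default tuple comparison, reusing the same scores list (the ('Paul', 75) tuple is made up just to show a tie on the name):

```python
scores = [('Martin', 88), ('Paul', 76), ('Germaine', 92)]

# The first elements decide the comparison: 'Paul' > 'Martin'.
first_decides = ('Paul', 76) > ('Martin', 88)

# Only when the first elements tie does the second element matter.
tie_falls_through = ('Paul', 76) > ('Paul', 75)

print(first_decides)       # True
print(tie_falls_through)   # True
print(max(scores))         # ('Paul', 76)
```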

So, here is where key = comes into play. If you Google how to do this, you will nearly always see an example using a lambda function:

print(max(scores, key = lambda x: x[1]))
# ('Germaine', 92)

So the key = parameter (which sort() and sorted() also accept) says something like:

Go through the iterable we are looking at and, for each item, call the function given after key =; then use the value that function returns (rather than the item itself) to perform whatever is being asked here - max(), min(), sort(), etc. - and return the “winning” item for max() or min(), or put the items in order for sort() and sorted().
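To illustrate with the other functions mentioned, here is a short sketch on the same scores list. The helper name score_of is just something I made up for the example:

```python
scores = [('Martin', 88), ('Paul', 76), ('Germaine', 92)]


def score_of(item):
    # Key function: compare items by their second element (the score).
    return item[1]


lowest = min(scores, key=score_of)
ascending = sorted(scores, key=score_of)    # returns a new list

print(lowest)      # ('Paul', 76)
print(ascending)   # [('Paul', 76), ('Martin', 88), ('Germaine', 92)]

scores.sort(key=score_of, reverse=True)     # sorts in place, returns None
print(scores)      # [('Germaine', 92), ('Martin', 88), ('Paul', 76)]
```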

Well, for some reason, and apparently without explanation (?), the authors of the exercise chose to import the operator module and use its itemgetter() function to do exactly the same thing, namely examine a certain value from each tuple (or sub-list, or whatever).

So, since apparently we are dealing with a dictionary here, let’s put the scores into one:

import operator
my_students = {'Martin': 88, 'Paul': 76, 'Germaine': 92}
print(my_students.items())
print(max(my_students.items(), key = lambda x: x[1]))
print(max(my_students.items(), key = operator.itemgetter(1)))

Output:

dict_items([('Martin', 88), ('Paul', 76), ('Germaine', 92)])
('Germaine', 92)
('Germaine', 92)

(Thanks! Your question gave me the opportunity to look up .itemgetter(), something I’d not yet encountered.)


Of course, that wasn’t your question, was it? You wanted to know why you got the same result with or without the key = parameter. Take a look at the dictionary being searched. By coincidence, I’d guess, the “default” max() returns the same tuple as the max() based on the actual value of interest.

patrickd314, thank you so much for the comprehensive answer.

I understood:

  1. that key lets max() compare on something other than the first element of each sub-list/tuple.
  2. I have checked the code one more time and found that tree.labels holds a single key:value pair: Counter({'unacc': 174}). As far as I’m aware, this is why I didn’t see any difference between max(tree.labels.items(), key=operator.itemgetter(1))[0] and max(tree.labels.items())[0]. If there were more elements to compare (e.g. 2), I might have seen the difference.
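Point 2 can be checked with a small sketch. The single-entry Counter is the one quoted above; the two-entry Counter uses made-up counts purely for illustration:

```python
import operator
from collections import Counter

# Single label: both calls have only one tuple to choose from.
single = Counter({'unacc': 174})
single_default = max(single.items())[0]
single_by_count = max(single.items(), key=operator.itemgetter(1))[0]
print(single_default, single_by_count)   # unacc unacc

# Two labels with hypothetical counts: now the results differ.
mixed = Counter({'unacc': 3, 'acc': 10})
mixed_default = max(mixed.items())[0]    # compares label names first
mixed_by_count = max(mixed.items(), key=operator.itemgetter(1))[0]
print(mixed_default, mixed_by_count)     # unacc acc
```

Without key=, 'unacc' wins only because it sorts after 'acc' alphabetically, even though 'acc' has the higher count.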