# FAQ: Decision Trees - Classifying New Data

This community-built FAQ covers the “Classifying New Data” exercise from the lesson “Decision Trees”.

## FAQs on the exercise Classifying New Data

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.


Would you please explain the meaning of this part of the function: `if isinstance(tree, Leaf): return max(tree.labels.items()`?
To understand the function better, it is important to know the contents of the data. But I can’t read either `Leaf` or `tree.labels.items()` with the standard `print` function. How can I read them?

Thank you.

`isinstance(tree, Leaf)`

The above tests if the `tree` object is an instance of the `Leaf` class.

``````
class Leaf:
    def __init__(self):
        pass

tree = Leaf()

print(isinstance(tree, Leaf))    # True
``````

To be able to read the data, we need to know all the attributes. If the class has a `__repr__` or `__str__` method, we can `print` it.

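``````
class Leaf:
    def __init__(self, attribute):
        self.attribute = attribute    # placeholder attribute, for illustration

    def __repr__(self):
        return "Attribute: {value}".format(value=self.attribute)

print(Leaf("example"))    # Attribute: example
``````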
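
If the class does not define `__repr__`, one generic way to look inside an object is `vars()`, which returns the instance's attribute dictionary. Here is a minimal sketch, assuming a hypothetical `Leaf` that stores a `labels` counter (the exercise's real class may well differ):

```python
from collections import Counter

# Hypothetical stand-in for the exercise's Leaf class; the real one may differ.
class Leaf:
    def __init__(self, labels):
        self.labels = Counter(labels)

tree = Leaf(["unacc", "unacc", "acc"])

# vars() returns the instance's attribute dictionary, even without __repr__.
attributes = vars(tree)
print(attributes)

# Wrapping items() in list() gives a plain list of (label, count) pairs.
label_counts = list(tree.labels.items())
print(label_counts)
```

The same two `print` calls can be placed inside a function such as `classify` to see what a given `tree` object holds at that point.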

Please show us your `Leaf` class, as well as the instantiation of `tree`.

Would you please explain why we use the argument `key=operator.itemgetter(1)` in the `max` function below?

I ran the code with and without this argument, and both times it returned the same string.

Thank you.

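``````
import operator

# Leaf is the class provided by the exercise.
def classify(datapoint, tree):
    # Base case: at a leaf, return the label with the highest count.
    if isinstance(tree, Leaf):
        return max(tree.labels.items(), key=operator.itemgetter(1))[0]
    # Otherwise, follow the branch whose value matches the datapoint's feature value.
    value = datapoint[tree.feature]
    for branch in tree.branches:
        if branch.value == value:
            return classify(datapoint, branch)
``````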

`operator.itemgetter(1)` retrieves the second item (index 1) of a tuple or list.
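
As a small illustration (the values below are made up for the example), `operator.itemgetter(1)` builds a callable that returns the element at index 1 of whatever it is given:

```python
import operator

# itemgetter(1) returns a function; calling it on x gives x[1].
get_second = operator.itemgetter(1)

pair = ("unacc", 174)
count = get_second(pair)                    # same as pair[1]
second_of_list = get_second([10, 20, 30])   # same as [10, 20, 30][1]

print(count, second_of_list)
```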

I don’t know anything about the example you cite, but as you probably know, the `max()` function returns the largest item of an iterable (what counts as “largest” depends on the object type).

``````
print(max([4, 8, 2]))
# 8
``````

If the items are themselves sequences, you may want to find the max based on one particular element of each item.

Say we have a list of tuples of students and their test scores:
`scores = [('Martin', 88), ('Paul', 76), ('Germaine', 92)]`

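If you just call `max(scores)`, you will get `('Paul', 76)`. By default, tuples are compared element by element, starting with the first, and the second element only matters if the first elements are equal. Lexicographically, ‘P’ is greater than ‘M’ or ‘G’, so the name decides the result.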
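
The default comparison can be checked directly. This is a short sketch using the same `scores` list, plus one made-up tie case to show that the second element only matters when the first elements are equal:

```python
scores = [('Martin', 88), ('Paul', 76), ('Germaine', 92)]

# Tuples compare element by element, so the names decide the result here.
default_max = max(scores)
print(default_max)

# Made-up tie case: the names are equal, so the scores break the tie.
tie_max = max([('Paul', 76), ('Paul', 80)])
print(tie_max)
```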

So, here is where `key =` comes into play. If you Google how to do this, you will nearly always see an example using a lambda function:

``````
scores = [('Martin', 88), ('Paul', 76), ('Germaine', 92)]
print(max(scores, key=lambda x: x[1]))
# ('Germaine', 92)
``````

So the `key=` parameter (which also works for `sort()` and `sorted()`) says something like:

Go through the iterable we are looking at and, for each item, call the function given to `key=`. Then use the value that function returns (rather than the item itself, which is the default) to perform whatever is being asked: `max()`, `min()`, `sort()`, and so on. Finally, return the “winning” item for `max()` or `min()`, or the reordered list for `sorted()` (`sort()` reorders the list in place and returns `None`).
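
The same idea can be sketched for `min()` and `sorted()`, reusing the `scores` list from earlier with a small helper function (named `score_of` here purely for illustration):

```python
scores = [('Martin', 88), ('Paul', 76), ('Germaine', 92)]

def score_of(item):
    # Called once per item; its return value is what gets compared.
    return item[1]

highest = max(scores, key=score_of)
lowest = min(scores, key=score_of)
ranked = sorted(scores, key=score_of, reverse=True)

print(highest)
print(lowest)
print(ranked)
```

In each call the original tuples are returned; the key function only decides how they are compared.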

Well, for some reason, and apparently without explanation (?), the authors of the exercise chose to import the `operator` module and use its `itemgetter()` function to do exactly the same thing, namely pick out a certain value from each tuple (or sub-list, or whatever).

So, since apparently we are dealing with a dictionary here, let’s put the scores into one:

``````
import operator

my_students = {'Martin': 88, 'Paul': 76, 'Germaine': 92}
print(my_students.items())
print(max(my_students.items(), key=lambda x: x[1]))
print(max(my_students.items(), key=operator.itemgetter(1)))
``````

Output:

``````
dict_items([('Martin', 88), ('Paul', 76), ('Germaine', 92)])
('Germaine', 92)
('Germaine', 92)
``````

(Thanks! Your question gave me the opportunity to look up .itemgetter(), something I’d not yet encountered.)

Of course, that wasn’t your question, was it? You wanted to know why you got the same result with or without the `key=` parameter. Take a look at the dictionary being searched. By coincidence, I’d guess, the “default” `max()` returns the same tuple as the `max()` based on the actual value of interest.

patrickd314, thank you so much for the comprehensive answer.

I understood:

1. `key` lets the function compare items by something other than the first element of each sub-list/tuple/array.
2. I checked the code once more and found that the output of the code is a single key:value pair: `Counter({'unacc': 174})`. As far as I’m aware, this is why I didn’t see any difference between `max(tree.labels.items(), key=operator.itemgetter(1))[0]` and `max(tree.labels.items())[0]`. If there had been more elements to compare (e.g. 2), I might have seen a difference.
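
This can be checked with a quick sketch. The single-pair counter is the one quoted above; the two-label counter is invented for illustration and is not taken from the exercise data:

```python
import operator
from collections import Counter

# With one pair there is nothing to compare, so both calls agree.
single = Counter({'unacc': 174})
with_key_single = max(single.items(), key=operator.itemgetter(1))[0]
without_key_single = max(single.items())[0]

# Invented two-label counter: 'acc' has the higher count,
# but 'unacc' sorts later alphabetically.
mixed = Counter({'unacc': 5, 'acc': 9})
with_key_mixed = max(mixed.items(), key=operator.itemgetter(1))[0]
without_key_mixed = max(mixed.items())[0]

print(with_key_single, without_key_single)
print(with_key_mixed, without_key_mixed)
```

Without `key`, the pairs are compared by label first, so the result only matches the most frequent label by coincidence.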

How did you find the output of the code, `Counter({'unacc': 174})`?

How do you find the whole list or tuple if you can’t print `tree.labels.items()`? Thanks.

Sorry, I’ve not completed this unit but will attempt in the meantime to catch up to where you are, time allowing.

Why is there only one best feature? Shouldn’t there be a sequence of best features to determine the tree’s branching?

How can an internal node handle multiple features? For example, we can have 2 internal nodes after the first node, and 4 leaves after those 2 internal nodes. Thank you :)