FAQ: Decision Trees - Gini Impurity

This community-built FAQ covers the “Gini Impurity” exercise from the lesson “Decision Trees”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Machine Learning

I’m relatively new to Python and to programming with objects in general, and I don’t understand how

for label in label_counts:
    probability_of_label = label_counts[label]/len(labels)

loops through the counts stored in label_counts. I assume it’s something about the object created by the Counter function, but it totally came out of the blue and stumped me. How does it know the index in the for loop is referencing the counts?

How do you compute the Gini impurity?

:slightly_frowning_face: :slightly_frowning_face: :slightly_frowning_face: :slightly_frowning_face: :slightly_frowning_face:

This module does not make sense. First it starts explaining a classifier for how expensive a car is, and then it jumps to Gini impurity without giving a proper introduction.

Sorry but the machine learning part of the data science path really is poorly done compared to the other modules.

OK, pretty freakin stumped here. Managed to do the exercise just fine. But when it gets to the function part… wtf. I keep getting 0.88 as the result for the first labels. I’ve printed my variables within the function: my impurity is 1, my probability_of_label is 0.33. Somehow it still returns 0.88.

edit: nvm figured it out

I have the same question. Did you get the answer?

I’ll try to break it down a bit; hopefully that will help. label_counts is a Counter. A Counter is similar to a dictionary: each key is a label, and the value stored under that key records how many times the label appeared (like the multiplicity of an element in a multiset). Iterating over a Counter with for label in label_counts yields the keys (the labels), not the counts. For a description and some short examples of the Counter type, have a look at the Python 3.10.0 documentation for the collections module ("Container datatypes").

Using the subscript syntax label_counts[label] with a specific label gives you how many times that label appeared. So, for example, label_counts['acc'] == 2.
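
To see this in isolation, here is a minimal sketch. The labels list is made up for illustration (the exercise's actual data may differ); it is chosen so that 'acc' appears twice, matching the example above.

```python
from collections import Counter

# Hypothetical labels, chosen so that 'acc' appears twice.
labels = ["unacc", "unacc", "acc", "acc", "good", "good"]

label_counts = Counter(labels)

# Iterating over a Counter yields its keys (the labels), just like a dict.
keys_seen = [label for label in label_counts]

# The count for one label is looked up with the subscript syntax.
acc_count = label_counts["acc"]

# .items() yields (label, count) pairs if both are wanted at once.
pairs = list(label_counts.items())

print(keys_seen)
print(acc_count)
print(pairs)
```

So the loop variable is never a count itself; it is a label, and the count only appears once that label is used as a key.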

Then it’s a case of probability, e.g. if I have a bag of seven marbles and three are blue, when extracting a single marble randomly from the bag what is the probability of extracting a blue marble: num_of_blue / total_marbles is 3/7.

For a quick rewrite of the example, perhaps some additional and altered naming helps clarify how this works:

total_label_count = len(labels)

for label in label_counts:
    this_label_count = label_counts[label]
    probability_of_label = this_label_count / total_label_count
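
Putting that loop into a complete function, one way the Gini impurity could be computed is sketched below. This assumes the usual definition (1 minus the sum of the squared label probabilities); the function name gini_impurity and the sample lists are my own and not taken from the exercise.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 minus the sum of squared label probabilities."""
    total_label_count = len(labels)
    label_counts = Counter(labels)

    impurity = 1.0
    for label in label_counts:
        this_label_count = label_counts[label]
        probability_of_label = this_label_count / total_label_count
        impurity -= probability_of_label ** 2

    # Return only after every label's term has been subtracted.
    return impurity


# Hypothetical label lists for illustration.
mixed = ["unacc", "unacc", "acc", "acc", "good", "good"]
pure = ["unacc", "unacc", "unacc"]

print(gini_impurity(mixed))
print(gini_impurity(pure))
```

With three labels appearing twice each, every probability is 1/3, so the result is 1 - 3 × (1/3)² ≈ 0.667. A list containing a single label gives 0, meaning the set is pure.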

Thank you very much for your time! This is really helpful!
