I’m relatively new to Python and programming with objects in general, and I don’t understand how
for label in label_counts:
    probability_of_label = label_counts[label]/len(labels)
loops through the counts stored in label_counts. I assume it’s something about the object created by the Counter function, but it totally came out of the blue and totally stumped me. How does it know the index in the for loop is referencing the counts?
This module does not make sense. First it starts explaining a classifier for how expensive a car is, and then it jumps to Gini impurity without giving a proper introduction.
Sorry but the machine learning part of the data science path really is poorly done compared to the other modules.
OK, pretty freakin stumped here. Managed to do the exercise just fine. But when it gets to the function part… wtf. I keep getting 0.88 as the result for the first labels. I’ve printed my variables within the function: my impurity is 1 and my probability_of_label is 0.33. Somehow it still returns 0.88.
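The function in question isn’t posted, so this is only a guess, but one way to end up with roughly 0.88 is a return statement indented inside the for loop: the function then exits after the first label, giving 1 - 0.33² ≈ 0.89. A minimal sketch of that, where the function names and the labels list are made up for illustration and are not the exercise’s actual code or data:

```python
from collections import Counter

# Hypothetical data: three labels, each making up 1/3 of the list.
labels = ["unacc", "unacc", "acc", "acc", "good", "good"]

def gini_early_return(labels):
    impurity = 1
    label_counts = Counter(labels)
    for label in label_counts:
        probability_of_label = label_counts[label] / len(labels)
        impurity -= probability_of_label ** 2
        # Assumed bug: return is inside the loop, so only the
        # first label's probability is ever subtracted.
        return impurity

def gini(labels):
    impurity = 1
    label_counts = Counter(labels)
    for label in label_counts:
        probability_of_label = label_counts[label] / len(labels)
        impurity -= probability_of_label ** 2
    # Return only after every label has been processed.
    return impurity

print(gini_early_return(labels))  # stops after the first label
print(gini(labels))               # subtracts all three squared probabilities
```

If the two functions give different numbers on the same labels, checking the indentation of the return line is a reasonable first step.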
I’ll try and break it down a bit; hopefully that will help. You have label_counts, which is a Counter. A Counter is similar to a dictionary: each key is a label, and its value records how many times that label appeared (this is like the multiplicity in a multiset). Iterating over it with for label in label_counts gives you the keys (the labels), not the counts, just as it would for a regular dictionary. Have a look at the docs for a description and some short examples of the Counter type if you like: collections — Container datatypes — Python 3.10.0 documentation
Upon accessing a specific label with the subscript syntax label_counts[label] you can access how many times that specific label appeared. So for example label_counts['acc'] == 2.
Then it’s a case of probability, e.g. if I have a bag of seven marbles and three are blue, when extracting a single marble randomly from the bag what is the probability of extracting a blue marble: num_of_blue / total_marbles is 3/7.
Here is a quick rewrite of the example; perhaps some additional and altered naming helps clarify how this works:
total_label_count = len(labels)
for label in label_counts:
    this_label_count = label_counts[label]
    probability_of_label = this_label_count / total_label_count
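To see this outside the exercise, here is a small self-contained sketch. The labels list is invented for illustration (chosen so that 'acc' appears twice, matching the example above), and the keys and probabilities names are not from the lesson:

```python
from collections import Counter

# Hypothetical labels; the exercise's real data will differ.
labels = ["unacc", "unacc", "acc", "acc", "good", "good"]
label_counts = Counter(labels)

# Looping over a Counter yields its keys (the labels), like a dict.
keys = [label for label in label_counts]
print(keys)

# The count for a label is looked up with the subscript syntax.
print(label_counts["acc"])

# .items() gives each label together with its count in one go.
probabilities = {
    label: count / len(labels)
    for label, count in label_counts.items()
}
print(probabilities)
```

So the loop variable is always a label, and label_counts[label] is what turns it into a count.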