FAQ: Accuracy, Recall, Precision, and F1 Score - Review

This community-built FAQ covers the “Review” exercise from the lesson “Accuracy, Recall, Precision, and F1 Score”.

Paths and Courses
This exercise can be found in the following Codecademy content:

FAQs on the exercise Review

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Agree with a comment or answer? Like to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

Hey,

I have a question about which value to use when choosing the most suitable k for a KNN classifier. In the Breast Cancer Classifier project (link below) we had to plot the scores of different k-values using the .score() method, called on a fitted KNeighborsClassifier (apologies if the terminology is all over the place here).

(https://www.codecademy.com/paths/data-science/tracks/dspath-supervised/modules/dspath-classification/projects/knn-project)

Based on this, I iterated through a list of k-values and calculated the score for each one, keeping track of the highest score with an if statement so I could use the corresponding k for the predictions (training split 0.8, validation split 0.2):

from sklearn.neighbors import KNeighborsClassifier

highest_score = 0
k_list = range(1, 101)

# Try every k and keep the one with the highest validation accuracy
for k in k_list:
  classifier = KNeighborsClassifier(n_neighbors=k)
  classifier.fit(training_data, training_labels)

  # Compute the score once per k instead of twice
  score = classifier.score(validation_data, validation_labels)
  if score > highest_score:
    highest_score = score
    best_k = k

# Refit with the best k and predict on the validation set
classifier = KNeighborsClassifier(n_neighbors=best_k)
classifier.fit(training_data, training_labels)
guesses = classifier.predict(validation_data)

Now iterating over guesses vs. the validation labels shows 4 misses out of 114, i.e. 3.51%. Based on the earlier iteration, k = 23 had the best score in this case, namely 0.9649122807017544, which is the 96.49% correct (110/114).
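For reference, a minimal sketch of how those misses could be counted (assuming the guesses and validation_labels variables from the code above):

# Count disagreements between the predictions and the true labels
misses = sum(g != label for g, label in zip(guesses, validation_labels))
print(misses, len(validation_labels))       # 4 114
print(1 - misses / len(validation_labels))  # 0.9649...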

Concluding, the type of score we used when choosing this k-value was the ‘accuracy’ discussed in this lesson. Given recall, precision, and F1, are there other ways to determine the k-value using methods similar to .score()?

Thanks,
-Twan


How can it be that recall and precision should always move opposite to each other?

Using the lesson’s own example, if there were 10 snow days in a year, and I called them, and only them, as snow days that year, then both my recall (correct snow day calls out of all snow days) as well as my precision (correct snow day calls out of all snow day calls) would be 100%.

Plug 100% into each input in the F1 score equation and it gives us a perfect score.
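For reference, the F1 formula with precision = recall = 1:

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = 2 \cdot \frac{1 \cdot 1}{1 + 1} = 1$$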

Surely that bit in the lesson is a mistake…?

If there are no false classifications at all (no false negatives or false positives), then recall and precision will certainly both be 100%, but in many cases there are false classifications.

For example, if you adjust the model to make it easier for days to be classified as snow to improve recall, false positives are likely to increase. This will result in a lower precision.

On the other hand, if you adjust the model to prevent mistakenly classifying days as snow to improve precision, then false negatives are likely to increase. This will result in a lower recall.
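Here is a minimal sketch of that trade-off, assuming a model that outputs a probability of snow for each day (the probabilities and labels below are made up for illustration):

from sklearn.metrics import precision_score, recall_score

# Hypothetical "probability of snow" per day, and the true labels (1 = snow day)
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.5, 0.25):
  # Lowering the threshold makes it easier to call a day "snow"
  preds = [1 if p >= threshold else 0 for p in probs]
  print(threshold,
        precision_score(labels, preds),
        recall_score(labels, preds))

# threshold 0.5  -> precision 0.67, recall 0.67
# threshold 0.25 -> precision 0.60, recall 1.00

Lowering the threshold catches every snow day (recall rises to 100%), but at the cost of more false positives (precision drops).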

I haven’t yet learned whether there are other ways to determine the k-value, but I tried it and found some interesting results, so I’d like to share them. In the linked project, I tried using accuracy_score, recall_score, precision_score, and f1_score imported from sklearn.metrics instead of the .score() method. I used accuracy_score to confirm that the statistic calculated by the .score() method matches the accuracy here; it turns out they actually match.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

accuracies = []
accuracy_scores = []
recalls = []
precisions = []
f1_scores = []
for k in range(1, 101):
  classifier = KNeighborsClassifier(n_neighbors=k)
  classifier.fit(training_data, training_labels)

  # .score() returns the mean accuracy on the given data
  accuracies.append(classifier.score(validation_data, validation_labels))

  guesses = classifier.predict(validation_data)

  # The same metrics computed directly from the predictions
  accuracy_scores.append(accuracy_score(validation_labels, guesses))
  recalls.append(recall_score(validation_labels, guesses))
  precisions.append(precision_score(validation_labels, guesses))
  f1_scores.append(f1_score(validation_labels, guesses))

print(accuracies == accuracy_scores)  # True
max_accuracy = max(accuracies)
best_accuracy_k = accuracies.index(max_accuracy) + 1
print(best_accuracy_k)  # 23
print(max_accuracy)  # 0.9649122807017544

max_recall = max(recalls)
best_recall_k = recalls.index(max_recall) + 1
print(best_recall_k)  # 49
print(max_recall)  # 0.9846153846153847

max_precision = max(precisions)
best_precision_k = precisions.index(max_precision) + 1
print(best_precision_k)  # 23
print(max_precision)  # 0.9841269841269841

max_f1_score = max(f1_scores)
best_f1_k = f1_scores.index(max_f1_score) + 1
print(best_f1_k)  # 56
print(max_f1_score)  # 0.9696969696969696

I also made graphs for each. Recall and precision behave, as expected, opposite to each other. The behavior of accuracy and the F1 score is very similar, but they are not exactly the same: the F1 score is slightly higher at k = 56 than at k = 23. However, even at k = 23 the F1 score is close to its maximum, so in this case it may be concluded that k = 23 is the best overall. Another choice would make sense only if we had a strong reason to improve recall.
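For anyone who wants to reproduce the graphs, a minimal sketch using matplotlib (assuming the four metric lists from the code above):

import matplotlib.pyplot as plt

k_values = list(range(1, 101))
for scores, label in [(accuracies, "accuracy"),
                      (recalls, "recall"),
                      (precisions, "precision"),
                      (f1_scores, "F1 score")]:
  plt.plot(k_values, scores, label=label)
plt.xlabel("k")
plt.ylabel("validation score")
plt.legend()
plt.show()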


Very good analysis, thank you!
