FAQ: Logistic Regression - Classification Thresholding

This community-built FAQ covers the “Classification Thresholding” exercise from the lesson “Logistic Regression”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Machine Learning

FAQs on the exercise Classification Thresholding

Is it possible to calculate the classifications from the probabilities with a list comprehension that uses an if/else expression? Something like this:

probabilities = [1 if probability > threshold else 0 for probability in probabilities]

Yes, I was also wondering why that is “wrong”. The resulting values are all correct, but the exercise doesn’t state what shape and container the resulting list should have. I solved it by starting with a np.zeros(len(probabilities), dtype=int) and finally applying .reshape(-1,1) to get the right shape. The list comprehension looks much more intuitive and pythonic. So the exercise amounts to guessing whatever the “solution” happens to use. I think Codecademy’s evaluator should be more permissive in comparing the results.

There is a good reason why only numpy arrays are accepted in the answer:

For big arrays, which are common in machine learning, np.where() can be orders of magnitude faster than a list comprehension. As a rule, when working with big arrays, use NumPy functions whenever possible.

However, I agree that the instructions could be clearer about which format is accepted for an answer.
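For anyone comparing the two approaches, here is a minimal sketch. The probability values and the threshold are made up for illustration and are not taken from the exercise. It shows that the list comprehension and np.where() produce the same values but different containers and shapes, which is probably why the checker rejects the list.

```python
import numpy as np

# Hypothetical values for illustration only (not from the exercise).
probabilities = np.array([[0.2], [0.7], [0.9]])
threshold = 0.5

# List comprehension: produces a plain Python list of ints.
as_list = [1 if p > threshold else 0 for p in probabilities.ravel()]

# Vectorized version: produces a NumPy array with the same shape as the input.
as_array = np.where(probabilities > threshold, 1, 0)

print(as_list)         # plain list
print(as_array.shape)  # column vector shape, same as `probabilities`

# The values agree once the list is converted and reshaped into a column.
same_values = np.array_equal(np.array(as_list).reshape(-1, 1), as_array)
print(same_values)
```

If the checker expects a column-shaped array, the np.where() result already has that shape, while the list needs the extra np.array(...).reshape(-1, 1) step.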

How did we get:
calculated_coefficients = np.array([[0.20678491]])
intercept = np.array([-1.76125712])

How do we get the inputs above?

I guess they are going to show us in the next exercises. However, I think they previously used some form of gradient descent to calculate both and then just plugged the values in.
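As a rough illustration of where such values can come from, the sketch below fits scikit-learn's LogisticRegression on a small made-up data set (the numbers are hypothetical, not the lesson's data) and reads the fitted coefficient and intercept. It then plugs them into the sigmoid by hand to show that they reproduce the model's probabilities. This is not necessarily how the lesson produced its values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data for illustration: hours studied and whether the exam was passed.
hours_studied = np.arange(1, 11).reshape(-1, 1)
passed_exam = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours_studied, passed_exam)

# The fitted parameters have the same shapes as the arrays in the question.
calculated_coefficients = model.coef_   # shape (1, 1)
intercept = model.intercept_            # shape (1,)

# Plug the parameters into the logistic (sigmoid) function by hand.
log_odds = intercept + hours_studied @ calculated_coefficients.T
manual_probabilities = 1 / (1 + np.exp(-log_odds))

# Compare against the probabilities of the positive class from the model.
model_probabilities = model.predict_proba(hours_studied)[:, 1]
matches = np.allclose(manual_probabilities.ravel(), model_probabilities)
print(calculated_coefficients, intercept, matches)
```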

In the context of this lesson, why do the striped areas in both graphs represent incorrect predictions?

Could any member recommend books that cover logistic regression in detail? I feel I’m not grasping the concept. I know what the code does, but I don’t fully understand why we use this method or how it works.

For the lesson “Logistic Regression”, setting the variable “alternative_threshold” doesn’t change the prediction’s result. Whatever number I set for it, the result is always [0 1 0 1 1]. Am I missing something?
By changing the “alternative_threshold” value, I was expecting the predictions to become [0 1 0 0 1] from [0 1 0 1 1].
Following is a copy of the code.

# Pick an alternative threshold here:
alternative_threshold = 0.6

# Import pandas and the data
import pandas as pd
codecademyU = pd.read_csv('codecademyU_2.csv')

# Separate out X and y
X = codecademyU[['hours_studied', 'practice_test']]
y = codecademyU.passed_exam

# Transform X
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

# Split data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state = 27)

# Create and fit the logistic regression model here:
from sklearn.linear_model import LogisticRegression
cc_lr = LogisticRegression()
cc_lr.fit(X_train,y_train)

print("coefs:",cc_lr.coef_)
print()
print("intercept:",cc_lr.intercept_)
print()


# Print out the predicted outcomes for the test data
print("predictions:",cc_lr.predict(X_test))

# Print out the predicted probabilities for the test data
print("predictions probabilities:",cc_lr.predict_proba(X_test)[:,1])

# Print out the true outcomes for the test data
print()
print("y_test:")
print(y_test)





And this is on the console:

coefs: [[1.5100409  0.12002228]]

intercept: [-0.13173123]

predictions: [0 1 0 1 1]
predictions probabilities: [0.32065927 0.7931881  0.05547483 0.57747928 0.87070434]

y_test:
7     0
15    1
0     0
11    0
17    1
Name: passed_exam, dtype: int64

alternative_threshold seems to be a global variable that is not passed into any method. Why does this variable affect the predict method in this exercise? It is never referenced again in the code.

I had the same thought. The visualization does not make sense. In the overlapping region, it is not possible to distinguish the people who have cancer from those who don’t.
I don’t think they chose an appropriate visualization for this lesson.

Same problem here. This lesson definitely has not been reviewed thoroughly by its author…

I guess the variable is just there for you to input your answer to the question. It does not affect the behavior of the code at all.
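To actually see a different threshold change the predictions, it has to be applied to the predicted probabilities by hand, since predict() on a binary LogisticRegression effectively cuts at 0.5. A minimal sketch using the probabilities and true labels copied from the console output posted above (in the original code these would come from cc_lr.predict_proba(X_test)[:,1] and y_test):

```python
import numpy as np

# Values copied from the console output in the question above.
predicted_probabilities = np.array(
    [0.32065927, 0.7931881, 0.05547483, 0.57747928, 0.87070434]
)
y_true = np.array([0, 1, 0, 0, 1])

alternative_threshold = 0.6

# Default behaviour: classify as 1 when the probability is at least 0.5.
default_predictions = (predicted_probabilities >= 0.5).astype(int)

# Alternative threshold: classify as 1 only at or above 0.6.
alternative_predictions = (predicted_probabilities >= alternative_threshold).astype(int)

print("default:    ", default_predictions)      # [0 1 0 1 1]
print("alternative:", alternative_predictions)  # [0 1 0 0 1]
print("errors at 0.5:", int((default_predictions != y_true).sum()))
print("errors at 0.6:", int((alternative_predictions != y_true).sum()))
```

With the threshold at 0.6, the fourth sample (probability 0.577) flips from 1 to 0, which gives the [0 1 0 0 1] result expected in the question.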