FAQ: Logistic Regression - Log-Odds

This community-built FAQ covers the “Log-Odds” exercise from the lesson “Logistic Regression”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Machine Learning

FAQs on the exercise Log-Odds

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.


In this lesson the coefficients (more exactly, one coefficient) and an intercept are already given. How can we calculate them ourselves?

8 Likes

How is the equation for z any different from the one we use in multiple regression? And why would it apply to the logarithm of the odds if it makes no use of the log function? Thank you in advance.

2 Likes

The coefficients just kind of magically appear out of nowhere here. What exactly do they represent?

3 Likes

z is just a different function here. In multiple linear regression, z is based on the normal distribution, so the probability goes down as you get further from the mean. In this case, as you get further away, the probability tends to either 1 or 0.

As far as applying to the log of the odds, that’s just how the math works out to make a logistic function. Remember: just as subtraction undoes addition and division undoes multiplication, logarithms undo exponents.

Hi! Can anyone explain to me how log(2.33) = 0.8…? Which log base are we using? I’ve tried base 2, base 10, and base 4, but I never got that value. Thanks in advance!

1 Like

Natural log; base e, which is 2.718-something.
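
You can check the base quickly in Python (a small sketch, assuming NumPy is installed):

```python
import numpy as np

print(np.log(2.33))    # ~0.8459 -- np.log is the natural log (base e)
print(np.log2(2.33))   # ~1.2204 -- base 2 gives a different value
print(np.log10(2.33))  # ~0.3674 -- so does base 10
```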

1 Like

You’re right! That went over my head, haha.

I’m still having trouble seeing what makes them different. According to the lesson, the equation is the same (but subbing z for y). Mathematically, that should produce the same result (as a linear function).

Are they identical, and we just call it ‘log-odds’ because we’re eventually going to use it with the sigmoid and log? If so, it seems misleading to call it ‘log-odds’ when we haven’t applied any sigmoid or log to it yet to give it a more log-like shape.

Just trying to understand, thanks in advance.

1 Like

Where did the coefficients and intercept come from? How were they selected?

For our Logistic Regression model, however, we calculate the log-odds, represented by z below, by summing the products of each feature value and its respective coefficient, and then adding the intercept.
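
In code, that sum of products looks something like this (a sketch; the feature values, coefficient, and intercept below are made up for illustration):

```python
import numpy as np

# hypothetical values, just to illustrate the formula
hours_studied = np.array([[10.0], [5.0], [2.0]])  # one feature, three students
coefficients = np.array([0.2])                    # one coefficient per feature
intercept = -1.35                                 # made-up intercept

# log-odds: each feature value times its coefficient, summed, plus the intercept
log_odds = hours_studied @ coefficients + intercept
print(log_odds)  # one log-odds value per student
```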

It’s the natural log, as in ln(2.33).

Parentheses are sometimes added for clarity, giving ln(x), log_e(x), or log(x). This is done particularly when the argument to the logarithm is not a single symbol, so as to prevent ambiguity.

Can someone please explain how these coefficients are determined?

I have the same confusion.

Guys, a lot of the questions here are about confusion regarding the coefficients. A response addressing this needs to be pinned to the top. Anyway, here’s the answer:

It’s explained later in the lesson (slide 9/11): we start by computing the log loss with the coefficients and intercept set to 0, and then, through gradient descent, we iteratively repeat the process, updating the coefficient values each time, until we converge on the coefficients that give us the minimal log loss.

calculated_coefficients are the coefficients produced by this gradient descent iteration; they are introduced here so that we can see, in a way that makes sense, how all the function mappings occur.
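
Here’s a rough sketch of that process in Python (not the lesson’s exact code; the function name, learning rate, and iteration count are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, n_iterations=1000):
    # start with the coefficients and intercept at 0, as the lesson describes
    coefficients = np.zeros(X.shape[1])
    intercept = 0.0
    for _ in range(n_iterations):
        predictions = sigmoid(X @ coefficients + intercept)
        error = predictions - y
        # gradient of the log loss with respect to each parameter
        coefficients -= learning_rate * (X.T @ error) / len(y)
        intercept -= learning_rate * error.mean()
    return coefficients, intercept
```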

Codecademy devs, perhaps this could be made clearer up front, telling learners not to worry about calculated_coefficients because it will be explained later?

1 Like

Thanks for explaining the coefficient calculation here. Another confusion of mine: given the coefficients, how does the sum of the products of the features and coefficients, plus the intercept, become the so-called ‘log_odds’? That calculation could still lead to a negative number, yet by the definition of log_odds (probability of positive divided by probability of negative), it could never be below 0. Is there a way to prove that the prediction calculation from Linear Regression leads to the concept of log_odds?
Very confused here. Many thanks in advance.

Sometimes I will say weights instead of coefficients.

From Wikipedia: Logit - Wikipedia


The article says the log-odds (logit) function takes a value which ranges from 0 to 1, like a probability, and maps it to the range (-infinity, +infinity). It is practically the inverse of mapping a value through the sigmoid. According to this definition, the log-odds can be below 0, since -infinity is the lower limit of the range.

From the lesson:
[image]
and:
[image]

Right now, I don’t know an exact mathematical way to compare the equation for log-odds using probability:

log( P(occurring) / P(not occurring) )

with our log-odds, which is:

the sum of the products of the features and their weights (coefficients), plus the intercept

but I can see how they both represent the likelihood of belonging to the positive class.
The summation takes all those features, weighted accordingly, and gives us a single metric representing the student’s chances. That metric seems to have a range of (-infinity, +infinity), which later in the lesson will be mapped to the range (0, 1) via the sigmoid.

PS
@micro1073766980 you said

but isn’t it supposed to be (probability of the event occurring divided by probability of the event not occurring)?

1 Like

Thank you for the prompt and detailed explanation!
I agree log_odds is the probability of the event occurring divided by the probability of the event not occurring; sorry for my wrong expression.
But I still don’t understand: if the range of p is (0, 1), wouldn’t p/(1-p) always be 0 or greater? There should be no way to produce a negative number from p/(1-p).

Plot of p/(1-p):
You’re right that p/(1-p) itself is never negative. For p in (0, 1), the odds p/(1-p) range over (0, +infinity): they approach 0 as p approaches 0 and blow up toward +infinity as p approaches 1.
The negative values come from the log. Whenever the odds are between 0 and 1 (i.e., p < 0.5), the log of the odds is negative, so log(p/(1-p)) covers the full range (-infinity, +infinity).

Also, interestingly, log(x/(1-x)) gives us the inverse of the sigmoid: it has a range of (-infinity, +infinity) but a domain of only (0, 1), so its plot doesn’t extend left and right to infinity the way the sigmoid’s does. The log-odds function is practically an inverted sigmoid.
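
You can check all of this numerically (a quick sketch with NumPy):

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))   # log-odds; domain (0, 1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # inverse of logit; range (0, 1)

p = np.array([0.1, 0.5, 0.9])
print(p / (1 - p))        # [0.111, 1.0, 9.0] -- the odds are never negative
print(logit(p))           # [-2.197, 0.0, 2.197] -- the log goes negative for p < 0.5
print(sigmoid(logit(p)))  # [0.1, 0.5, 0.9] -- the sigmoid undoes the logit
```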

Ok, after some research, I think I got this figured out. I think. Haha. Here’s what seems to be the logic:

  1. We use a multiple linear regression (MLR) equation (y = m1x1 + m2x2 + … + mnxn + b) because we want to be able to account for several features. But this equation gives us values of y that graph to a line. We don’t want that; since we’re dealing with a binary categorical variable, we only want to know whether the features yield one category (1) or the other (0).
  2. Enter the Sigmoid Function S(y) = 1 / (1 + e**-y). If we run any real value of y through the Sigmoid Function, it maps the value to the range between 0 and 1. Just what we want for a probability.
  3. So we want to take the features, find the y they produce, and express it as a probability. Really, we want to run the y of our MLR through the Sigmoid Function to express each y as a value between 0 and 1 (i.e., a probability).
  4. To do this in general, we can say the Sigmoid Function S(y) is our probability p, and express it as p = 1 / (1 + e**-y). Since we want to run our MLR y through this, we can solve it for y and substitute. If we solve for y, we get y = ln(p / (1 - p)). But p / (1 - p) is what we call the ‘odds.’
  5. So, putting it all together, we get ln(odds) = m1x1 + m2x2 + … + mnxn + b.
  6. If we express ln(odds) as z, we get z = m1x1 + m2x2 + … + mnxn + b.

So it looks just like the MLR equation, but its output will later be run through the Sigmoid Function to map it to a value between 0 and 1 that represents a probability. Because this output equals the log of the odds of that probability, we represent it as something other than y (to show its different role), namely the ‘log-odds,’ or z.
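
A quick numerical check of steps 1–6 (the coefficients here are made up, just to see the round trip):

```python
import numpy as np

# step 1: a made-up MLR output for one data point
y = 0.2 * 10 + 0.5 * 3 - 1.35   # m1*x1 + m2*x2 + b = 2.15

# steps 2-3: run y through the Sigmoid Function to get a probability
p = 1 / (1 + np.exp(-y))
print(p)                         # ~0.8957

# steps 4-6: the log of the odds recovers the original MLR output
print(np.log(p / (1 - p)))       # 2.15 -- equal to y, so z = ln(odds)
```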

If this explanation is wrong, I’m open to correction! Just trying to understand it myself.

1 Like

I guess they made a LogisticRegression model, and after using the .fit() method they could extract .coef_ and .intercept_ from the model, but I could be wrong…
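
Something like this, perhaps (a sketch with scikit-learn; the data is made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# made-up data: hours studied vs. passed (1) or failed (0)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

print(model.coef_)       # fitted coefficient(s)
print(model.intercept_)  # fitted intercept
```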