FAQ: Linear Regression - Gradient Descent for Intercept

This community-built FAQ covers the “Gradient Descent for Intercept” exercise from the lesson “Linear Regression”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Machine Learning

FAQs on the exercise Gradient Descent for Intercept


"It is not crucial to understand how we arrive at the gradient equation."
OK, now I’m curious: how did we arrive at this gradient equation?

Just wondering.


Same here. It would be nice to have a link or reference to dig a bit deeper.


Hope you know your derivation, enjoy.
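
For anyone who wants the steps spelled out, here is a short sketch of the derivation. It assumes the lesson's loss is the mean squared error over N points; the thread does not state the loss explicitly, but that choice is consistent with the -2/N factor discussed here. The symbols L, x_i and y_i are notation introduced for this sketch.

```latex
\begin{aligned}
L(m, b) &= \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial L}{\partial b}
  &= \frac{1}{N} \sum_{i=1}^{N} 2 \bigl(y_i - (m x_i + b)\bigr)
     \cdot \frac{\partial}{\partial b} \bigl(y_i - (m x_i + b)\bigr) \\
  &= \frac{1}{N} \sum_{i=1}^{N} 2 \bigl(y_i - (m x_i + b)\bigr) \cdot (-1) \\
  &= -\frac{2}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)
\end{aligned}
```

The sum in the last line is the accumulated difference between the actual and predicted values, so the gradient with respect to b is that difference multiplied by -2/N.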


I am curious as to why we calculate the gradient as diff * -2/N.
Why wouldn’t we calculate it as simply diff/N?


Me too, I’m curious why we have to multiply the difference by -2/N. Can anyone answer this? Thanks.


On step 2, why do we use len(x) to define the range of the loop? I don’t understand the use of the len() function with simple input variables.

On step 3, it’s mentioned that x is an object. Could that be the reason we are allowed to use len(x), since the x variable is an object and not an integer?

Here we are assuming that x is an object such as a list or an array. In this case len(x) returns the number of elements in the list or array x. In general, the len() function can be applied to a wide variety of objects, including strings and dictionaries.
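
A small sketch of what that looks like in practice. The sample values for x and y are made up for illustration and are not the exercise's data.

```python
# Hypothetical sample data; any list or array of numbers would do.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

N = len(x)  # number of elements in the list, here 4

# range(N) yields the indices 0, 1, ..., N - 1, so x[i] and y[i] pair up.
pairs = []
for i in range(N):
    pairs.append((x[i], y[i]))

# len() also works on other containers such as strings and dictionaries.
string_length = len("abc")
dict_length = len({"a": 1, "b": 2})

# A plain integer has no length, so len() raises a TypeError for it.
try:
    len(5)
    int_has_len = True
except TypeError:
    int_has_len = False
```

So len(x) works in the exercise because x is a collection of values rather than a single number.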

Sorry, I am very new to this: what determines the current gradient guess and current intercept guess?
Why are they both zero later in the code, and should they always be zero?
Thank you.

Yet another question about -2/N. Why do we multiply by -2?

What I understand so far is that we have a sum of losses from y - (m*x+b), and if we divide it by N (the number of values we added up) we get the average loss.

So what does multiplying by -2 and then dividing by N do?

Thank you!

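The 2 comes from the derivative: if f(x) = x^2, then f’(x) = 2x, and the loss is built from squared differences. The minus sign comes from the chain rule: inside the square we have y - (m*x + b), and the derivative of that with respect to b is -1. Dividing by N averages over the data points. The resulting gradient points in the direction in which the loss increases fastest. Gradient descent looks for the minimum of the loss function, so we step in the opposite direction by subtracting the gradient, scaled by a learning rate, from the current b.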
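
One way to convince yourself of the -2/N factor is to compare it with a numerical slope. The sketch below assumes the loss is the mean squared error. The function names, sample data, and learning rate are all made up for illustration and are not taken from the exercise.

```python
def gradient_at_b(x, y, m, b):
    """Gradient of the mean squared error with respect to the intercept b."""
    N = len(x)
    diff = 0
    for i in range(N):
        diff += y[i] - (m * x[i] + b)
    return -2 / N * diff


def mean_squared_error(x, y, m, b):
    """Average of the squared differences between actual and predicted y."""
    N = len(x)
    return sum((y[i] - (m * x[i] + b)) ** 2 for i in range(N)) / N


# Hypothetical data and starting guesses.
x = [1, 2, 3]
y = [2, 4, 6]
m, b = 0.0, 0.0

analytic = gradient_at_b(x, y, m, b)

# Central finite difference: the slope of the loss measured directly.
h = 1e-6
numeric = (
    mean_squared_error(x, y, m, b + h) - mean_squared_error(x, y, m, b - h)
) / (2 * h)

# One descent step: move b against the gradient.
learning_rate = 0.01  # assumed value, chosen only for this sketch
new_b = b - learning_rate * analytic

print(analytic, numeric, new_b)
```

The two gradient values should agree closely. Using diff/N instead would give the wrong sign and half the magnitude, so the step would move b uphill rather than downhill.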