In the context of this exercise, how is the gradient descent equation for the slope different from the one for the intercept?
The main difference between the two is that the gradient for the slope has an additional factor of $x_i$ in each term of the sum, one term per data point.
This difference comes from how the two gradients are derived: both are partial derivatives of the same error function.
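The error function we sought to minimize was the mean squared error over the $N$ data points:

$$E(m, b) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - (m x_i + b)\right)^2$$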
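As a rough illustration, this error function can be computed directly in Python. This is a minimal sketch assuming the points are stored in two equal-length lists `x` and `y`; the function name `mean_squared_error` is a hypothetical choice, not something taken from the exercise.

```python
def mean_squared_error(x, y, m, b):
    """Mean squared error of the line y = m*x + b over the points (x[i], y[i]).

    The name is hypothetical; x and y are assumed to be equal-length lists.
    """
    n = len(x)
    total = 0.0
    for x_i, y_i in zip(x, y):
        # Squared vertical distance between the point and the line
        total += (y_i - (m * x_i + b)) ** 2
    return total / n


# Points that lie exactly on y = 2x + 1 give zero error
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(mean_squared_error(xs, ys, 2, 1))  # 0.0
```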
To obtain the gradient for the intercept, we take the partial derivative of the error function with respect to $b$, which gives:

$$\frac{\partial E}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N}\left(y_i - (m x_i + b)\right)$$
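A matching sketch of the intercept gradient, under the same assumptions (plain Python lists, and a hypothetical function name `get_gradient_at_b`). Each term of the sum is just the residual $y_i - (m x_i + b)$, with no extra factor:

```python
def get_gradient_at_b(x, y, m, b):
    """Partial derivative of the mean squared error with respect to b.

    The name is hypothetical; x and y are assumed to be equal-length lists.
    """
    n = len(x)
    diff = 0.0
    for x_i, y_i in zip(x, y):
        # Residual for this point; note there is no factor of x_i here
        diff += y_i - (m * x_i + b)
    return -2 * diff / n


# With m = 0 and b = 0 the residuals are just 3 and 5
print(get_gradient_at_b([1, 2], [3, 5], 0, 0))  # -8.0
```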
To obtain the gradient for the slope, we instead take the partial derivative of the error function with respect to $m$, which gives:

$$\frac{\partial E}{\partial m} = -\frac{2}{N}\sum_{i=1}^{N} x_i\left(y_i - (m x_i + b)\right)$$
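The slope gradient can be sketched the same way (again with a hypothetical name, `get_gradient_at_m`, and plain Python lists as input). The only change from the intercept gradient is that each residual is multiplied by `x_i`:

```python
def get_gradient_at_m(x, y, m, b):
    """Partial derivative of the mean squared error with respect to m.

    The name is hypothetical; x and y are assumed to be equal-length lists.
    """
    n = len(x)
    diff = 0.0
    for x_i, y_i in zip(x, y):
        # Same residual as for b, but weighted by x_i
        diff += x_i * (y_i - (m * x_i + b))
    return -2 * diff / n


# With m = 0 and b = 0: 1*3 + 2*5 = 13, so the gradient is -2 * 13 / 2
print(get_gradient_at_m([1, 2], [3, 5], 0, 0))  # -13.0
```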
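which has the additional factor of $x_i$ in each term. By the chain rule, the derivative of the inner expression $y_i - (m x_i + b)$ is $-x_i$ with respect to $m$ but only $-1$ with respect to $b$, and that is exactly where the extra $x_i$ comes from.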
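One way to check that the extra $x_i$ belongs in the slope gradient is to compare both formulas against a numerical estimate. The self-contained sketch below (all helper names are hypothetical) uses central finite differences, then runs plain gradient descent on points that lie exactly on $y = 2x + 1$; the learning rate and iteration count are arbitrary choices for this toy data.

```python
def mean_squared_error(x, y, m, b):
    return sum((y_i - (m * x_i + b)) ** 2 for x_i, y_i in zip(x, y)) / len(x)


def grad_b(x, y, m, b):
    return -2 * sum(y_i - (m * x_i + b) for x_i, y_i in zip(x, y)) / len(x)


def grad_m(x, y, m, b):
    # Identical to grad_b except for the x_i factor in each term
    return -2 * sum(x_i * (y_i - (m * x_i + b)) for x_i, y_i in zip(x, y)) / len(x)


def numerical_grad(x, y, m, b, h=1e-6):
    """Central finite-difference estimates of dE/dm and dE/db."""
    dm = (mean_squared_error(x, y, m + h, b) - mean_squared_error(x, y, m - h, b)) / (2 * h)
    db = (mean_squared_error(x, y, m, b + h) - mean_squared_error(x, y, m, b - h)) / (2 * h)
    return dm, db


def step_gradient(x, y, m, b, learning_rate):
    """One gradient descent update of both parameters."""
    new_m = m - learning_rate * grad_m(x, y, m, b)
    new_b = b - learning_rate * grad_b(x, y, m, b)
    return new_m, new_b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = 0.5, 0.0

# Analytic vs. numerical gradients at the starting point
est_m, est_b = numerical_grad(xs, ys, m, b)
print(grad_m(xs, ys, m, b), est_m)
print(grad_b(xs, ys, m, b), est_b)

# Repeated updates should approach m = 2, b = 1 for this data
for _ in range(10000):
    m, b = step_gradient(xs, ys, m, b, learning_rate=0.01)
print(round(m, 3), round(b, 3))
```

Because the error function is quadratic in $m$ and $b$, the central difference agrees with the analytic gradients up to floating-point rounding; dropping the `x_i` factor from `grad_m` would make the first comparison disagree.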