Hi guys,
There is something I haven’t understood about the exercise: in the step_gradient function, the next guess is defined by b = b_current - 0.01 * b_gradient. As far as I know, a gradient is a slope, so how does subtracting a slope from an x-axis value produce another x-axis value? Thanks for your reply!
Generally, the gradient represents the direction in which the function increases fastest. By adding to (b, m) a vector pointing opposite the gradient, we can expect the function (the loss) to decrease. Note that the slope here is not the slope of the line being fitted, but the slope of the loss with respect to b: a plain number telling you how much the loss changes per unit change in b. Subtracting a small multiple of that number from b_current therefore gives another value on the b-axis. (However, if the step is too large, it may overshoot the minimum of the loss, so it is scaled down by the learning rate.)
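To make this concrete, here is a minimal sketch of what such a step_gradient function typically looks like for fitting y = m*x + b with a mean-squared-error loss. The function name matches the exercise, but the body and the sample data are my own illustration, not necessarily the course's exact code:

```python
# Sketch of one gradient-descent step for y = m*x + b with MSE loss.
# The data below is made up for illustration (roughly y = 2x + 1).

def step_gradient(b_current, m_current, points, learning_rate=0.01):
    n = len(points)
    b_gradient = 0.0
    m_gradient = 0.0
    for x, y in points:
        error = (m_current * x + b_current) - y
        # Partial derivatives of the mean squared error:
        b_gradient += (2.0 / n) * error       # d(loss)/db
        m_gradient += (2.0 / n) * error * x   # d(loss)/dm
    # The gradients are plain numbers (rates of change of the loss),
    # so subtracting learning_rate * gradient yields new parameter values.
    b_new = b_current - learning_rate * b_gradient
    m_new = m_current - learning_rate * m_gradient
    return b_new, m_new

points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # lies exactly on y = 2x + 1
b, m = 0.0, 0.0
for _ in range(5000):
    b, m = step_gradient(b, m, points)
print(round(m, 2), round(b, 2))  # converges toward m ≈ 2, b ≈ 1
```

The key point for the original question: b_gradient is just a number, so b_current - 0.01 * b_gradient is also just a number, namely a slightly improved guess for b.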