What are some reasons that we square the differences?



In the context of this exercise, what are some reasons that we square the differences?


There are several reasons for squaring the differences. Some of them rest on more advanced mathematics and won't be covered here, but the following are a few important ones.
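For reference, a loss built from squared differences is commonly written as below. The notation is an assumption, not taken from the exercise: $y_i$ is the $i$-th observed value, $\hat{y}_i$ is the model's prediction for it, and $n$ is the number of data points.

```latex
L = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```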

One reason, as explained in the exercise text, is that we want both positive and negative differences to contribute to the loss in the same way. If we simply added the raw differences, errors in opposite directions would cancel out (-5 + 5 = 0), and the loss would misleadingly suggest a perfect fit. With the following two differences, the squares are both positive, so each counts the same amount toward the total loss.

diff1 = -5
diff2 = 5

diff1**2 + diff2**2 = 25 + 25 = 50
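A minimal Python sketch of the cancellation problem. The function names and the data are hypothetical, chosen only so that one prediction is 5 too high and the other 5 too low, matching the example above.

```python
def sum_of_squared_differences(actual, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def sum_of_raw_differences(actual, predicted):
    """Plain sum of differences, shown only for contrast: signs can cancel."""
    return sum(a - p for a, p in zip(actual, predicted))


# Hypothetical data: differences are 10 - 15 = -5 and 20 - 15 = 5.
actual = [10, 20]
predicted = [15, 15]

print(sum_of_raw_differences(actual, predicted))      # 0  (errors cancel)
print(sum_of_squared_differences(actual, predicted))  # 50 (both errors count)
```

The raw sum reports zero error even though both predictions are off by 5; the squared version reports 50.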

Another reason for squaring the differences is to "penalize" larger errors more heavily: a difference of 2 adds 4 to the loss, while a difference of 10 adds 100. As a consequence, outliers in the dataset, whose differences are much larger than those of the rest of the data, have a strong effect on the total loss.
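A small Python sketch of how much a single outlier dominates a squared loss. The differences are made up for illustration, and the absolute-value version is included only as a point of comparison (it also removes the sign, but does not square).

```python
# Hypothetical differences: three small errors and one outlier.
differences = [1, -1, 2, 10]

squared = [d ** 2 for d in differences]    # [1, 1, 4, 100]
absolute = [abs(d) for d in differences]   # [1, 1, 2, 10]

# Fraction of the total loss contributed by the outlier (last element).
squared_share = squared[-1] / sum(squared)      # 100 / 106
absolute_share = absolute[-1] / sum(absolute)   # 10 / 14

print(f"outlier share with squares:         {squared_share:.0%}")   # 94%
print(f"outlier share with absolute values: {absolute_share:.0%}")  # 71%
```

With squaring, the one outlier accounts for about 94% of the loss, compared with about 71% when absolute values are used.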


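Bear in mind that removing the negative sign is not the only purpose of squaring. Taking the absolute value of each difference would also remove the sign; squaring additionally makes larger errors count disproportionately more, as described above.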