In the context of this exercise, what are some reasons that we square the differences?

Answer

There are several reasons for squaring the differences. Some rest on more advanced mathematics and won't be covered here, but the following are a few important ones.

One reason, as explained in the exercise text, is that we want positive and negative differences to contribute to the loss in the same way. For example, differences of +2 and -2 both square to the same positive value, 4, and so both count the same amount toward the total loss.
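A quick sketch of that point in Python (the residual values +2 and -2 are invented for illustration):

```python
# Two residuals of equal magnitude but opposite sign.
diffs = [2.0, -2.0]

# The raw differences cancel, wrongly suggesting zero total error.
raw_total = sum(diffs)        # 0.0

# The squared differences are both positive and count equally.
squared = [d ** 2 for d in diffs]
squared_total = sum(squared)  # 8.0

print(raw_total, squared, squared_total)
```

Without the squaring step, a model could look "perfect" simply because its overshoots and undershoots happen to balance out.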

Another reason for squaring the differences is to "penalize" the larger errors more. As a consequence, outliers in the dataset have a strong effect on the total loss, because their differences are larger than those of the rest of the data.
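To see how much more a single outlier weighs under squaring, here is a small sketch with made-up residuals (four small errors and one outlier ten times larger):

```python
# Hypothetical residuals: four small errors and one outlier.
residuals = [1.0, -1.0, 1.0, -1.0, 10.0]

# Under absolute error, the outlier contributes 10 out of 14 total.
abs_loss = sum(abs(r) for r in residuals)   # 14.0

# Under squared error, it contributes 100 out of 104 total,
# roughly 96% of the loss, despite being one point out of five.
sq_loss = sum(r ** 2 for r in residuals)    # 104.0

print(abs_loss, sq_loss)
```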

If removing the negative sign is the ONLY reason why we square it, then why not use the absolute value function instead?

In other words, if we square it instead of taking its absolute value, then points that are farther away from the line will be penalized more strongly. Is this intended?

The absolute value of negative one squared is 1, and so is the absolute value of negative one cubed, and so on. Always unity. But we would never just fudge away a negative sign. Not ever.

But doesn't @yeezy_boii 's point still stand? If the only purpose were to remove the negative, what's wrong with absolute value? Or, if you prefer, take the square root of the square…

Yes, it's explicitly stated both in the exercise and in this post that removing the sign of the loss is the desired outcome.

@yeezy_boii 's question is: if the only reason is to remove the negative sign, then why square rather than take the absolute value? The answer is that removing the negative sign is not the only reason; the original post also mentions penalizing outliers more by squaring the loss.
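To make the contrast between the two choices concrete, here is a hedged side-by-side sketch of mean squared error and mean absolute error on the same hypothetical data (the point values, including the outlier, are invented for illustration):

```python
def mse(y_true, y_pred):
    """Mean squared error: large residuals are penalized quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0, 100.0]   # the last point is an outlier
y_pred = [1.5, 2.5, 2.5, 50.0]

print(mae(y_true, y_pred))  # 12.875
print(mse(y_true, y_pred))  # 625.1875
```

Both losses remove the sign problem, but the outlier dominates the MSE far more than the MAE, which is exactly the "penalize larger errors more" behavior discussed above. Whether that is desirable depends on whether outliers in your data are meaningful signal or noise.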