7/7 Code works fine but don't understand why **2


#1

Hi, my code is not the issue; my question is about the instructions:

number 4: for each score in scores: Compute its squared difference: (average - score) ** 2 and add that to variance.

Why squared? This is probably a basic procedure in maths, but I would be glad if someone could point out the reason, or perhaps it is covered in the next lesson. Thanks!


#2

Hi @scriptmaster83175 ,

That's a good question to ask. When doing math, one should go beyond merely following the rules or the formula in order to consider why a particular technique is used.

The instruction to which you refer states:

04. for each score in scores: Compute its squared difference: (average - score) ** 2 and add that to variance.

Variance is a measure of the degree to which a set of numbers is spread out. Note that there are numerous other ways of measuring the variation among data values. When we use variance, we square the difference between each value and the mean in order to place special emphasis on the values that differ the most from the mean. It is a way of ensuring that if a few values differ markedly from the mean, that fact is represented well in the result.
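As a rough illustration of the quoted instruction, here is a minimal sketch. The function name `grades_variance` and the sample scores are assumptions made for this example, and the final division by the number of scores (population variance) is also an assumption, since the quoted step only covers the summing.

```python
def grades_variance(scores):
    """Average of the squared differences from the mean (population variance)."""
    average = sum(scores) / len(scores)
    variance = 0.0
    for score in scores:
        # Squared difference between the mean and this score
        variance += (average - score) ** 2
    # Assumed final step: divide the running total by the number of scores
    return variance / len(scores)


sample_scores = [70, 80, 90]  # hypothetical data
print(grades_variance(sample_scores))
```

With these sample scores the mean is 80, the squared differences are 100, 0 and 100, and the result is their average.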

As an alternative method for representing the degree to which data is spread out, we could compute a sum of absolute differences, instead of summing the squares of the differences. This would place less emphasis on the values that differ markedly from the mean. If, instead, we wish to place a greater emphasis on the outlying values, we could use the sum of the absolute values of the cubes of the differences. But variance is used more often than these other techniques.

Note that it is important to use absolute values of differences, rather than merely adding up the differences, because we don't want the positive and negative differences to cancel each other out. Since squares of real numbers are never negative, we don't have to worry about this when we use variance.
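To make the cancellation concrete, here is a small sketch comparing the approaches mentioned above on the same data. The scores and variable names are made up for illustration.

```python
scores = [60, 80, 100]  # hypothetical data
average = sum(scores) / len(scores)  # 80.0

differences = [average - score for score in scores]  # [20.0, 0.0, -20.0]

# Plain sum: the positive and negative differences cancel out
plain_sum = sum(differences)

# Absolute differences: no cancellation, every unit of distance counts equally
absolute_sum = sum(abs(d) for d in differences)

# Squared differences: no cancellation, larger differences count for more
squared_sum = sum(d ** 2 for d in differences)

# Absolute cubes: even more weight on the larger differences
cubed_sum = sum(abs(d) ** 3 for d in differences)

print(plain_sum, absolute_sum, squared_sum, cubed_sum)
```

The plain sum comes out as zero even though the scores are clearly spread out, which is why it is not useful as a measure of spread.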


#3

Hi, thanks for the explanation! I also went and looked online a bit, and the squaring, in basic terms, is just so that negative numbers won't affect the outcome (correct me if I am wrong, though). The square root you then apply to bring the result back to the scale of the original values. I may be simplifying it too far, but that is how I understand it.

These links are also an excellent way to explain exactly why:
http://www.mathsisfun.com/data/standard-deviation.html

Then we can apply this to points on a 2D grid or 3D grid which is what I am after so excellent insights :smile:
http://www.mathsisfun.com/algebra/distance-2-points.html
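For the grid case, the same square-then-square-root idea gives the straight-line distance between two points. This is only a sketch; the helper name `distance` and the sample points are assumptions for illustration.

```python
import math


def distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    # Square each coordinate difference so that signs do not matter,
    # add the squares, then take the square root to return to the original units.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(distance((0, 0), (3, 4)))        # 2D points
print(distance((0, 0, 0), (1, 2, 2)))  # 3D points
```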


#4

Hi @scriptmaster83175 ,

As you have noted, an important reason for squaring the differences is so that negative values do not cancel out positive values when the overall deviation of the data values from the mean is computed. Adding the squares of the differences from the mean does address that concern. Of course, this could also be addressed by using the absolute values of the differences, without squaring them. Using the squares of the differences goes beyond this concern in that it places additional weight on the larger deviations from the mean. For example, a difference of 1.0 or -1.0 units from the mean would add 1.0 to the running total, when squared. But a difference of 2.0 units would add 4.0 to the running total after squaring, as would a difference of -2.0. A difference of 10.0 or -10.0 from the mean would add 100.0 units to that total. Accordingly, as the degree to which data values differ from the mean increases, using the squares of these differences enables these outlying values to have a larger influence over the variance or standard deviation than the smaller deviations have.
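A quick way to see this weighting is to list what each difference would add to the running total. The list of differences below is chosen only for illustration.

```python
differences = [1.0, -1.0, 2.0, -2.0, 10.0, -10.0]  # hypothetical differences from the mean

# What each difference adds to the running total once it is squared
contributions = {d: d ** 2 for d in differences}

for d, squared in contributions.items():
    print(f"difference {d:6.1f} adds {squared:6.1f}")
```

The sign of a difference makes no change to what it adds, while a difference ten times as large adds one hundred times as much.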