FAQ: Variance - Square The Differences

This community-built FAQ covers the “Square The Differences” exercise from the lesson “Variance”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Learn Statistics With Python

FAQs on the exercise Square The Differences

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.



Hello guys,
If we simply want to get rid of the negatives, why not take the absolute value instead of the squared value?

Intuitively that seems more accurate.


I had the same doubt! The page “Standard Deviation and Variance” (mathsisfun.com) has a “Why Square?” section that explains it. In short, averaging the absolute values (which gives the “Mean Deviation” of the dataset) can't tell apart datasets such as these two, which are spread out differently but have the same mean deviation:

import numpy as np

set1 = [4, 4, -4, -4]  # Less spread out
np.mean(set1)  # Equals 0
absolute_values_1 = [abs(x) for x in set1]  # [4, 4, 4, 4] (the mean is 0)
mean_deviation_1 = np.mean(absolute_values_1)  # Equals 4

set2 = [7, 1, -6, -2]  # More spread out
np.mean(set2)  # Equals 0
absolute_values_2 = [abs(x) for x in set2]  # [7, 1, 6, 2] (the mean is 0)
mean_deviation_2 = np.mean(absolute_values_2)  # Equals 4
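
To see what squaring adds, here is a rough sketch that computes both measures for the two datasets above. The helper names `mean_absolute_deviation` and `variance` are made up for illustration (they are not from the lesson), and the variance here is the population variance, which is what `np.var` computes by default.

```python
import numpy as np

set1 = np.array([4, 4, -4, -4])
set2 = np.array([7, 1, -6, -2])

def mean_absolute_deviation(data):
    # Average of the absolute differences from the mean
    return np.mean(np.abs(data - np.mean(data)))

def variance(data):
    # Average of the squared differences from the mean (population variance)
    return np.mean((data - np.mean(data)) ** 2)

mad1 = mean_absolute_deviation(set1)  # 4.0
mad2 = mean_absolute_deviation(set2)  # 4.0
var1 = variance(set1)  # 16.0
var2 = variance(set2)  # 22.5

print(mad1, mad2)  # the two mean deviations are identical
print(var1, var2)  # the variances differ
```

The mean deviation is 4 for both sets, while the variance is larger for the second set because squaring gives extra weight to the points farthest from the mean (7 and -6).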

I think it should be pointed out that the sum of the differences from the mean will always equal zero. The lesson and exercise make it seem like it’s almost by luck that we have negative differences we need to get rid of, when in reality you will always end up with negative and positive differences that sum to zero, due to the nature of the arithmetic mean.
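
A quick numerical check of this, as a minimal sketch: the dataset below is made up for illustration, and `np.isclose` is used because floating-point rounding can leave a tiny nonzero remainder for other inputs.

```python
import numpy as np

# Hypothetical dataset, chosen only for illustration
data = np.array([7, 1, -6, -2, 13, 5])

mean = np.mean(data)
differences = data - mean  # some positive, some negative

total = np.sum(differences)  # positives and negatives cancel out
squared_total = np.sum(differences ** 2)  # squaring prevents the cancellation

print(differences)
print(np.isclose(total, 0.0))
print(squared_total)
```

The raw differences cancel, so their sum says nothing about spread, whereas the sum of squared differences stays positive whenever the data points are not all equal.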

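Mathematically, it can be proven as follows.

The sum of the differences from the mean $\bar{X}$ over the n data points $X_i$ is

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X}$$

Since the mean is calculated as the sum of the data points divided by the number of data points n, that is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, this can be simplified as follows:

$$\sum_{i=1}^{n} X_i - n \cdot \frac{1}{n}\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} X_i = 0$$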