How does taking a larger number of samples solve the issue of a skewed sample mean? How does the Central Limit Theorem help here?
The Central Limit Theorem (CLT) is, roughly, the following statement:
Regardless of the distribution of our data, if we take a large number of samples of a fixed size and plot a sample statistic such as the mean, the distribution of those statistics will be approximately normal, i.e. a bell curve, and the approximation improves as the sample size grows.
Note: The distribution that we get from plotting our sample statistics is called a sampling distribution.
We can verify this claim experimentally by playing with the applet in this lesson.
Okay. So how does this help us solve the issue of a skewed sample? The CLT helps because the mean of the sampling distribution gets arbitrarily close to the mean of the original (population) distribution as we take more and more samples. This is useful because in practice we almost never know the population mean directly. The CLT gives us a mathematical assurance that we can estimate it from samples, whose means we can compute directly.
In conclusion, if we have a skewed sample mean, then by
- taking a larger number of samples,
- plotting the mean of each sample, and
- taking the mean, call it M, of the resulting distribution,

M is likely to be close to the population mean by the Central Limit Theorem.
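The steps above can be sketched as a quick simulation. This is a minimal illustration, not the lesson's applet: the skewed exponential population, the sample size of 50, and the count of 10,000 samples are all arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A strongly right-skewed population: exponential with mean 5.
# (Hypothetical example values, chosen only for illustration.)
population_mean = 5.0
num_samples = 10_000  # how many samples we draw
sample_size = 50      # observations in each sample

# Step 1 & 2: take many samples and record the mean of each one.
samples = rng.exponential(scale=population_mean,
                          size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)

# Step 3: M is the mean of the resulting sampling distribution.
M = sample_means.mean()

print(f"population mean = {population_mean}")
print(f"M (mean of sample means) = {M:.3f}")
```

Even though each individual sample comes from a skewed distribution, a histogram of `sample_means` looks roughly bell-shaped, and M lands very close to the population mean, just as the CLT predicts.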