FAQ: Statistical Concepts - Central Limit Theorem

This community-built FAQ covers the “Central Limit Theorem” exercise from the lesson “Statistical Concepts”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

FAQs on the exercise Central Limit Theorem


I'm puzzled by this result. I want to test the significance of the difference between the sample means and the true population mean.

My null hypothesis is that the samples are NOT representative if there is a probability below 5% that the sample mean != the true mean of the population.

I get strange results: the p-values are all above 0.05, even for the smallest sample.

Can you help me understand why, or how to interpret the qualitative meaning of the “extra small sample” when a t-test yields such a result?

import numpy as np
from scipy import stats

# Create population and find population mean
population = np.random.normal(loc=65, scale=100, size=3000)
population_mean = np.mean(population)

# Select increasingly larger samples
extra_small_sample = population[:10]
small_sample = population[:50]
medium_sample = population[:100]
large_sample = population[:500]
extra_large_sample = population[:1000]

# Calculate the mean of those samples
extra_small_sample_mean = np.mean(extra_small_sample)
small_sample_mean = np.mean(small_sample)
medium_sample_mean = np.mean(medium_sample)
large_sample_mean = np.mean(large_sample)
extra_large_sample_mean = np.mean(extra_large_sample)

# Print them all out! (Python 3 print function)
print("Extra Small Sample Mean: {}".format(extra_small_sample_mean))
print("Small Sample Mean: {}".format(small_sample_mean))
print("Medium Sample Mean: {}".format(medium_sample_mean))
print("Large Sample Mean: {}".format(large_sample_mean))
print("Extra Large Sample Mean: {}".format(extra_large_sample_mean))

print("\nPopulation Mean: {}".format(population_mean))

# One-sample t-tests: does each sample's mean differ from population_mean?
print(stats.ttest_1samp(extra_small_sample, population_mean))

print(stats.ttest_1samp(extra_large_sample, population_mean))
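To see why a 10-element sample rarely gives a small p-value, it can help to reproduce what stats.ttest_1samp computes. This is a minimal sketch, assuming a seeded generator (np.random.default_rng) in place of the unseeded np.random.normal call so the numbers are repeatable; the names standard_error, t_stat and needed_gap are illustrative, not part of the exercise.

```python
import numpy as np
from scipy import stats

# Assumption: seeded generator so the run is reproducible
rng = np.random.default_rng(seed=42)
population = rng.normal(loc=65, scale=100, size=3000)
population_mean = population.mean()
sample = population[:10]

n = len(sample)
# Estimated standard error of the mean (sample SD uses n - 1 in the denominator)
standard_error = sample.std(ddof=1) / np.sqrt(n)

# t-statistic: distance of the sample mean from population_mean, in standard errors
t_stat = (sample.mean() - population_mean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# How far the sample mean would have to be from population_mean to get p < 0.05
needed_gap = stats.t.ppf(0.975, df=n - 1) * standard_error

result = stats.ttest_1samp(sample, population_mean)
print(t_stat, result.statistic)
print(p_value, result.pvalue)
print(needed_gap)
```

With a scale of 100 and only 10 values, the standard error is roughly 30, so the sample mean has to land tens of units away from population_mean before the test reports p < 0.05.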

Isn’t the null hypothesis confused with the alternative hypothesis?


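The null hypothesis examined by the above code is:

Each sample (extra_small_sample or extra_large_sample) comes from a population whose mean is population_mean.

If the p-value is above the significance level (0.05), this null hypothesis cannot be rejected. Because every sample here is taken from population itself, the null hypothesis is (approximately) true, so p-values above 0.05 are the expected outcome: at this significance level a true null hypothesis is rejected only about 5% of the time.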
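A quick simulation makes this concrete. The sketch below, assuming normal data with the exercise's loc=65 and scale=100 and a seeded generator, repeats the one-sample t-test many times and records how often p < 0.05. When the tested mean equals the true mean, the rejection rate should sit near 5% for any sample size; when the tested mean is off by a hypothetical 20 units, only the larger sample reliably detects it. The helper rejection_rate is hypothetical, written for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # seeded for reproducibility

TRUE_MEAN = 65   # loc used in the exercise
SCALE = 100      # scale used in the exercise


def rejection_rate(sample_size, tested_mean, alpha=0.05, n_trials=4000):
    """Hypothetical helper: fraction of simulated samples with p-value < alpha."""
    # Each row is one independent sample drawn from N(TRUE_MEAN, SCALE)
    samples = rng.normal(loc=TRUE_MEAN, scale=SCALE, size=(n_trials, sample_size))
    # Run one t-test per row against tested_mean
    p_values = stats.ttest_1samp(samples, tested_mean, axis=1).pvalue
    return np.mean(p_values < alpha)


results = {}
for n in (10, 500):
    null_rate = rejection_rate(n, TRUE_MEAN)         # null hypothesis is true
    shifted_rate = rejection_rate(n, TRUE_MEAN + 20)  # null hypothesis is false
    results[n] = (null_rate, shifted_rate)
    print(f"n={n}: true null rejected {null_rate:.3f}, shifted null rejected {shifted_rate:.3f}")
```

A large p-value for the extra small sample therefore does not show that the sample is "representative"; it only means the test has too little data to detect a difference, if one exists.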

Hey, I’ve been going through the Data Scientist path and am currently working on the Central Limit Theorem.

There is a line under the heading ‘How does this help the data scientist?’ that isn’t making sense to me.

Once we calculate the standard error as part of the CLT, why do we have to multiply it by 1.275? Then the next line multiplies 1.275 by 1.96. Note that 1.275 is the standard error we were given.

Can someone please explain to me what’s going on here?


First, the data scientist needs to multiply 1.275 by the estimated standard error:

I think 1.275 here is a typo for 1.96.

1.96 (see this Wikipedia page) is the 97.5th percentile point of the standard normal distribution (that is, the normal distribution with mean 0 and standard deviation 1). If a random variable follows the standard normal distribution, there is about a 95% probability that it takes a value between -1.96 and 1.96.

Our data scientist has calculated a standard error of 1.275. This means the estimated sampling distribution is a normal distribution with a standard deviation of 1.275: its shape is the standard normal distribution stretched horizontally by a factor of 1.275. So there is about a 95% probability that a sample mean falls within plus or minus 1.275 * 1.96 (about 2.50) of the population mean.
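Both facts about 1.96 can be checked directly with SciPy's standard normal distribution (scipy.stats.norm, which defaults to mean 0 and standard deviation 1). A minimal sketch:

```python
from scipy import stats

# 97.5th percentile of the standard normal distribution N(0, 1)
z = stats.norm.ppf(0.975)

# Probability that a standard normal variable lands between -z and z
coverage = stats.norm.cdf(z) - stats.norm.cdf(-z)

print(round(z, 2), round(coverage, 3))  # 1.96 0.95
```

Using the 97.5th rather than the 95th percentile is what makes the interval two-sided: 2.5% of the probability is left in each tail.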
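The scaling argument can be sketched numerically. The standard error of 1.275 comes from the lesson; the population mean of 50, population standard deviation of 12.75 and sample size of 100 below are hypothetical values chosen only so that 12.75 / sqrt(100) = 1.275. The simulation draws many samples and checks what fraction of their means land within the margin of the population mean.

```python
import numpy as np
from scipy import stats

standard_error = 1.275            # value given in the lesson
z = stats.norm.ppf(0.975)         # about 1.96
margin = z * standard_error       # half-width of the 95% range

print(round(margin, 3))  # 2.499

# Hypothetical population chosen so that POP_SD / sqrt(N) equals 1.275
rng = np.random.default_rng(seed=1)
POP_MEAN, POP_SD, N = 50.0, 12.75, 100

# 10,000 samples of size N; take the mean of each one
sample_means = rng.normal(POP_MEAN, POP_SD, size=(10_000, N)).mean(axis=1)

# Fraction of sample means within plus or minus margin of the population mean
within = np.mean(np.abs(sample_means - POP_MEAN) <= margin)
print(within)  # close to 0.95
```

In practice the same margin is applied the other way around: a single observed sample mean plus or minus margin gives an approximate 95% confidence interval for the unknown population mean.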


I thought this was a typo, but I wanted to run it by someone first to be sure.
Awesome! I get it now. The Wikipedia page clears up a lot of confusion. Thanks for helping out, @object2161442840!