 Question

In this exercise, we expect the population mean to be 30 but the mean of our sample is 31. So wouldn’t our null hypothesis be: the sample represents a population of mean 31?

We then go on and use ttest_1samp(ages, 30) to find the p-value. I am unclear about these steps.

What are we checking for in this hypothesis test? Are we checking whether the sample represents a population with mean 30? If so, does a p-value less than 0.05 mean that it does represent a population with mean 30?

First, let’s note that the null hypothesis is usually the status quo. If we expect the population mean to be 30, that is the status quo, which is why our null hypothesis is:

The set of samples belongs to a population with the target mean of 30

By running ttest_1samp(ages, 30), we are testing how plausible it is that the samples in ages were drawn from a distribution with mean 30. The p-value is the probability of seeing a sample mean at least this far from 30 if the true mean really were 30. We could, of course, just have been somewhat unlucky with our sampling, especially since ages contains only a few samples. If the resulting p-value is less than 0.05, we reject the null hypothesis: we are saying it is unlikely that the sample was drawn from a distribution with mean 30. A p-value greater than or equal to 0.05 means we fail to reject the null hypothesis: we cannot be confident that the samples were not drawn from a distribution with mean 30.
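To make the decision rule above concrete, here is a minimal sketch. The values in ages are made up for illustration (chosen so that the sample mean is 31) and are an assumption, not data from the exercise; the 0.05 threshold is the one discussed above.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical sample, chosen so that its mean is 31
ages = np.array([32, 34, 29, 29, 22, 39, 38, 37, 38, 36, 30, 26, 22, 22])

null_mean = 30   # the status quo we are testing against
alpha = 0.05     # significance level

# Two-sided one-sample t-test of "the population mean is 30"
tstat, pval = ttest_1samp(ages, null_mean)

print("sample mean:", ages.mean())
print("p-value:", pval)

if pval < alpha:
    print("Reject the null: a mean of 30 looks unlikely for this sample.")
else:
    print("Fail to reject the null: the sample is compatible with a mean of 30.")
```

Note that failing to reject does not prove the mean is 30; it only says this sample gives no strong evidence against it.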

I don’t find this example helpful. I understand what a p-value and the null hypothesis are, but I do not find the example or the explanation helpful. :grimacing:

from scipy.stats import ttest_1samp
import numpy as np

correct_results = 0  # Start the counter at 0

# Assumed layout: one experiment per row of the file
daily_visitors = np.genfromtxt("daily_visitors.csv", delimiter=",")

for i in range(1000):  # 1000 experiments
    tstatistic, pval = ttest_1samp(daily_visitors[i], 30)
    # Print the p-value here:
    print(pval)
    # Count the experiments in which the null hypothesis (mean 30) is rejected
    if pval < 0.05:
        correct_results += 1

print("We correctly recognized that the distribution was different in " + str(correct_results) + " out of 1000 experiments.")
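For anyone who wants to run the experiment without daily_visitors.csv, here is a rough sketch that simulates the data instead. The true mean of 31, the spread of 5, and the 100 observations per experiment are all assumptions picked for illustration; the real file may look quite different.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Assumption: 1000 experiments, 100 daily counts each, true mean 31
simulated_visitors = rng.normal(loc=31, scale=5, size=(1000, 100))

rejections = 0
for sample in simulated_visitors:
    # Test each experiment against the null hypothesis "mean is 30"
    _, pval = ttest_1samp(sample, 30)
    if pval < 0.05:
        rejections += 1

print(f"Rejected the null (mean 30) in {rejections} of 1000 experiments.")
```

Because the simulated mean really is different from 30, every rejection is a correct one, and every non-rejection is a case where the sample happened to look too much like a mean of 30.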

Do you have a concrete question?

So basically rejecting the null hypothesis does not assert the fact that the sample was indeed taken from a distribution having mean 30?

So a p-value greater than 0.05 means that the sample is close enough to the proposed mean (of 30) that we cannot reject the hypothesis that it was taken from a distribution with that mean (of 30).

A p-value less than 0.05 means that the sample is not similar enough to a mean of 30, and that it may very well have been taken from a distribution with a mean of 31.

Correct me if that’s off, please.

Yeah, that’s the idea.
I think the best way to see the purpose of the p-value is to visualize it on a normal-distribution bell curve. Think of the null hypothesis as the reference, or base, distribution curve. We set the significance level at 0.05 and picture it on that null curve as 0.025, or 2.5%, in each tail. If your sample’s p-value is less than 0.05, let’s say 0.03, and you mark it on the null curve, you get 0.015, or 1.5%, in each tail. Relative to the null mean, that is a very low-probability area. It means your sample mean is far enough from the null mean that the sample differs significantly from it; thus, we reject the null hypothesis that there is no significant difference. If the p-value is more than 0.05 and you mark that point on the null curve, it falls in a high-probability area (maybe within 1 or 2 standard deviations of the null mean, for example): the sample mean is close enough to the null mean, so we shouldn’t reject the null, and we treat the sample mean as not significantly different from the null mean.
I hope this helps!
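The tail picture above can be sketched numerically. This is only an illustration: the ages values are hypothetical, and because ttest_1samp is a t-test, the sketch uses a t-distribution with n - 1 degrees of freedom for the null curve rather than a normal one.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (mean 31), not taken from the exercise
ages = np.array([32, 34, 29, 29, 22, 39, 38, 37, 38, 36, 30, 26, 22, 22])
null_mean = 30
alpha = 0.05

n = len(ages)
df = n - 1
standard_error = ages.std(ddof=1) / np.sqrt(n)

# How many standard errors the sample mean sits from the null mean
t_stat = (ages.mean() - null_mean) / standard_error

# Two-tailed p-value: the area in both tails beyond |t_stat|
p_value = 2 * stats.t.sf(abs(t_stat), df)

# Critical value that leaves alpha / 2 (2.5%) in each tail
t_critical = stats.t.ppf(1 - alpha / 2, df)

reject_null = abs(t_stat) > t_critical
print("t statistic:", t_stat)
print("critical value:", t_critical)
print("p-value:", p_value)
print("reject null:", reject_null)
```

Comparing |t_stat| with t_critical and comparing p_value with alpha are two views of the same decision: the statistic lands in the 2.5% tails exactly when the p-value drops below 0.05.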

@legendr Does this mean we are checking to see if our sample was too narrow? Meaning if our sample only included people who are 30, and not a true representation of the population?