This community-built FAQ covers the “One Sample T-Test II” exercise from the lesson “Hypothesis Testing”.

**Paths and Courses**

This exercise can be found in the following Codecademy content:

Data Science

## FAQs on the exercise *One Sample T-Test II*

## Join the Discussion. Help a fellow learner on their journey.

**Ask or answer a question** about this exercise by clicking **reply** below!

Agree with a comment or answer? **Like** to up-vote the contribution!

Need **broader help or resources**? Head here.

Looking for motivation to keep learning? Join our **wider discussions**.

**Learn more** about how to use this guide.

Found a **bug**? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

In this exercise, we expect the population mean to be 30, but the mean of our sample is 31. So wouldn’t our null hypothesis be “the sample represents a population with mean 31”? Am I right? Then we go on and use ttest_1samp(distribution, 30) to find the p-value. I am confused. What are we checking for in this hypothesis test? Are we checking if the sample represents a population with mean 30? And if the p-value is less than 0.05, does that mean it represents a population with mean 30?
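To make the setup concrete, here is a minimal sketch of what the exercise's call is testing. The sample data below are simulated (the mean, spread, and size are assumptions for illustration, not the exercise's actual dataset); the null hypothesis passed to `ttest_1samp` is that the population mean is 30, and a small p-value is evidence *against* that null, not for it.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Simulated stand-in for the exercise's data: a sample whose
# observed mean comes out near 31 (parameters are assumptions).
np.random.seed(0)
sample = np.random.normal(loc=31, scale=5, size=50)

# Null hypothesis: this sample was drawn from a population with mean 30.
tstat, pval = ttest_1samp(sample, 30)

if pval < 0.05:
    # Small p-value: reject the null, i.e. the population mean
    # is probably NOT 30.
    print("Reject the null hypothesis (mean likely differs from 30).")
else:
    # Large p-value: the data are consistent with a population mean of 30.
    print("Fail to reject the null hypothesis.")
```

So the test is never asked "is the mean 31?"; 31 is just the sample statistic. A p-value below 0.05 means we reject "the population mean is 30", which is the opposite of the reading in the last question above.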

Although delayed, I have posted a response to your question. Please check here.

I am a little confused on what this exercise is trying to show us. I understand the meaning of p-value and null hypothesis. The null hypothesis is that the 31 mean is derived from the larger subset with mean of 30 and the difference observed is due to random chance. A p-value of less that 0.05 would allow us to reject this null hypothesis and then you could say, 'I am 95% sure that this sample is derived from an entirely different population than the one that had a mean of 30.

Where I get a little lost is that in the exercise we divide the sample up into 1000 smaller samples and take the p-value of each. This results in a p-value < 0.05 only about 50% of the time, hence the printed statement, “We correctly recognized that the distribution was different in " + str(correct_results) + " out of 1000 experiments.” How is it that only half show a significant p-value? Aren’t they all from the same subset, just broken up into 1000 smaller pieces, so the p-values shouldn’t vary that much? Is this because each of the 1000 subsets is too small a sample to give a reliable p-value? Is this a lesson to not always trust p-values? Because how I read the quotation above is “50% of the time you can be 95% confident.” My brain starts to implode at that point as I wonder if p-values can be trusted…
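What this question is describing is statistical *power*: with a small sample, a real difference between 30 and 31 often fails to reach significance, so only a fraction of the 1000 experiments "correctly recognize" it. Here is a hedged simulation of that idea (the sample size, standard deviation, and seed are assumptions for illustration, not the exercise's actual values); each experiment draws a fresh small sample from a population whose true mean really is 31 and tests it against 30.

```python
import numpy as np
from scipy.stats import ttest_1samp

np.random.seed(42)

correct_results = 0
for _ in range(1000):
    # True population mean is 31, but each sample is tiny
    # (size 10 and scale 2 are illustrative assumptions).
    sample = np.random.normal(loc=31, scale=2, size=10)
    _, pval = ttest_1samp(sample, 30)
    if pval < 0.05:
        # The test correctly rejected "mean == 30" this time.
        correct_results += 1

print("We correctly recognized that the distribution was different in "
      + str(correct_results) + " out of 1000 experiments.")
```

The fraction of rejections is the test's power at this sample size and effect size. The p-values are not "untrustworthy"; each individual test still controls the false-positive rate. The lesson is that a small sample frequently lacks the power to detect a real but modest difference, so a non-significant result is not proof the null is true.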


The R equivalent for this lesson was so much easier to understand. This lesson is so confusing I had to go back and make sure I understood everything.