FAQ: Hypothesis Testing - 1 Sample T-Testing


This community-built FAQ covers the “1 Sample T-Testing” exercise from the lesson “Hypothesis Testing”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

FAQs on the exercise 1 Sample T-Testing


I’m getting confused by this paragraph:
"When we conduct a hypothesis test, we want to first create a null hypothesis, which is a prediction that there is no significant difference. The null hypothesis that this test examines can be phrased as such: “The set of samples belongs to a population with the target mean”.

The result of the 1 Sample T Test is a p-value, which will tell us whether or not we can reject this null hypothesis. Generally, if we receive a p-value of less than 0.05, we can reject the null hypothesis and state that there is a significant difference."
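
A minimal sketch of the test the quoted passage describes, written with only the standard library so each step is visible. The ages and the target mean of 30 are hypothetical, and the helper names (`t_pdf`, `two_sided_p`, `one_sample_t_test`) are made up for this illustration. In practice this is usually a single call to SciPy’s `scipy.stats.ttest_1samp(sample, popmean)`.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and |t|
    return max(0.0, 1.0 - central_area)


def one_sample_t_test(sample, target_mean):
    """Return (t statistic, two-sided p-value) for H0: population mean == target_mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - target_mean) / standard_error
    return t, two_sided_p(t, n - 1)


# Hypothetical ages; 30 is the target mean stated by the null hypothesis.
ages_near_30 = [28, 31, 33, 29, 30, 32, 27, 30]
ages_near_41 = [40, 42, 41, 39, 43, 40, 41, 42]

for ages in (ages_near_30, ages_near_41):
    t, p = one_sample_t_test(ages, 30)
    verdict = "reject" if p < 0.05 else "fail to reject"
    print(f"t = {t:.2f}, p = {p:.4f} -> {verdict} the null hypothesis")
```

The first sample has a mean of exactly 30, so the test finds no evidence against the null hypothesis. The second sample’s mean is far from 30 relative to its spread, so the p-value falls below 0.05 and the null hypothesis is rejected.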

I thought a p-value less than 0.05 meant the data was very good and the sample should be used. Or is the lower p-value suggesting it is more likely that the result came about by chance? I think I’m getting tripped up by the term null hypothesis: a low p-value would reject the null hypothesis, which is a good thing if we want to prove the sample results weren’t based on chance.

Does this first paragraph mean a sample that belongs to a population with the target mean is only by chance? "When we conduct a hypothesis test, we want to first create a null hypothesis, which is a prediction that there is no significant difference. The null hypothesis that this test examines can be phrased as such: “The set of samples belongs to a population with the target mean”."

9-23-19 Update: I’ve been doing some more research and I think I may have figured this out, but would like someone to confirm whether I am correct. The p-value is telling me if the sample is a good representation of the population. For example, say I was trying to get the mean age of my city, and my sample showed a mean of 30. The p-value would let me know if I took samples that do not represent the population of the city, meaning my sample may have been way too narrow and I only sampled people who are 30, instead of getting a true representation of all the ages within the population of the city. If the p-value is greater than 0.05, it’s telling me my sample is most likely not representing the population as a whole. If the p-value is smaller than 0.05, the sample was diverse enough to represent the population as a whole.


The first part is correct. The second is not: the p-value is the probability of getting a sample mean at least as far from the population mean as yours if the null hypothesis is true, assuming that the larger population is normally distributed. The null hypothesis is that your selected sample differs from a sample selected randomly from the population only by chance. A p-value near 1, then, would indicate that your sample mean sits almost exactly on the population mean.

But, you were interested in height. Did you choose a truly random sample? Maybe you chose from a certain zip code, or only people who had played basketball in their youth, or residents of nursing homes.

A low p-value, then, suggests that there is some way in which your sample differs from the population, more specifically, that the mean height of your sample differs “significantly” from the mean height of the population, assuming of course, that we know the mean height of the population, that its heights are normally distributed, and that your sample is “more or less” normally distributed.
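
One way to see what a low p-value means is to estimate it by simulation: draw many truly random samples from the population and count how often their mean lands at least as far from the population mean as yours did. The sketch below does that under loud assumptions: the population mean (170 cm) and standard deviation (7 cm) are invented and treated as known, which makes this closer to a z-test than a t-test, and `simulated_p_value` is a hypothetical helper, not a library function.

```python
import random
import statistics

# Hypothetical population parameters (cm), assumed known for illustration.
POP_MEAN = 170.0
POP_SD = 7.0


def simulated_p_value(sample, trials=10_000, seed=0):
    """Share of random samples whose mean is at least as far from POP_MEAN as ours."""
    rng = random.Random(seed)
    n = len(sample)
    observed_gap = abs(statistics.mean(sample) - POP_MEAN)
    at_least_as_far = 0
    for _ in range(trials):
        simulated = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        if abs(sum(simulated) / n - POP_MEAN) >= observed_gap:
            at_least_as_far += 1
    return at_least_as_far / trials


# Made-up heights: one sample centered on the population mean, one far above it.
typical_class = [163, 177, 170, 158, 182, 169, 171, 166, 175, 169]
basketball_team = [188, 193, 185, 199, 190, 187, 195, 191, 186, 196]

print("typical class   p ~", simulated_p_value(typical_class))
print("basketball team p ~", simulated_p_value(basketball_team))
```

Random samples from the population almost never stray as far from 170 cm as the basketball sample does, so its estimated p-value is very small. The class sample sits right on the population mean, so nearly every random sample is at least as far away and its p-value is high.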

@patrickd314 Using your height example, I’m thinking I would want samples with a high p-value? That would make sure the sample is more closely matched with the population I’m analyzing?

Yes, if for some reason you needed a sample closely resembling the general population, then you would hope that the p-value of your one-sample t-test would be high.

(This raises an interesting question: whether it would be acceptable to the experimental design people to “cherry pick” your sample to get a high p-value, or whether you would be required simply to specify a - hopefully random - selection technique and stick with it. Since post hoc data selection is frowned upon, I don’t know the answer. )

@patrickd314 I like your question. Does that mean the p-value doesn’t necessarily show a sample is “good” or “bad” to use? Rather, is the p-value showing whether the sample may be an outlier? Meaning if the p-value is low, I would still include it with all my sample results, but would want to sample again to see if the results start to get closer to the population mean?

To go back to your height example: if I was trying to find the average height for my school, and took one sample from one class, the p-value would most likely be high, since the class would have a wide range of students. But if my second sample was of the basketball team only, the p-value would be low, telling me this is more of an outlier. I would then want to compare my samples to the population mean, to see why one was so high and the other so low? So if I wanted to continue sampling in the future, I would know that sampling just the basketball team is not a good idea, and I would want to widen my sample group?
To me, it seems like if my samples are constantly showing a low p-value, it would tell me I have an error in my sampling approach. But maybe that isn’t the purpose of the p-value, to show if the samples are good to use or not? Maybe the p-value is more about understanding if there is enough data to analyze from the samples?

The examples I’ve always seen of the use of a one-sample t-test are to ask if some characteristic of your sample is different from the population.

For instance, do basketball players have a greater mean height than the general population (about which I presumably have good, independent data)? That the players have, on average, the same height as the general population would be the null hypothesis.

To find out, I measure or otherwise obtain basketball players’ heights, plug that dataset into the program for the t-test, along with the general population mean, and presumably will get a very low p-value, telling me that my sample has a “significantly different” mean height than the general population.

The question we got into above, however, is different. Say we for some reason needed a sample that has the same mean height as the general population. Maybe we wanted to do a study to find out if hanging from a chin-up bar ten minutes a day increases one’s height. Would a t-test be appropriate to test our initial selection of subjects? I’m not familiar with it being used in that manner, though I can’t think why not. It has to do with experimental design, the details of which I don’t know much about.


@patrickd314 Your last example made it very clear for me now. The chin-up example is great! It explains the exact kind of scenario I was thinking of. Thank you very much for your time, and contribution!


Small remark:
It is kind of annoying that the console doesn’t recognise my code as correct when I enter an additional space in between “print” and “()”. If possible, this should be corrected for.

Thanks 🙂

Are you carrying this habit over from another programming language?

Actually not. I just find it clearer with an additional space in between the print statement and what I am printing.

It is just annoying that I get an error message even though the code is clearly correct.
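
For what it’s worth, Python 3 itself accepts both spellings. A small sketch using the standard `ast` module shows that the two lines parse to the same syntax tree, so the interpreter treats them identically. That the exercise checker rejects the spaced form because it compares the submitted text against an expected pattern is only a guess, not something confirmed in this thread.

```python
import ast

with_space = 'print ("Hello")'
without_space = 'print("Hello")'

# Both lines parse to the same syntax tree, so Python runs them identically.
same_tree = ast.dump(ast.parse(with_space)) == ast.dump(ast.parse(without_space))

print (same_tree)  # spaced form, valid Python 3
print(same_tree)   # conventional form recommended by PEP 8
```

PEP 8 recommends leaving out the space before the opening parenthesis of a call, which may be why the checker expects that form.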

Is there a hypothesis test taking place here, and if so, would the null hypothesis be that the sample mean height is equal to 30?