FAQ: Hypothesis Testing - 2 Sample T-Test

This community-built FAQ covers the “2 Sample T-Test” exercise from the lesson “Hypothesis Testing”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

FAQs on the exercise 2 Sample T-Test

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.


Hi. I have a problem with the exercise. If the week 1 mean = 25 and std = 4, then 68% of values should be between 21 and 29. Also, if the week 2 mean = 29 and std = 5, then 68% of values should be between 24 and 34. It looks like the region 24 to 29 may contain elements from both arrays. Why, then, is the p-value so small (0.00067)? In this case the null hypothesis should be true, but the result is the opposite. Or I didn't get something 😦


I think nobody is participating or helping in the forums in the pro data science section. They probably want you to pay extra money for an advisor.
That said, statistics pedagogy at Codecademy is very weak.

I think I have the same confusion. I understand that such a small p-value indicates we should reject the null hypothesis; I will keep going to see if I catch something more.

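The p-value tells us whether the difference between the means is statistically significant; it tests the means, not the spread of individual values. So while there is overlap in the 68% ranges of the individual values in both data sets, the means themselves are different.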

Essentially, the p-value is the probability of seeing a difference in means at least this large if the two data sets came from the same overarching population. So this small p-value tells us that these data sets are unlikely to come from the same population and, for this example, that something different or weird happened one week to skew the data.
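To see why overlapping ±1 std ranges can still give a tiny p-value: the t-test compares the means, and the uncertainty in a mean is its standard error, std / √n, which shrinks as the sample grows. The sketch below uses the summary numbers from the question and a hypothetical sample size of 100 per week (the exercise's real sample size is not stated in this thread). With equal sample sizes, the t statistic computed here matches the pooled one that `scipy.stats.ttest_ind` uses by default; the helper name `standard_error` is just for illustration.

```python
import math

# Summary statistics quoted in the question (week 1 and week 2).
mean1, std1 = 25, 4
mean2, std2 = 29, 5

# Hypothetical sample size per week; the exercise's actual size is not given here.
n = 100

def standard_error(std, n):
    """Standard error of a sample mean: std / sqrt(n)."""
    return std / math.sqrt(n)

se1 = standard_error(std1, n)
se2 = standard_error(std2, n)

# Roughly 68% ranges for INDIVIDUAL values (mean +/- 1 std): these overlap.
values1 = (mean1 - std1, mean1 + std1)
values2 = (mean2 - std2, mean2 + std2)

# Roughly 68% ranges for the MEAN itself (mean +/- 1 SE): much narrower.
means1 = (mean1 - se1, mean1 + se1)
means2 = (mean2 - se2, mean2 + se2)

# Two-sample t statistic: difference in means divided by its standard error.
t = (mean2 - mean1) / math.sqrt(se1**2 + se2**2)

print("individual values:", values1, values2)
print("sample means:     ", means1, means2)
print("t statistic:      ", round(t, 2))
```

The individual-value ranges overlap between 24 and 29, but the ranges for the two means do not come close, which is what the test is actually looking at.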


You are correct that there is some overlap of the values at the 68% level, but remember that our criterion for rejecting the null hypothesis is more stringent. We reject the null hypothesis at the 95% confidence level, which is why the p-value must be less than 0.05 (5%). Roughly, this means that if both samples really came from populations with the same mean and we repeated the experiment many times, we would see a difference in sample means this large less than 5% of the time. At the 68% level, that chance of wrongly declaring a difference would be about 32%, which is far too lenient.
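The "less than 5% of the time" idea can be checked by simulation. The sketch below draws both samples from the same population (mean 25, std 4, borrowed from the week 1 numbers in the question), so the null hypothesis is true, and counts how often the test rejects it anyway. To stay within the Python standard library it swaps in a z-test with a known σ instead of the t-test; the function names, sample size, number of trials, and seed are all made-up assumptions.

```python
import math
import random
from statistics import fmean

def z_test_p(sample_a, sample_b, sigma):
    """Two-sided p-value of a z-test for equal means, assuming a known, shared sigma."""
    se = sigma * math.sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = (fmean(sample_b) - fmean(sample_a)) / se
    # erfc(|z| / sqrt(2)) equals 2 * P(Z > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(trials=2000, n=30, mean=25.0, sigma=4.0, alpha=0.05, seed=0):
    """Fraction of simulated experiments that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both samples come from the SAME population, so every rejection is a false alarm.
        a = [rng.gauss(mean, sigma) for _ in range(n)]
        b = [rng.gauss(mean, sigma) for _ in range(n)]
        if z_test_p(a, b, sigma) < alpha:
            rejections += 1
    return rejections / trials

print(false_positive_rate())            # roughly 0.05
print(false_positive_rate(alpha=0.32))  # roughly 0.32, the "68% level" threshold
```

The rejection rate tracks the chosen threshold: about 5% of experiments reject at 0.05, and about a third reject at 0.32.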


This thread is old, but since it’s the first and only interaction I see for this exercise I’m adding my thoughts in case anyone else needs it (and also hoping that I’m correct since I couldn’t find the validation I was looking for =)

Based on my understanding the simplest way to conceptualize this problem is as follows:

  • in hypothesis testing, the p-value is basically a way to determine whether you should reject the null hypothesis
  • anything < 0.05 (the usual significance threshold) indicates strong evidence against the null hypothesis, so we reject it

In this case the null hypothesis is “these two distributions have the same mean” and the p-value returned is 0.00067.

0.00067 is less than 0.05 therefore we would reject the null hypothesis.
Result: these distributions do not have the same mean.

This makes sense and is expected because our data sets are static (unchanging) and when you check the mean for each individually:
week1_mean = 24
week2_mean = 29

So while there would be overlap between the two distributions, the only thing we're trying to test is whether the difference between their means is statistically significant (whether the means are different).

If I’m misunderstanding, someone please enlighten me 😔


You have it correct. Some interpretations state that the test gives the probability that both distributions are drawn from the same population, but I suspect that with certain assumptions about the variance or standard deviations, the two interpretations are equivalent.


I also have the same thought. Given the null hypothesis of this exercise, the resulting p-value seems to support the claim.
A second thought comes to mind. If we change the null hypothesis to “these two distributions don’t have a significant difference”, then given the p-value we would need to reject the null hypothesis, which would result in a type I error. Is that correct? If so, knowing it is a type I error, how do we fix it? By getting more sample points, more trials, or both?

I will use an example that is a lot more intuitive to most people, but with completely made-up numbers.

Let’s say we have a population of both men and women. We find the average height of the men is 170 cm with a standard deviation of 15 cm. Then we find the average height of the women is 155 cm with the same standard deviation.

There’s quite a bit of overlap between the 1σ ranges (1 std), meaning it wouldn’t be that rare for a woman to be taller than a man. But the null hypothesis in this case is “Women have the same average height as men.” Here we should reject the null hypothesis (of course, there is a chance of sampling variation meaning we picked unrepresentative people) because we know that men and women have different average heights.
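The made-up height numbers can be plugged in directly. The sketch below uses Python's `statistics.NormalDist` to compute how often a randomly chosen woman is taller than a randomly chosen man, then the p-value for "same average height" in a hypothetical study of 100 people per group (the group size is an assumption, not from the post). It uses a normal approximation rather than an exact t-test.

```python
import math
from statistics import NormalDist

# Made-up numbers from the post: men ~ N(170, 15), women ~ N(155, 15), in cm.
men = NormalDist(mu=170, sigma=15)
women = NormalDist(mu=155, sigma=15)

# For independent draws, (woman - man) is normal with mean 155 - 170
# and standard deviation hypot(15, 15); NormalDist supports this subtraction.
diff = women - men
p_woman_taller = 1 - diff.cdf(0)

# Hypothetical study size per group (not from the post).
n = 100
se = math.hypot(women.stdev / math.sqrt(n), men.stdev / math.sqrt(n))
z = (men.mean - women.mean) / se
# Normal-approximation two-sided p-value: erfc(z / sqrt(2)) = 2 * P(Z > z).
p_value = math.erfc(z / math.sqrt(2))

print(f"P(random woman taller than random man) = {p_woman_taller:.2f}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.1e}")
```

Roughly one random pair in four has the woman taller, yet the p-value for equal averages is vanishingly small: heavy overlap of individuals and a clear difference in means are not in conflict.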

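The important message here is that the p-value says nothing about the size of the effect, only that there is one. The size comes from the difference between the means; the t value that Codecademy is discarding is that difference divided by its standard error, so it also grows with sample size.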

Now, speaking much more generally, this is a very important thing to understand about science. Finding a “significant” effect in science doesn’t mean what most people mean by the word colloquially. Statistical significance has no relation to size; it just means that the effect could be detected.
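One way to see that significance is not size is to hold the effect fixed and change only the sample size. The sketch below assumes a made-up 1 cm difference in average height with σ = 15 cm and uses a normal approximation to the two-sample test. The standardized difference (Cohen's d, a common effect-size measure not mentioned in the thread) stays constant, while the p-value falls toward zero as n grows. The helper name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(diff, sigma, n):
    """Normal-approximation two-sided p-value for a difference in means
    between two groups of size n that share the same standard deviation."""
    se = sigma * math.sqrt(2 / n)
    z = abs(diff) / se
    # erfc(z / sqrt(2)) equals 2 * P(Z > z) and stays accurate for large z.
    return math.erfc(z / math.sqrt(2))

# Hypothetical effect: a 1 cm difference in average height, sigma = 15 cm.
diff, sigma = 1.0, 15.0
effect_size = diff / sigma  # Cohen's d; does not depend on n

for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}  d={effect_size:.3f}  p={two_sided_p(diff, sigma, n):.2g}")
```

The effect is the same tiny 1 cm in every row; only the amount of data changes, and with enough data it becomes "significant".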

A real-world example of where this is a problem is the US medical system. FDA approval requires that a drug provide a significant benefit over existing treatments, and Medicare must pay for all FDA-approved drugs. However, FDA approval doesn’t take the sales cost into account, and Medicare doesn’t influence the FDA. This means that if a drug can take a cancer case from 6 months of survival to 7 months compared to existing treatments, that is a significant improvement (note how different this meaning of “significant” is), and the company then has a monopoly to sell to the government at whatever price it wants for a relatively small effect.


Thanks for that detailed explanation, @netninja25781

I’ve found Khan Academy very helpful to understand these and other statistical concepts.

You can check it out here:

Hope it helps, cheers! 🍺


It is very useful. Thank you!