This community-built FAQ covers the “ANOVA” exercise from the lesson “Hypothesis Testing”.
Paths and Courses
This exercise can be found in the following Codecademy content:
Data Science
FAQs on the exercise ANOVA
There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.
If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.
I don’t get it:
In the explanation: “The null hypothesis, in this case, is that all three populations have the same mean … If we reject this null hypothesis (if we get a p-value less than 0.05), we can say that we are reasonably confident that a pair of datasets is significantly different.”
But in the exercise:
With store_b, the means are: 58.349636084, 65.6262871356, 62.3611731859
and the p-value is 0.000153411660078, i.e., we can reject the null hypothesis (see above) and the samples are different.
With store_b_new, the means are: 58.349636084, 148.354940186, 62.3611731859
and the p-value is 8.49989098083e-215, i.e., we cannot reject the null hypothesis (see above) and the samples are basically the same.
Surely that is the wrong way round?
No, it’s correct.
The null hypothesis in this case is “There is no significant difference in sales between the stores.”
Rejecting the null hypothesis (p-value < 0.05) would mean there IS a significant difference between at least one pair of stores.
The new sales numbers for Store B easily pass the eye test, so you’d expect to reject the null hypothesis. And that’s exactly what happened in the ANOVA test (p-value = 8.49989098083e-215). Strictly speaking, this is not a 99.999999…% chance that a store is different; it means that if all three stores really had the same mean sales, a difference this large would almost never show up by chance.
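A minimal sketch of the decision rule described in the reply above, in Python. The helper name `anova_decision` and its default threshold of 0.05 are illustrative assumptions, not part of the Codecademy exercise; the two p-values are the ones quoted earlier in this thread.

```python
def anova_decision(p_value: float, alpha: float = 0.05) -> str:
    """Apply the usual significance-test decision rule (hypothetical helper)."""
    # A small p-value is evidence AGAINST the null hypothesis
    # ("all groups share the same mean"), so we reject it.
    if p_value < alpha:
        return "reject the null hypothesis: at least one store differs"
    return "fail to reject the null hypothesis: no significant difference detected"


# p-values quoted in this thread for store_b and store_b_new
for label, p in [("store_b", 0.000153411660078), ("store_b_new", 8.49989098083e-215)]:
    print(label, p < 0.05, anova_decision(p))
```

Both p-values take the "reject" branch. The second one is simply far smaller, which means stronger evidence against the null hypothesis, not weaker.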
I found myself still confused as to how the p-value was less than 0.05 until I learned what the “e-” means. This wasn’t taught anywhere in the Data Science path prior to this exercise, so I am posting it here in case it is new to anyone else.
Basically, the “e-” format in this case tells you that the p-value is 8.49989098083 times 10^-215, so it is 0, a decimal point, then 214 zeros and then 849…
In the project that comes after the end of this module, there is a p-value of 2.74631179866e-10. This should be read as the p-value equaling 0.000000000274631179866.
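If you want to check this kind of conversion yourself, Python reads “e-” notation directly. This is a small sketch using only the standard library (`float`, f-string formatting, and `decimal.Decimal`); the variable names are just for illustration.

```python
from decimal import Decimal

# Python parses "e-" scientific notation directly into a float
p_store_b_new = float("8.49989098083e-215")
p_project = float("2.74631179866e-10")

print(p_store_b_new < 0.05)  # True
print(p_project < 0.05)      # True

# Fixed-point formatting shows every leading zero:
# 21 decimal places = 9 leading zeros + the 12 significant digits.
print(f"{p_project:.21f}")   # 0.000000000274631179866

# Decimal keeps exactly the digits that were typed, so formatting it
# with "f" spells out every zero before the first significant digit.
expanded = format(Decimal("8.49989098083e-215"), "f")
fraction = expanded.split(".")[1]
leading_zeros = len(fraction) - len(fraction.lstrip("0"))
print(leading_zeros)         # 214
```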
Why is ANOVA’s result so much more precise than that of Two Sample T-Tests?
My first suggestion:
With both methods we check whether at least one pair of samples in, say, a set of 3 samples has significant differences.
-Two Sample T-Test checks it via comparison pairs like this: a_b & a_c & b_c (all the experiments depend on each other)
-ANOVA checks it via comparison pairs like this: a_b | a_c | b_c (all the experiments are isolated)
My second suggestion:
Both methods check different things:
-Two Sample T-Test checks which of the 3 pairs has significant differences (it is a more specific conclusion, but much more demanding in accuracy)
-ANOVA checks whether there are significant differences between at least one pair (it is a more general conclusion, since we don’t know the specific pair, but it is less demanding in accuracy)
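One commonly cited reason for running a single ANOVA instead of several t-tests, separate from the two suggestions above, is that every extra test is another chance of a false positive. A minimal sketch, assuming every null hypothesis is true, each test uses a threshold of 0.05, and the tests are independent (pairwise tests that share a store only approximate this). The function name `familywise_error_rate` is hypothetical.

```python
from itertools import combinations


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    # P(no false positive in one test) = 1 - alpha, so across n tests
    # P(at least one false positive) = 1 - (1 - alpha) ** n
    return 1 - (1 - alpha) ** n_tests


# Three stores -> three pairwise two-sample t-tests: a_b, a_c, b_c
pairs = list(combinations(["a", "b", "c"], 2))
print(pairs)                                              # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(round(familywise_error_rate(0.05, len(pairs)), 4))  # 0.1426
```

Under these assumptions, three separate t-tests at 0.05 carry roughly a 14% chance of a spurious “significant” result, whereas a single ANOVA at 0.05 keeps that chance at 5%.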
I found this very helpful.
I don’t remember this e-format being covered either. I was getting curious when it asked whether the p-value is less than 0.05.
Can you please explain how the tests are dependent (two-sample) and independent (ANOVA)?
It seems to me this is a poor example. The first p-value says that there is a significant difference between the sets. In the second iteration, the p-value is even smaller, so we say there is a really significant difference between the sets. The lesson never uses a set of values that demonstrates there is not a significant difference between sets, so the student is left to wonder what the point was. Also, why is the “fstat” introduced without even a passing mention of its importance?
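On the `fstat` question: SciPy’s `scipy.stats.f_oneway(a, b, c)` returns the F statistic and the p-value, which is presumably where the exercise’s `fstat` comes from. The F statistic is the ratio of between-group variance to within-group variance. Below is a minimal pure-Python sketch of that ratio; `f_statistic` is a hypothetical helper, and the toy numbers are made up for illustration, not the exercise’s store data. The second call is the kind of “no significant difference” case the lesson skips.

```python
def f_statistic(*groups: list[float]) -> float:
    """One-way ANOVA F statistic (hypothetical helper for illustration)."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)        # df_between = k - 1
    ms_within = ss_within / (n - k)          # df_within = n - k
    return ms_between / ms_within


# Toy data: clearly different means -> large F (small p-value)
print(f_statistic([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
# Toy data: identical means -> F = 0 (no evidence against the null)
print(f_statistic([1, 2, 3], [2, 1, 3], [3, 2, 1]))  # 0.0
```

The p-value then comes from the F distribution with (k - 1, n - k) degrees of freedom, which `f_oneway` computes for you: the larger F is, the smaller the p-value.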