Hi Codecademy team,
I just finished going through the narrative on Hypothesis Test Page 7. I have some doubts that I would like data science experts to help me clear up, please.
In the exercise, we are trying to figure out which distributions are not normal and which ones are not suitable for an ANOVA test. Lastly, we also want to check whether two of the datasets (distribution 2 and distribution 3) can be used to perform a numerical hypothesis test.
Below are the histogram plots for distributions 1, 2, 3 and 4:
By looking at these histograms, I interpreted that:
- Distributions 1, 3 and 4 are not normal. Therefore, only distribution 2 has a normal distribution. Is this true?
- The question also asked "Which of these distributions would probably not be a good choice to use in an ANOVA comparison? Create a variable called not_normal and set it equal to the distribution number that would be least suited to be used in an ANOVA test." My answer to this question was that distributions 1, 3 and 4 are not suited for an ANOVA test because they don't have normal distributions. But the narrator's answer is distribution 4 only. Why is that?
- The last question is "Calculate the ratio of standard deviations between dist_2 and dist_3 and store it in a variable called ratio. Print it to the console. Is this 'close enough' to perform a numerical hypothesis test between the two datasets?"
Below is the code to calculate the standard deviations of dist_2 and dist_3 and their ratio:
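import numpy as np

# dist_2 and dist_3 are the datasets provided by the exercise
dist_2_std = np.std(dist_2)
dist_3_std = np.std(dist_3)
# ratio of the two standard deviations (dist_2 over dist_3)
ratio = dist_2_std / dist_3_std
print(dist_2_std)
print(dist_3_std)
print(ratio)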
The result of the code is:
2.93237588202
5.0434543879
0.58142210804
As you can see, the ratio is 0.58, which in my opinion is not close enough to 1, and therefore these two datasets are not suitable for a numerical hypothesis test. Is this conclusion correct? If not, why not?
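For anyone revisiting this, the ratio depends on which dataset goes in the numerator: dividing the other way round gives roughly 1.72 instead of 0.58, and both numbers describe the same difference in spread. Below is a minimal sketch of an order-independent version of the check. The helper name std_ratio and the synthetic samples are my own assumptions for illustration; they are not part of the exercise, and the samples only stand in for dist_2 and dist_3.

```python
import numpy as np


def std_ratio(a, b):
    """Return (larger std) / (smaller std), so the result is always >= 1.

    Hypothetical helper for illustration; not part of the exercise.
    """
    std_a = np.std(a)
    std_b = np.std(b)
    return max(std_a, std_b) / min(std_a, std_b)


# Synthetic stand-ins for dist_2 and dist_3 (assumed values, not the real data)
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=2.9, size=1000)
sample_b = rng.normal(loc=0.0, scale=5.0, size=1000)

# The same value comes back regardless of argument order
print(std_ratio(sample_a, sample_b))
print(std_ratio(sample_b, sample_a))
```

Writing the ratio this way only removes the ambiguity about direction; it does not by itself say how close to 1 is "close enough", which is the part I am asking about.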
Please help me demystify this matter. Also, please don't post any unnecessary comments, so that it is easier for me and other learners to revisit this topic if needed.
Thank you very much,
Jimmy