Hypothesis testing - using different datasets

Hey there, everyone!

While reading the Introduction to Hypothesis Testing, I got a bit confused about the null distribution. The dataset in the "The question" section is skewed right.

Why are we able to use the null distribution instead of the real dataset? Does it matter? I don’t think I’m getting the null distribution usage.

Are you talking about determining the Null distribution (on the link/sheet) in Step 3?

It might be useful to read up on CLT too.
https://towardsdatascience.com/central-limit-theorem-95f355934d98

and here:
https://towardsdatascience.com/understanding-the-central-limit-theorem-aeb89dd0ccad

@lucasvinzon,

The reason we compare the average of the Statistics Academy students to the null distribution, rather than to the population's actual distribution, is that we are trying to find out how likely it is that the 100-student Statistics Academy sample has a higher average score purely by chance.

Or, in other words: how likely is it that another random sample of 100 students from the population would have an average score of 31.16 (the average for Statistics Academy students) or higher?

So, we compare the Statistics Academy average to the averages of many other 100-person samples drawn from the population (the null distribution) to see where it lands. That tells you whether you have enough evidence to reject the null hypothesis (that the Statistics Academy average was higher merely by chance) in favor of your alternative hypothesis, whatever that may be.
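If it helps to see this in code, here is a rough sketch of building a null distribution by simulation. Only the sample size (100) and the observed average (31.16) come from the lesson. The population here is made up (a right-skewed gamma distribution), and the function names `null_distribution`, `upper_tail_p_value`, and `skewness` are just ones I picked for illustration, so don't expect the numbers to match the course sheet.

```python
import numpy as np


def null_distribution(population, sample_size=100, n_samples=10_000, seed=0):
    """Means of many random samples of `sample_size` drawn from the population."""
    rng = np.random.default_rng(seed)
    # Each row of `idx` picks one random sample (drawn with replacement).
    idx = rng.integers(0, len(population), size=(n_samples, sample_size))
    return population[idx].mean(axis=1)


def upper_tail_p_value(null_means, observed_mean):
    """Share of null sample means that are at least as large as the observed mean."""
    return float(np.mean(null_means >= observed_mean))


def skewness(values):
    """Simple moment-based skewness: roughly 0 for a symmetric distribution."""
    centered = values - values.mean()
    return float(np.mean(centered ** 3) / np.std(values) ** 3)


# ASSUMPTION: a synthetic right-skewed population standing in for the real scores.
population = np.random.default_rng(42).gamma(shape=2.0, scale=14.5, size=50_000)

null_means = null_distribution(population)
observed_mean = 31.16  # Statistics Academy average, taken from the lesson

print("population skewness:", round(skewness(population), 2))
print("null distribution skewness:", round(skewness(null_means), 2))
print("p-value (one-sided):", upper_tail_p_value(null_means, observed_mean))
```

With a population like this, the individual scores are clearly skewed while the sample averages are much closer to symmetric, which is why the skew in the raw data matters less than you might expect. The p-value is simply the fraction of simulated averages that reach 31.16 or more.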

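You will definitely want to check out the articles on the Central Limit Theorem that @lisalisaj linked to. It explains why the averages of 100-person samples end up roughly bell-shaped even though the individual scores are skewed right. It may also help to look up some articles on null and alternative hypotheses to gain a better understanding.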

Hope this helps!

OH GOD FINALLY!

Just read the article and I've got to say that I'm relieved.
