I’m having trouble understanding the following text, available here: https://www.codecademy.com/paths/data-science/tracks/scipy/modules/dspath-hypothesis-testing/lessons/statistical-concepts/exercises/hypothesis-tests
Suppose we want to know if men are more likely to sign up for a given programming class than women. We invite 100 men and 100 women to this class. After one week, 34 women sign up, and 39 men sign up. More men than women signed up, but is this a “real” difference?
We have taken sample means from two different populations, men and women. We want to know if the difference that we observe in these sample means reflects a difference in the population means. To formally answer this question, we need to re-frame it in terms of probability:
“What is the probability that men and women have the same level of interest in this class and that the difference we observed is just chance?”
In other words, “If we gave the same invitation to every person in the world, would more men still sign up?”
A more formal version is: “What is the probability that the two population means are the same and that the difference we observed in the sample means is just chance?”
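One way to estimate "the probability that the difference is just chance" is a permutation test. The sketch below is a swapped-in technique, not necessarily the method the lesson uses. It assumes the same hypothetical 0/1 encoding and assumes that, if men and women share the same underlying sign-up rate, the group labels are interchangeable. It then shuffles the labels many times and counts how often a men-minus-women difference at least as large as the observed one appears. It uses a one-sided comparison and a fixed seed; both are illustrative choices.

```python
import random


def permutation_p_value(group_a, group_b, n_iter=10_000, seed=0):
    """Estimate P(mean(a) - mean(b) >= observed) when group labels are exchangeable."""
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign the "men"/"women" labels
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            count += 1
    return observed, count / n_iter


# Hypothetical 0/1 data matching the example counts.
women = [1] * 34 + [0] * 66
men = [1] * 39 + [0] * 61
observed, p = permutation_p_value(men, women)
print(f"observed difference = {observed:.2f}, estimated p = {p:.3f}")
```

A large estimated p means a 5-percentage-point gap is common even when both groups share one sign-up rate. A small one would suggest the gap is unlikely to be chance alone.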
I have studied sampling distributions, probability and hypothesis testing before, but where is the “sample mean” here? We have just two data sets, one from each of the two populations. Am I missing something, or is some information actually missing? Please point it out.
Thanks in advance,