A visualization of the 1 sample t-test

Question

Would you please provide a visual explanation of what the 1 sample t-test, ttest_1samp(exampleDistribution, expectedMean), is doing with its arguments?

Answer

Relating the behavior of our 1 sample t-test back to this applet, think of ttest_1samp as first taking the list of values, exampleDistribution, and turning that into a distribution. Once we have that distribution, we can compute a mean. You can think of this as one of the two distributions in the applet. Next, form a perfect normal distribution with mean expectedMean and a fixed standard deviation equal to that of the first distribution; this is your second distribution in the applet. The 1 sample t-test is then computing the exact p-value, or something similar, that we see at the bottom of the applet.
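To make the two arguments concrete, here is a minimal sketch. The sample values and the two expected means are made up for illustration; only ttest_1samp itself comes from scipy.

```python
from scipy.stats import ttest_1samp

# Hypothetical sample; the values are invented for illustration (mean is 71.5).
exampleDistribution = [72, 68, 75, 71, 69, 74, 70, 73]

# An expected mean close to the sample mean gives a large p-value...
near = ttest_1samp(exampleDistribution, 71)
# ...while an expected mean far from the sample mean gives a small one.
far = ttest_1samp(exampleDistribution, 80)

print(f"expectedMean=71: t={near.statistic:.3f}, p={near.pvalue:.4f}")
print(f"expectedMean=80: t={far.statistic:.3f}, p={far.pvalue:.4f}")
```

The first argument is the list of observed values and the second is the mean you are testing against; moving the second argument away from the sample mean shrinks the p-value.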


Unfortunately the applet is not working on my computer (changing values does nothing, and I'm not sure why; I'm running an up-to-date Chrome browser on a 2018 MacBook Pro).

Anyway, I can’t seem to understand how it is possible to provide a p-value without having a parameter that specifies a population size?

As I had understood it, the confidence in a null-hypothesis for a given sample depends on the population size considered, right? So if my sample is n=5 but my population is n=5, the probability of a null-hypothesis happening should be 1, right? If the population goes up, the null-hypothesis goes down.

In the formula ttest_1samp, there is no parameter for population size. So I can’t seem to understand how it works?

Any help would be appreciated.

As far as I understand, scipy handles this for you “in the background”. It takes your list of values (observations) and derives a t-distribution (Student's t-distribution) from it. For instance, the number of observations is needed to determine the degrees of freedom (shape) of the t-distribution. Once you have the shape of the t-distribution, you can derive its mean and compare it with your expected mean (null value). Knowing the “location” of both means, scipy derives the confidence interval. And when your expected mean (null value) falls within the 95% confidence interval (which corresponds to a p-value greater than 5%), you do not have enough evidence to reject your null hypothesis.

I know, probably not the best explanation (and potentially flawed), but I am still trying to wrap my head around this as well.
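The link between the 95% confidence interval and the 5% threshold described above can be checked with a rough sketch. The data and the null value of 74 are made up, and building the interval with scipy.stats.t.interval is my own choice for illustration, not necessarily what ttest_1samp does internally. Note that only the sample size enters (through the degrees of freedom); no population size is needed.

```python
import numpy as np
from scipy import stats

exampleDistribution = np.array([72, 68, 75, 71, 69, 74, 70, 73])  # made-up data
expectedMean = 74  # hypothetical null value

n = len(exampleDistribution)
x = exampleDistribution.mean()
sem = exampleDistribution.std(ddof=1) / np.sqrt(n)  # standard error of the mean

# 95% confidence interval for the population mean, based on a t-distribution
# with n - 1 degrees of freedom centred on the sample mean.
low, high = stats.t.interval(0.95, n - 1, loc=x, scale=sem)

p = stats.ttest_1samp(exampleDistribution, expectedMean).pvalue
inside = low <= expectedMean <= high
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"expectedMean inside CI: {inside}, p-value: {p:.4f}")
```

Whenever the expected mean lies inside the interval, the two-sided p-value comes out above 0.05, and vice versa.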


Still can’t understand this. A suggestion to Codecademy: please provide us with more practice and real-life examples.


I’ve learned a fair amount about hypothesis testing in the last few days, so I would like to share what I understand about the t-test. If I make any mistakes, please let me know.

The 1-sample t-test ttest_1samp(exampleDistribution, expectedMean) is used to test whether the null hypothesis

The population mean equals expectedMean.

should be rejected or not. To that end, the function calculates the following statistic, called the t-statistic:

T = (X - expectedMean) / (S / np.sqrt(n))

Here, X, S, and n represent the sample mean, the sample standard deviation, and the sample size, respectively.

If the null hypothesis is true (and the data are roughly normally distributed), T follows a probability distribution called the t-distribution, with n - 1 degrees of freedom:
t-dist

Now consider the actual sample exampleDistribution we observed. Calculate the sample mean x = np.mean(exampleDistribution), the sample standard deviation s = np.std(exampleDistribution, ddof=1), and the sample size n = len(exampleDistribution) of exampleDistribution. So we get the t-statistic t of this sample:

t = (x - expectedMean) / (s / np.sqrt(n))
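As a quick sanity check, the formula above can be computed by hand and compared with what scipy reports. The sample and the expected mean here are made up for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

exampleDistribution = [72, 68, 75, 71, 69, 74, 70, 73]  # made-up data
expectedMean = 74  # hypothetical null value

x = np.mean(exampleDistribution)          # sample mean
s = np.std(exampleDistribution, ddof=1)   # sample standard deviation
n = len(exampleDistribution)              # sample size

t = (x - expectedMean) / (s / np.sqrt(n))

result = ttest_1samp(exampleDistribution, expectedMean)
print(f"by hand: t = {t:.4f}")
print(f"scipy:   t = {result.statistic:.4f}")
```

Both lines should print the same number. The ddof=1 matters: without it, np.std divides by n instead of n - 1 and the two values no longer match.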

The p-value is the conditional probability that T takes a value of t or more extreme, under the assumption that the null hypothesis is true. (Note that it is NOT the “probability that the null hypothesis is true.”) For a two-sided test, the p-value is the area of the colored parts in the following image:
p-value1
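The two-sided tail area can also be computed directly from the t-distribution. This is a sketch with made-up data; it uses scipy.stats.t.sf (the survival function, 1 - CDF) to get one tail and doubles it, relying on the symmetry of the t-distribution.

```python
import numpy as np
from scipy import stats

exampleDistribution = [72, 68, 75, 71, 69, 74, 70, 73]  # made-up data
expectedMean = 74  # hypothetical null value

x = np.mean(exampleDistribution)
s = np.std(exampleDistribution, ddof=1)
n = len(exampleDistribution)
t = (x - expectedMean) / (s / np.sqrt(n))

# Area in one tail beyond |t|, doubled because the t-distribution is symmetric.
df = n - 1
p_two_sided = 2 * stats.t.sf(abs(t), df)

result = stats.ttest_1samp(exampleDistribution, expectedMean)
print(f"by hand: p = {p_two_sided:.4f}")
print(f"scipy:   p = {result.pvalue:.4f}")
```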

If the p-value is lower than the significance level, which is 0.05 here, it means that “a result at least this extreme would occur with a probability of less than 5% if the null hypothesis were true.” This is reason enough to doubt the null hypothesis, so we reject it. However, the p-value is not zero, so there is still a nonzero probability that such a rare observation occurred by chance this time. There is always a risk of Type I errors.
p-value2
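The Type I error risk can be seen in a small simulation. This is only a sketch: the population (normal, mean 50, standard deviation 10), the sample size of 20, and the number of repetitions are all arbitrary choices of mine.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
expectedMean = 50
alpha = 0.05

# 10,000 samples of size 20 from a population whose mean really IS expectedMean,
# so the null hypothesis is true for every one of them.
samples = rng.normal(loc=expectedMean, scale=10, size=(10_000, 20))

# One t-test per row.
pvalues = ttest_1samp(samples, expectedMean, axis=1).pvalue

false_positive_rate = np.mean(pvalues < alpha)
print(f"fraction of true nulls rejected: {false_positive_rate:.3f}")
```

Even though the null hypothesis is true every time, roughly 5% of the tests reject it. That fraction is the Type I error rate, and it is set by the significance level.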

If the p-value is greater than the significance level, we cannot reject the null hypothesis. Note that this does not necessarily mean that the null hypothesis is true. It means “no evidence has been found to reject the null hypothesis; it may be true or false, we don’t know.” There is always a risk of Type II errors.
p-value3
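A similar simulation shows the Type II side. Again, this is just a sketch with numbers I picked: the true population mean is 52, slightly different from the expected mean of 50, so the null hypothesis is false in every run.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
expectedMean = 50   # the null value being tested
trueMean = 52       # hypothetical true population mean (null is false)
alpha = 0.05

# 10,000 small samples (n = 20) from the population with the shifted mean.
samples = rng.normal(loc=trueMean, scale=10, size=(10_000, 20))
pvalues = ttest_1samp(samples, expectedMean, axis=1).pvalue

# Fraction of runs where the test fails to reject a false null hypothesis.
miss_rate = np.mean(pvalues >= alpha)
print(f"fraction of false nulls NOT rejected: {miss_rate:.3f}")
```

With a small sample and a small true difference, most runs fail to reject the null hypothesis even though it is false, which is why a large p-value should not be read as proof that the null hypothesis is true.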


I encourage you to read @object2161442840's answer because it’s well fleshed out, but here’s a TL;DR:
The test assumes that the total population is distributed around the expected mean in the same way that the observed sample is distributed around its own mean.

Once it has assumed the shape and standard deviation of the total population, it can compute the probability of a sample mean falling at least as far away from the expected mean as the observed one did.

If you’d like a little more color around it, I recommend the book Naked Statistics. It’s a surprisingly pleasant read!
