What conclusions can I draw from these 1 Sample T-Test results?

Hello, everyone.

In this exercise, what can I conclude from the results?

Code

from scipy.stats import ttest_1samp
import numpy as np

correct_results = 0 # Start the counter at 0

daily_visitors = np.genfromtxt("daily_visitors.csv", delimiter=",")

for i in range(1000):  # 1000 experiments
    # your ttest here:
    tstat, pval = ttest_1samp(daily_visitors[i], 30)
    if pval < 0.05:
        correct_results += 1
    # print the pvalue here:
    print(pval)

print("We correctly recognized that the distribution was different in "
      + str(correct_results) + " out of 1000 experiments.")

Output

We correctly recognized that the distribution was different in 499 out of 1000 experiments.

What conclusion can I draw from this number? I can’t see why knowing that 499 out of 1000 experiments had a p-value < 0.05 is relevant. Is it just to show that about half of our samples will fall in our desired p-value range?

Thank you for your time!

Hi @vicaugusto33,

As per this cheat sheet:

[image: cheat sheet]

So, with a 1 Sample T-Test, you take your sample and determine how confident you can be that it came from a population with your desired mean.

In the exercise you are referring to, I believe (from reading the description) that the point is to show that failing to get a p-value below 0.05 doesn’t mean there is no difference between the mean of the population your sample comes from and your desired mean. It just means that a difference cannot be confidently concluded from the sample that you used in the T-test.

To demonstrate this, the exercise had you take 1000 samples from a population and compare how many of those samples would have shown a significant difference from the desired mean of 30 (pval < 0.05) and how many would not.
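
To make that concrete without the exercise's CSV, here is a rough sketch that simulates the same kind of experiment. The population (normal, mean 31, standard deviation 5) is an assumption made up for illustration, not the exercise's actual data. The real mean is deliberately not 30, yet a large share of the 100-person samples still fail to reach p < 0.05.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Assumed stand-in population: true mean 31 (not 30), standard deviation 5.
# These numbers are invented for illustration; they are not the exercise's data.
samples = rng.normal(loc=31, scale=5, size=(1000, 100))  # 1000 samples of 100

# One t-test per row, each against the hypothesized mean of 30.
result = ttest_1samp(samples, 30, axis=1)
significant = int(np.sum(result.pvalue < 0.05))

print(f"{significant} of 1000 samples gave p < 0.05")
```

Every sample here comes from a population whose mean really differs from 30, so each non-significant result is a case where the difference exists but that particular sample was not enough to show it.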


So the point was to show that running a one sample t-test on just one small sample is not representative of the population?

But then, how can I be sure that the sample I took is actually representative?
Should I always doubt whatever answer I get and write this for loop for every 1 Sample T-Test I run?
Seeing that 499 out of 1000 experiments hit our desired p-value makes me think that, in this case, there’s no correlation between the desired mean and the actual mean, and that the difference between them is not a natural fluctuation.

@vicaugusto33, sorry in advance for the lengthy discussion below!

I believe the point was to demonstrate two things:

  • That p-values only measure how strongly a sample contradicts the null hypothesis, and are not definitive proof that it is or is not true; and
  • The result of a 1 Sample T-test is highly dependent on the sample you provide and whether it is representative of your likely population.

Because these hypothesis testing lessons are pretty confusing, I’m going to break this down to the nitty gritty so we can see what’s going on here.

Background

So when running this test, your null hypothesis is that the sample belongs to a population with a mean age of 30.

In the previous exercise, using the first sample, the p-value was something like 0.56. If that is our only sample, then we conclude that we cannot rule out the null hypothesis: if the null hypothesis were true, there would be about a 56% chance of seeing a sample mean at least this far from 30, so this sample gives us no real evidence against it. However, that sample size was only 14 (or supposed to represent 100 according to the exercise), and is probably not a good sample size for the likely population of BuyPie customers.
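
A quick simulation can help pin down what a p-value like 0.56 does and doesn't say. The p-value is computed under the assumption that the null hypothesis is true, so the sketch below builds a made-up population whose mean really is 30 (the standard deviation of 5 is also just an assumption) and looks at how the p-values behave.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)

# Assumption for illustration: a population whose mean really is 30 (sd 5),
# so the null hypothesis "mean = 30" is true by construction.
null_samples = rng.normal(loc=30, scale=5, size=(10_000, 100))

pvals = ttest_1samp(null_samples, 30, axis=1).pvalue

false_alarm_rate = float(np.mean(pvals < 0.05))  # share of tests wrongly "significant"
above_056 = float(np.mean(pvals > 0.56))         # share of tests with p above 0.56

print(f"p < 0.05 in {false_alarm_rate:.1%} of tests")
print(f"p > 0.56 in {above_056:.1%} of tests")
```

When the null hypothesis is true, p-values are spread roughly evenly between 0 and 1, so about 5% fall below 0.05 and about 44% land above 0.56. A single large p-value is therefore unsurprising under the null, but on its own it does not tell you how likely the null is to be true.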

For this exercise, you are told to loop through 1000 days of customer info, with each day being its own sample. In your loop you perform a 1 Sample T-test on that day’s sample and print out the p-value, then print the total number of tests where we could reject the null hypothesis: 499 of 1000.

Immediate Takeaways

Okay, cool, so we see 1000 p-values, some well below 0.05 and some well above. What does this tell us? It tells us that if we take 1000 different 100-person samples of the same population and run T-tests, the T-tests will vary widely in their confidence of accepting/rejecting the null hypothesis, depending on which values are included in the sample.

But what does this really tell us? It tells us that 100 is too small of a sample size to be representative of our likely population. We shouldn’t have such wildly different results if each of our samples was the correct size. How do we know that our current sample size is 100? Try running this code:

# daily_visitors[0] is day 1 of our 1000 days
print(len(daily_visitors[0]))  # prints 100
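
To get a feel for how much the sample size matters, here is a small sketch that repeats the experiment at several sample sizes. The population is simulated (normal, mean 31, standard deviation 5), which is an assumption for illustration only, and `rejection_rate` is a hypothetical helper name, not something from the exercise.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(2)

def rejection_rate(n, true_mean=31.0, sd=5.0, null_mean=30.0, experiments=2000):
    """Fraction of simulated samples of size n whose t-test gives p < 0.05."""
    # All population parameters here are assumed values, not the exercise's data.
    samples = rng.normal(loc=true_mean, scale=sd, size=(experiments, n))
    pvals = ttest_1samp(samples, null_mean, axis=1).pvalue
    return float(np.mean(pvals < 0.05))

rates = {n: rejection_rate(n) for n in (25, 100, 400, 659)}
for n, rate in rates.items():
    print(f"n={n:4d}: rejected the null in {rate:.0%} of experiments")
```

With these assumed numbers, small samples reject the null only some of the time, while larger samples reject it almost every time, even though the underlying population never changes.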

“But then, how can I be sure that the sample I took is actually representative?”

Great question. The best way is to use a sample size calculator to find out how large your sample should be for you to be statistically confident that it represents your likely population. Codecademy actually goes over this in the Sample Size Determination course, but for some reason they decided to put that course after the Hypothesis Testing course.

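“Should I always doubt whatever answer I get and make this for loop for every 1 Sample T Test I make?”

No. This is not a statistically sound way to evaluate your data, and Codecademy just used it for example purposes.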

A Better Approach

You might be wondering how to properly find whether the average age of BuyPie’s customers is 30, using a sample. Below, I’ll give you an example of how to do just that using the data Codecademy provided.

First, add import random to the top of your code and comment out everything from your for loop down.

Now, let’s take all of the data from daily_visitors and put it into one list:

all_customers = []
for i in range(len(daily_visitors)):
  for j in daily_visitors[i]:
    all_customers.append(j)

Now, let’s figure out the size of the sample we need. To do this, we’ll use the sample size calculator from Codecademy’s Sample Size Determination course:

[image: sample size calculator]

For more info on how to choose the numbers, check out the course, which is included in the Data Science path.
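
The calculator's inputs aren't visible here, so the following is only a guess at the kind of formula such calculators commonly use: Cochran's sample size formula with a finite population correction. The 5% margin of error and the 50% proportion are assumed values, the population of 100,000 is the 1000 days times 100 customers in the data, and `survey_sample_size` is a hypothetical helper name.

```python
from statistics import NormalDist

def survey_sample_size(population, confidence=0.99, margin_of_error=0.05, proportion=0.5):
    """Cochran's sample size formula with a finite population correction."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z**2 * proportion * (1 - proportion) / margin_of_error**2
    # Shrink it slightly because the population is finite.
    return n0 / (1 + (n0 - 1) / population)

# margin_of_error and proportion are assumed defaults, not read from the calculator.
n = survey_sample_size(population=100_000)
print(round(n))
```

Under those assumed inputs, the result lands very close to the calculator's figure, which suggests (but does not prove) that the calculator works along these lines. Raising the confidence level or tightening the margin of error pushes the required sample size up.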

Now that we know that if we want a confidence level of 99%, we should use a sample of 659 people, we need to choose 659 randomly sampled people from all_customers. We can do this with random.sample():

sample = random.sample(all_customers, k=659)
tstat, pval = ttest_1samp(sample, 30)
print(pval)

As you will see, pval will be less than 0.05, meaning that we can reject the null hypothesis and confidently say that the mean age of our BuyPie customer population is not 30.

Now, assuming that the 100,000 data points we have represent the entire population, let’s check what our actual mean is:

print(np.mean(all_customers))   # prints 31.00082

Our 1 Sample T-test was accurate! Of course, in real life we will probably never know our exact population size or be able to verify our T-test against the actual mean (especially for an online store). But, here it is cool to be able to check whether the proper sample size helped any.

If you are interested to see the difference between the exercise (1000 tests with an improper sample size) and running 1000 tests with the correct sample size, you can put this code in a loop, just like the original exercise:

null_true = 0

for i in range(1000):
  sample = random.sample(all_customers, k=659)
  tstat, pval = ttest_1samp(sample, 30)
  print(pval)
  if pval > 0.05:
    null_true += 1

print("We accepted our null hypothesis that the population's average age was 30...{} times!".format(null_true))

Anyway, hopefully this helped you out. Happy coding!


I can’t thank you enough for such a complete answer, thank you so much!

This lesson about hypothesis testing hit me really hard, but things are starting to clear now.

So the exercise wasn’t built to give me the right answer, but to show me that not having enough data to detect a difference doesn’t mean that there isn’t one, because even when you have a lot of data you can get a misleading answer if you don’t choose your sample group correctly.

I had a lot of trouble in the next exercises too. I didn’t understand why the results were what they were; in my head they should all have been the opposite.
I wrote down my questions and traced my problems back to the null hypothesis: I wasn’t building it correctly, and it was taking me to the opposite of where I should be.
I started always phrasing them as “There is no difference between”. Building them like this, and understanding the points you made about sample size, made the results start making sense.
Thank you a LOT!