FAQ: Hypothesis Testing - Binomial Test

This community-built FAQ covers the “Binomial Test” exercise from the lesson “Hypothesis Testing”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

FAQs on the exercise Binomial Test

In a binomial test, why do these two calls produce different results?

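# binom_test was removed in SciPy 1.12; newer versions use binomtest(...).pvalue
from scipy.stats import binom_test

pval = binom_test(500, n=10000, p=0.06)
# Returns 1.72803920958e-05

pval2 = binom_test(700, n=10000, p=0.06)
# Returns 3.99398401091e-05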

They are both 100 points away from the expected mean of 600, yet they yield different results.


Can we also use a one-sample t-test instead of the binomial test in these examples?



I’m not sure, but I think it’s probably related to the fact that the binomial distribution isn’t symmetric. For n=10000, p=0.06, the binomial distribution is nearly symmetric, but slightly skewed to the right and has a long tail on the right (see the following article). I don’t know which calculation method binom_test() uses in the two-sided test, so I can’t explain exactly.

With n=10000 and p=0.06, it will look like the following. The right side is cut at 1500 to make it easier to see, but actually the right tail goes all the way to 10000.
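As a rough check of the asymmetry idea, here is a minimal sketch that uses scipy.stats.binom to compare the two tails 100 below and 100 above the mean of 600. It also sums the probabilities of every outcome that is no more likely than the observed one. As far as I know this is how SciPy defines the two-sided p-value, but treat that as an assumption. The helper name two_sided_pvalue is made up for illustration.

```python
import numpy as np
from scipy.stats import binom

n, p = 10000, 0.06
dist = binom(n, p)

# Skewness of a binomial: (1 - 2p) / sqrt(n p (1 - p)); positive means a longer right tail
skew = (1 - 2 * p) / np.sqrt(n * p * (1 - p))

# Tail probabilities 100 below and 100 above the mean of 600
left_tail = dist.cdf(500)   # P(X <= 500)
right_tail = dist.sf(699)   # P(X >= 700)

def two_sided_pvalue(k, n, p):
    """Sum P(X = i) over every outcome i that is no more likely than k."""
    ks = np.arange(n + 1)
    pmf = binom.pmf(ks, n, p)
    # A small relative tolerance guards against floating-point ties
    return pmf[pmf <= pmf[k] * (1 + 1e-7)].sum()

pval_500 = two_sided_pvalue(500, n, p)
pval_700 = two_sided_pvalue(700, n, p)

print("skewness:", skew)
print("P(X <= 500):", left_tail, " P(X >= 700):", right_tail)
print("two-sided p-values:", pval_500, pval_700)
```

If the tails were mirror images, the two p-values would match. Because the right tail is heavier, 700 is a less extreme observation than 500, so its p-value comes out larger.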

The t-test applies to a dataset that is expected to be randomly sampled from a normally distributed population. The dataset here is categorical with two categories, so I thought the t-test would not fit, until I learned that a binomial distribution can be approximated by a normal distribution.

The binomial distribution is approximated by a normal distribution with mean n*p and variance n*p*(1-p) when n is large. Given this, it is perhaps valid to use the following t-test instead of the binomial test. To turn the observation that 510 out of 10000 customers clicked into a dataset the t-test can use, assign the value 1 to each customer who clicked and 0 to each one who did not.

from scipy.stats import ttest_1samp

# 510 clicks (1) and 9490 non-clicks (0) out of 10000 customers
data = [1] * 510 + [0] * 9490
tstat, pval = ttest_1samp(data, 0.06)
# pval: 4.33359052847e-05

# 590 clicks (1) and 9410 non-clicks (0) out of 10000 customers
data2 = [1] * 590 + [0] * 9410
tstat2, pval2 = ttest_1samp(data2, 0.06)
# pval2: 0.671296011699

However, I think this is less accurate than the binomial test.
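One way to see the normal approximation at work is a z-test that plugs the mean n*p and variance n*p*(1-p) in directly. This is a minimal sketch for the same 510-out-of-10000 example, with illustrative variable names. It uses the variance under the hypothesized p, whereas ttest_1samp estimates the variance from the sample, so the two p-values will not match exactly.

```python
import math
from scipy.stats import norm

n, p = 10000, 0.06
clicks = 510

mean = n * p                       # expected number of clicks under p: 600
sd = math.sqrt(n * p * (1 - p))    # standard deviation under the hypothesized p

z = (clicks - mean) / sd           # how many standard deviations 510 is from 600
pval_normal = 2 * norm.sf(abs(z))  # two-sided p-value from the normal approximation

print("z:", z)
print("approximate p-value:", pval_normal)
```

Both this and the t-test are approximations, while the binomial test works with the exact distribution, which is probably why it is the safer choice here.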