FAQ: Hypothesis Testing - Binomial Test

This community-built FAQ covers the “Binomial Test” exercise from the lesson “Hypothesis Testing”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

FAQs on the exercise Binomial Test

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.


In a binomial test, why do these two calls produce different results?

from scipy.stats import binom_test

pval = binom_test(500, n=10000, p=0.06)
print(pval)
## Returns 1.72803920958e-05

pval2 = binom_test(700, n=10000, p=0.06)
print(pval2)
## Returns 3.99398401091e-05

Both counts are 100 away from the expected mean of 600, yet they yield different p-values.


Can we also use a one-sample t-test instead of the binomial test in these examples?

https://www.codecademy.com/paths/data-science/tracks/scipy/modules/dspath-hypothesis-testing/lessons/hypothesis-testing/exercises/binomial-test


I’m not sure, but I think it’s probably related to the fact that the binomial distribution isn’t symmetric. For n=10000 and p=0.06 the distribution is nearly symmetric, but it is slightly skewed to the right and has a long right tail (see the plot below). I don’t know exactly which calculation method binom_test() uses for the two-sided test, so I can’t explain it precisely.
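One way to see the asymmetry is to compare the one-sided p-values for counts an equal distance below and above the expected mean of 600. This is only a minimal sketch, assuming scipy.stats.binom_test and its alternative parameter are available in your SciPy version; if the distribution were symmetric, the two tail probabilities would be essentially equal.

from scipy.stats import binom_test

# Lower tail: 100 below the expected mean of 600
p_low = binom_test(500, n=10000, p=0.06, alternative='less')

# Upper tail: 100 above the expected mean of 600
p_high = binom_test(700, n=10000, p=0.06, alternative='greater')

# The right skew makes these two tail probabilities differ
print(p_low, p_high)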

With n=10000 and p=0.06, it will look like the following. The plot is cut at 1500 on the right to make it easier to see, but the right tail actually extends all the way to 10000.
[Plot: binomial PMF for n=10000, p=0.06, with the x-axis cut at 1500]
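If you want to reproduce a plot like this yourself, something along these lines should work (a sketch using scipy.stats.binom and matplotlib; the cutoff at 1500 matches the description above):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 10000, 0.06
k = np.arange(0, 1501)  # cut the x-axis at 1500; the tail actually extends to 10000
plt.bar(k, binom.pmf(k, n, p), width=1.0)
plt.xlabel('number of successes')
plt.ylabel('probability')
plt.title('Binomial PMF, n=10000, p=0.06')
plt.show()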

The t-test applies to a dataset assumed to be randomly sampled from a normally distributed population. But the dataset here is categorical, with only two categories, so I thought it would not fit the t-test, until I learned that a binomial distribution can be approximated by a normal distribution.

When n is large, the binomial distribution is approximated by a normal distribution with mean n*p and variance n*p*(1-p). Considering this, it may be valid to use the following t-test instead of the binomial test. To turn the data (510 out of 10000 customers clicked) into a dataset the t-test can use, assign the value 1 to each customer who clicked and 0 to each who did not.

from scipy.stats import ttest_1samp

# 510 clicks out of 10000: a 1 for each customer who clicked, a 0 for each who did not
data = [1] * 510 + [0] * 9490
tstat, pval = ttest_1samp(data, 0.06)
print(pval)
# 4.33359052847e-05

# 590 clicks out of 10000
data2 = [1] * 590 + [0] * 9410
tstat2, pval2 = ttest_1samp(data2, 0.06)
print(pval2)
# 0.671296011699

However, I think this is less accurate than the binomial test, since the binomial test uses the exact binomial distribution rather than a normal approximation.
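For comparison, you can run both tests on the same observed count and see how close the p-values come out. This is just a sketch (the variable names are mine, and the exact numbers depend on your SciPy version):

from scipy.stats import binom_test, ttest_1samp

clicks, total, expected_rate = 510, 10000, 0.06

# Exact binomial test on the observed click count
p_binom = binom_test(clicks, n=total, p=expected_rate)

# One-sample t-test on the equivalent 0/1 dataset
data = [1] * clicks + [0] * (total - clicks)
tstat, p_ttest = ttest_1samp(data, expected_rate)

print(p_binom, p_ttest)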