The Binomial Test

Hey all,

I’m doing a Data Science project (in Python 3), and in it I am using the Binomial Test. I just want to confirm that it’s the appropriate test, and that I’m using it correctly.

The project looks at death rates in the US population over the last twenty years. I’m taking each year’s population to be a sample, the number of deaths as ‘successes’ (grim, I know, but I’m using the terminology! Heh), and the average death rate to be the probability of ‘success,’ i.e. the expected probability of success for the population. I calculate that average by dividing each year’s deaths by that year’s population size and then averaging the resulting yearly rates. Since a ‘trial’ in the sample (a person in the population) either dies or not, I’ve considered ‘death’ to be a binary categorical variable, and so I’ve chosen the Binomial Test to see if the number of deaths in another year is significantly different.

  • Null Hypothesis: The number of deaths in the new year is a sample from a population with mean M.
  • Alternative Hypothesis: The number of deaths in the new year is not a sample from a population with a mean M,

where M is the average death rate for the past twenty years.

So, using scipy.stats.binom_test, this is (basically) how I’ve filled out the function:
binom_test(deaths_in_the_new_year, n=population_in_the_new_year, p=mean_death_rate_of_previous_years, alternative='greater')

Is this right? Have I chosen this test and its parameters appropriately? Am I making any mistakes in my method?

You definitely have the right idea! If you’re going to use a hypothesis test for this project, then the binomial test is the right one. A few small thoughts/technicalities about how to frame your research question (and about whether the test is appropriate):

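  • For the test you’re describing, the null hypothesis is that the probability of a person dying in a given year is x% (whatever you calculated based on the historical data); the alternative hypothesis is that the probability of a person dying in that year is greater than x%. That one-sided alternative is what alternative='greater' specifies. The alternative you wrote out (“not a sample from a population with a mean M”) is two-sided, which would correspond to alternative='two-sided' instead.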
  • If you truly have data for the entire population of people who were alive at the start of the year, and you know whether or not each of them died, then you don’t actually need a hypothesis test: you already know the exact death rate for the year you care about and can see directly whether it is equal to or greater than x%. The purpose of a hypothesis test is to draw an inference about a population parameter when you have only observed a sample from that population. For example, suppose you want to know whether the death rate among people with diabetes is significantly higher than some rate, but you cannot know whether or not every single person in the world has diabetes. You could instead choose a sample of people for whom you can collect that information. As you’re forming your question and thinking about a hypothesis test, it’s important to define your “population” of interest and think about what “sample” you’ve collected from that population.
  • A few of the assumptions of a binomial test may be violated here, although researchers often overlook these things because real data is messy (and almost never meets all necessary assumptions). Technically speaking, a binomial test assumes that the probability of success is the same for all sampled members of the population; in this case, however, each member of the population would have a different risk of dying in a given year (based on age, health, etc.). Another assumption is that the sample is representative of the population, but I’m a little unclear on what your sample and population are (because of the above bullet points), so I’m not sure whether this assumption holds.
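
To make the one-sided test from the first bullet concrete, here is a minimal sketch that computes the exact p-value P(X >= k) using only the standard library. The function name binom_upper_tail_pvalue and all of the numbers are hypothetical (they are not real mortality data), and the sketch assumes you are working with a sample rather than the full population, as discussed above.

```python
import math


def binom_upper_tail_pvalue(k, n, p):
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p**i * (1 - p)**(n - i), done in log space to avoid overflow
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    # Guard against floating-point sums that land slightly above 1
    return min(total, 1.0)


# Hypothetical illustration values, not real data
sample_size = 1000        # people in the sample
deaths_in_sample = 15     # observed "successes"
historical_rate = 0.01    # null-hypothesis probability (the "x%" above)

p_value = binom_upper_tail_pvalue(deaths_in_sample, sample_size, historical_rate)
print(f"observed rate     = {deaths_in_sample / sample_size:.4f}")
print(f"one-sided p-value = {p_value:.4f}")
```

This loop is only practical for modest sample sizes. For anything larger, scipy.stats.binomtest(k, n, p, alternative='greater').pvalue computes the same quantity; binomtest is the replacement for binom_test in recent SciPy releases, so check which one your installed version provides.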

Long story short: you’re on the right track, but it’s worth thinking about the exact question you’re trying to answer and considering whether a hypothesis test is the right method to answer that question. Good luck with your project!
