What does the np.random.binomial function return?


In NumPy, what does the np.random.binomial() function return?


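When called as np.random.binomial(n, p, size), the function returns samples drawn from a binomial distribution with the given parameters. The samples come back as a NumPy array (ndarray) rather than a plain Python list; if size is omitted, a single integer is returned.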

For example,

import numpy as np

# n = 500 (trials per experiment)
# p = 0.5 (probability of success on each trial)
# size = 10000 (number of experiments)

tests = np.random.binomial(500, 0.5, size=10000)

# The result is an array of 10000 values, one per experiment.
# Each value is the number of successes in that experiment's 500 trials.
# [241, 262, ..., 248, 255, 235]

In the example above, the values in tests are consistent with a 50% probability of success. Dividing each value by n gives the success rate of that experiment:

241/500, 262/500, ..., 248/500, 255/500, 235/500
0.482, 0.524, ..., 0.496, 0.510, 0.470

which are all close to 50%.
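
As a minimal sketch of the same calculation (the fixed seed and the variable names n, p, size and rates are my own additions, not part of the lesson), dividing the whole array by n converts every count into a success rate in one step:

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the run is repeatable

n, p, size = 500, 0.5, 10000
tests = np.random.binomial(n, p, size=size)

# Each entry of tests is a count of successes out of n trials.
rates = tests / n

print(tests[:5])      # first five success counts
print(rates[:5])      # the same five, as fractions of n
print(rates.mean())   # average success rate, expected to be close to p
```

The individual rates scatter around 0.5, and their average should sit very close to p because it pools all 10000 experiments.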


Could the difference between the number of trials and the number of experiments be explained further? Thank you.


I have tried passing “mat.shape” as the size, where mat is a NumPy array of shape (10, 10),
so 100 values are generated. What other ways are there to pass the parameters?
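
On the size question: as far as I know, size can be omitted, given as an int, or given as a tuple of ints, and n and p can themselves be array-like. Here is a small sketch of those forms; mat below is a hypothetical stand-in for your array, and the other variable names are mine:

```python
import numpy as np

mat = np.zeros((10, 10))  # hypothetical stand-in for the poster's array

single = np.random.binomial(500, 0.5)                # size omitted: one value
flat = np.random.binomial(500, 0.5, size=100)        # int: 1-D array of 100 values
grid = np.random.binomial(500, 0.5, size=mat.shape)  # tuple: array of shape (10, 10)

# n and p may also be array-like; they are broadcast against each other.
per_p = np.random.binomial(500, [0.1, 0.5, 0.9])     # one draw per probability

print(np.shape(single), flat.shape, grid.shape, per_p.shape)
```

Note that passing mat.shape gives 100 values arranged as a 10 by 10 array, not a flat array of length 100.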

The binomial distribution was explained earlier as having two “peaks”. How does that apply to finding random data/probability?

That was a bimodal dataset, not a binomial distribution.

What do the X and Y axes represent in this problem? Can someone please explain?
I thought the Y axis was the probability of X emails being sent.

The Y axis shows how many times (the frequency) each result on the X axis occurred. The bars between 20 and 30 on the X axis are the tallest, which means those values occur most often in the dataset.

The return value is described as a list, but later in the chapter we apply np.mean to it to calculate the probability. Since functions like that are meant for NumPy arrays, np.random.binomial should return an array. Can someone please clarify this?
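
On the list-versus-array question: checking the type directly settles it. This is only a sketch; the threshold of 20 is a hypothetical value I picked for illustration, not one taken from the lesson:

```python
import numpy as np

emails = np.random.binomial(500, 0.05, size=10000)

print(type(emails))   # a NumPy ndarray, not a Python list
print(emails.dtype)   # an integer dtype

# Comparing the array to a number gives an array of True/False values.
# np.mean treats True as 1 and False as 0, so the mean is the fraction
# of experiments that satisfy the condition.
prob_under_20 = np.mean(emails < 20)  # 20 is a hypothetical threshold
print(prob_under_20)
```

The comparison emails < 20 works element by element because emails is an array; the same expression on a plain Python list would raise a TypeError.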

Thank you for your answer. Could we please go deeper into it? Which of the following should I read from the graph, considering that there are 10,000 experiments and 5% of people respond?

Should I conclude that, if I send Y emails (Y axis), I will have a 5% chance that X people (X axis) will respond?

Should I conclude that, if I repeat this same experiment of sending 10,000 emails Y times (Y axis), X people (X axis) will respond?

Is there any other correct interpretation? I would really appreciate it if you could apply your own explanation to the exercise example.

And one last question for the instructors: in this same exercise, the example graph (the basketball case) plots frequency as a percentage. Why does the graph that we generate for the email case plot absolute numbers?

Think of it first as a single experiment. With a 5% response rate, the expected number of replies from 500 sent emails is 25 (500 × 0.05). That is your average result, and roughly the most common one.

If you run the experiment again, the result may not be exactly 25. It could be 30, which is rarer than 25 but not impossible.

If you keep going, you can stack each result in its corresponding bin (range of results) on the X axis of your histogram.

To put it visually, it would look something like this:

      20 25 
   15 20 25 30
10 15 20 25 30 40

As you can see, each run of the experiment produces a result which is stacked in its corresponding bin, adding 1 to the height of that bin on the Y axis of your histogram. In this example, I performed the experiment 12 times.

Now imagine that, in the context of this exercise, the same experiment is repeated 10,000 times (sending 500 emails, each with a 5% probability of being answered, 10,000 times):

emails = np.random.binomial(500, 0.05, 10000)
  • n : 500
  • p : 0.05 (or 5%)
  • size : 10000
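
The stacking described above can be sketched with np.bincount, which I am using here in place of whatever plotting call the exercise uses. It counts one bin per exact value rather than per range, and the seed and variable names are my own assumptions:

```python
import numpy as np

np.random.seed(1)  # assumption: fixed seed for a repeatable run

emails = np.random.binomial(500, 0.05, size=10000)

# counts[k] = number of experiments in which exactly k emails were answered.
counts = np.bincount(emails)

# Dividing by the number of experiments turns frequencies into proportions.
proportions = counts / emails.size

most_common = counts.argmax()
print(most_common, counts[most_common], proportions[most_common])
print(counts.sum())  # every experiment lands in exactly one bin
```

counts is what a histogram with absolute numbers on the Y axis shows, and proportions is the same data rescaled so the bars add up to 1. That rescaling is the only difference between a frequency plot and a percentage plot.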

I hope this helps, cheers!


Thank you very much for your reply! It surely helped!


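Each experiment consists of n trials, and size is the number of experiments. In the email example, one experiment is sending 500 emails (500 trials), and that experiment is repeated 10,000 times.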
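
To make the distinction concrete, here is a rough sketch that builds the experiments by hand from individual yes/no trials and compares them with np.random.binomial. The seed and the names one_experiment, trials, by_hand and built_in are hypothetical, chosen only for this illustration:

```python
import numpy as np

np.random.seed(2)  # assumption: fixed seed for a repeatable run

n = 500       # trials per experiment (emails sent)
p = 0.05      # probability that a single trial succeeds
size = 10000  # number of experiments

# One experiment: n independent yes/no trials, then count the successes.
one_experiment = (np.random.random(n) < p).sum()

# Many experiments: a (size, n) grid of trials, summed along each row.
trials = np.random.random((size, n)) < p
by_hand = trials.sum(axis=1)

# np.random.binomial draws such counts directly, one per experiment.
built_in = np.random.binomial(n, p, size=size)

print(one_experiment)
print(by_hand.shape, built_in.shape)
print(by_hand.mean(), built_in.mean())  # both expected to be near n * p = 25
```

Each row of trials is one experiment and each column is one trial, so n sets how wide the grid is and size sets how many rows it has.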