How does np.mean() give us the probability?

Question

In the context of this exercise, how does np.mean() give us the probability?

Answer

The np.mean() function is generally used to get the average of all values in a dataset. We can also apply it to a logical condition to get the fraction of values that satisfy that condition.

In the exercise example, we have np.mean(a==4). Assuming a is a NumPy array, Python first evaluates the condition a==4, which returns an array of True and False values, one per element of a. Then np.mean() is run on that array. When np.mean() averages True and False values, it treats True as 1 and False as 0.

Because True values count as 1, this is like counting how many elements satisfy the logical statement, and the calculation is essentially:

(Number of elements satisfying the condition) / (Total number of elements)

Example

# a = [4, 3, 1, ..., 4]
np.mean(a==4)
=
np.mean([True, False, False, ..., True])

# In NumPy functions,
# True  counts as 1
# False counts as 0

# This is then equivalent to calculating
np.mean([1, 0, 0, ..., 1])

# Example:
# There are 10000 total elements
# 5000 elements equal to 4
np.mean(a==4)
= 
5000 / 10000 = 0.5 or 50% probability 
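
For anyone who wants to run this, here is a minimal sketch of the same idea. The array a below is hypothetical data made up for illustration (the exercise's actual array is not shown here), and mask is just a name chosen for the intermediate result.

```python
import numpy as np

# Hypothetical data for illustration only: 10 values, five of which equal 4
a = np.array([4, 3, 1, 4, 2, 4, 6, 4, 5, 4])

# Comparing an array to a scalar gives a boolean array, one entry per element
mask = (a == 4)
print(mask)

# Summing booleans counts the True values, because True counts as 1
print(mask.sum())           # 5

# The mean of the booleans is the fraction of True values
print(np.mean(mask))        # 0.5

# Same result, computed explicitly as (count satisfying) / (total count)
print(mask.sum() / len(a))  # 0.5
```

Both of the last two lines compute the same ratio, which is why np.mean(a==4) can be read as an estimated probability.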

What is the exact mathematics behind the np.random.binomial function?
I understand that it gives a particular result, but I am curious to know how exactly Python works in the background.

In this exercise, why do print(no_emails) and print(b_test_emails) give 0.0 as results in the terminal?

My guess is that there’s simply no chance any of these combinations would be possible

Regarding no_emails, the result is 0.0 because the probability that emails contains at least one 0 is very small. If the probability that a recipient opens one email is 0.05, then the probability that 500 emails are sent and nobody opens any of them is (1 - 0.05) ** 500. Therefore, the probability that 0 appears even once in 10000 trials is 1 - (1 - (1 - 0.05) ** 500) ** 10000. If you calculate this, it comes out to about 0.000000073.

zero_out_of_500 = (1 - 0.05) ** 500
p = 1 - (1 - zero_out_of_500) ** 10000
print(p)  # 7.2745140578e-08

In other words, you would expect to see a nonzero value for no_emails in only about 7.3 out of every 100,000,000 clicks on ‘Run’.
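
A quick simulation makes this easier to see. This is only a sketch: it assumes the exercise draws emails from a binomial distribution with 500 emails, an open probability of 0.05, and 10000 trials, as described above. The exact exercise code is not shown in this thread, and the seed is my own choice so that the run is repeatable.

```python
import numpy as np

# Assumed setup, taken from the numbers discussed above
rng = np.random.default_rng(seed=0)  # seed is arbitrary, only for repeatability
emails = rng.binomial(500, 0.05, size=10000)

# Fraction of the 10000 trials in which nobody opened an email
no_emails = np.mean(emails == 0)
print(no_emails)

# The smallest number of opens seen in any trial, and the average
print(emails.min())
print(emails.mean())
```

The average number of opens should be close to 500 * 0.05 = 25, so a trial with 0 opens is extremely far out in the tail, and no_emails will almost always print 0.0.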

Regarding b_test_emails, isn’t it around 0.002 to 0.003?

Hello there,

Could someone tell me how to find the probability that a value falls within an interval? I have the two one-sided probabilities, like this:
prob_more68 = np.mean(Height_girls>=68)
prob_less70 = np.mean(Height_girls<=70)

NumPy does not allow me to combine them with an & and write it in one line, like this:

prob = np.mean(Height_girls>=68 & Height_girls<=70)

Thanks,
K

prob = np.mean((Height_girls>=68) & (Height_girls<=70))
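
The parentheses are needed because & binds more tightly than comparison operators such as >= and <=. Without them, Python tries to evaluate 68 & Height_girls first, which typically raises an error instead of combining the two conditions. Here is a small sketch; the Height_girls values are hypothetical, made up for illustration, and in_range is just a name I chose.

```python
import numpy as np

# Hypothetical heights in inches, for illustration only
Height_girls = np.array([62.0, 68.0, 69.5, 70.0, 71.2, 66.3, 68.9, 64.0])

# Wrap each comparison in parentheses so it is evaluated before the &
in_range = (Height_girls >= 68) & (Height_girls <= 70)
print(in_range)

# Fraction of values that fall in [68, 70]
prob = np.mean(in_range)
print(prob)  # 0.5
```

Here 4 of the 8 values (68.0, 69.5, 70.0, 68.9) fall in the interval, so the result is 0.5.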