# How does np.mean() give us the probability?

### Question

In the context of this exercise, how does `np.mean()` give us the probability?

The `np.mean()` function is generally used to get the average of all values in a dataset. Applied to a logical statement, it returns the proportion of values that satisfy that statement.

In the exercise example, we have `np.mean(a==4)`. First, this evaluates the conditional, `a==4`, which returns an array of `True` and `False` values. Then it runs `np.mean()` on that array. During the calculation, `True` is treated as `1` and `False` as `0`.

Because `True` values count as `1`, this is like counting how many elements satisfy the logical statement, and the calculation is essentially:

`(Number of elements satisfying the condition) / (Total number of elements)`

#### Example

``````
# a = [4, 3, 1, ..., 4]  (a NumPy array)
np.mean(a == 4)
# is the same as
np.mean([True, False, False, ..., True])

# In NumPy functions,
# True  counts as 1
# False counts as 0

# This is then equivalent to calculating
np.mean([1, 0, 0, ..., 1])

# Example:
# There are 10000 total elements
# 5000 elements equal to 4
np.mean(a == 4)
# = 5000 / 10000 = 0.5, or a 50% probability
``````
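
For anyone who wants to run this, here is a minimal sketch of the same idea on a small array. The ten values in `a` are made up for illustration and are not the exercise data; the names `matches`, `count`, and `proportion` are also just placeholders.

```python
import numpy as np

# Hypothetical data for illustration only (not the exercise's array)
a = np.array([4, 3, 1, 4, 6, 4, 2, 5, 4, 1])

matches = (a == 4)              # boolean array: True where the element equals 4
count = matches.sum()           # each True counts as 1, so this counts the matches
proportion = np.mean(matches)   # count divided by the total number of elements

print(matches)
print(count, len(a))
print(proportion)
```

Here four of the ten values equal 4, so the mean of the boolean array is 4 / 10.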

What is the exact mathematics behind the `np.random.binomial` function?
I understand that it gives a particular result, but I am curious to know exactly how Python works in the background.

In this exercise, why do `print(no_emails)` and `print(b_test_emails)` give `0.0` as results in the terminal?

My guess is that there’s simply no chance any of these combinations would be possible

Regarding `no_emails`, it is due to the very small probability that `emails` contains at least one `0`. If the probability that a recipient opens an email is `0.05`, then the probability of sending 500 emails and having none of them opened is `(1 - 0.05) ** 500`. Therefore, the probability that `0` appears at least once in 10000 trials is `1 - (1 - (1 - 0.05) ** 500) ** 10000`. If you calculate this, it is about `0.000000073`.

``````
zero_out_of_500 = (1 - 0.05) ** 500
p = 1 - (1 - zero_out_of_500) ** 10000
print(p)  # 7.2745140578e-08
``````

In other words, you would see a nonzero result in only about 7.3 out of every 100,000,000 clicks on ‘Run’.
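
A quick simulation shows the same thing. This is only a sketch: the setup below (10000 trials of 500 emails, each opened with probability 0.05) is inferred from the numbers above and may not match the exercise code exactly, and the seed is there only to make the run repeatable.

```python
import numpy as np

# Assumed setup, inferred from the numbers in this thread:
# 10000 trials, each sending 500 emails opened with probability 0.05
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
emails = rng.binomial(500, 0.05, size=10000)

# Proportion of trials in which nobody opened an email
no_emails = np.mean(emails == 0)

print(emails.min())  # smallest number of opens seen in any trial
print(no_emails)
```

Each trial expects about 25 opens, so even the smallest trial is well above zero and `emails == 0` is `False` everywhere, which is why the mean comes out as `0.0`.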

Regarding `b_test_emails`, isn’t it around `0.002` to `0.003`?

Hello there,

Could someone tell me how to find the probability that a value falls within an interval? I can compute each side separately, like this:
`prob_more68 = np.mean(Height_girls>=68)`
`prob_less70 = np.mean(Height_girls<=70)`

But NumPy does not allow me to combine them with `&` and write it in one line, like this:

`prob = np.mean(Height_girls>=68 & Height_girls<=70)`

Thanks,
K

``````
prob = np.mean((Height_girls>=68) & (Height_girls<=70))
``````
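
The parentheses matter because `&` binds more tightly than the comparison operators in Python. Without them, `68 & Height_girls` is evaluated first, which raises an error. A small sketch with made-up heights (the values and the names `in_range` and `failed` are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical heights in inches, for illustration only
Height_girls = np.array([64.0, 68.0, 69.5, 70.0, 72.0, 66.5, 68.5, 71.0])

# Parenthesize each comparison, then combine the boolean arrays element-wise
in_range = (Height_girls >= 68) & (Height_girls <= 70)
prob = np.mean(in_range)

# Without parentheses, `&` is applied before the comparisons and this fails
try:
    np.mean(Height_girls >= 68 & Height_girls <= 70)
    failed = False
except (TypeError, ValueError):
    failed = True

print(prob)
print(failed)
```

In this made-up data, four of the eight heights fall between 68 and 70 inclusive, so `prob` is 4 / 8.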