Question
In the context of this exercise, how does np.mean() give us the probability?
Answer
The np.mean() function is generally used to get the average of all values in a dataset. If we apply it to a logical condition instead, it returns the proportion (fraction) of values that satisfy that condition. Multiply by 100 to express that proportion as a percent.
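As a quick sketch (the array values are made up for illustration), the same function returns an ordinary average when given numbers and a proportion when given a condition:

```python
import numpy as np

# Hypothetical example data
data = np.array([2, 4, 4, 6])

# Ordinary average of the values: (2 + 4 + 4 + 6) / 4
average = np.mean(data)

# Proportion of values greater than 3: 3 of the 4 values qualify
proportion = np.mean(data > 3)

print(average)     # 4.0
print(proportion)  # 0.75
```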
In the exercise example, we have np.mean(a==4). First, this evaluates the condition a==4, which returns a NumPy array of True and False values, one for each element of a. Then np.mean() runs on that array of True and False values. During the calculation, True is treated as 1 and False as 0.
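The two steps can be checked separately. This sketch assumes `a` is a NumPy array with hypothetical values. If `a` were a plain Python list, `a == 4` would compare the whole list to 4 and produce a single `False` rather than an element-wise result, so the conversion to an array matters.

```python
import numpy as np

a = np.array([4, 3, 1, 4])   # hypothetical sample values
mask = a == 4                # element-wise comparison
print(mask)                  # [ True False False  True]
print(mask.dtype)            # bool

# Booleans behave as 1 (True) and 0 (False) in arithmetic
print(mask.astype(int))      # [1 0 0 1]
print(np.mean(mask))         # 0.5

# Pitfall: a plain Python list is compared as a whole, not element-wise
a_list = [4, 3, 1, 4]
print(a_list == 4)                      # False
print(np.mean(np.array(a_list) == 4))   # 0.5
```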
Because True values count as 1 and False values as 0, summing them counts how many elements satisfy the condition. The mean is that sum divided by the number of elements, so the calculation is essentially:
(Number of elements satisfying the condition) / (Total number of elements)
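The formula can be spelled out explicitly and compared with `np.mean`. Here `np.count_nonzero` counts the `True` entries and `a.size` is the total number of elements; the sample values are again hypothetical.

```python
import numpy as np

a = np.array([4, 3, 1, 4, 4, 2, 4, 5])  # hypothetical sample values

satisfying = np.count_nonzero(a == 4)   # number of elements equal to 4
total = a.size                          # total number of elements

by_formula = satisfying / total
by_mean = np.mean(a == 4)

print(satisfying, total)     # 4 8
print(by_formula, by_mean)   # 0.5 0.5
```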
Example
# a = np.array([4, 3, 1, ..., 4])
np.mean(a == 4)
# is the same as
np.mean([True, False, False, ..., True])
# In NumPy calculations,
# True counts as 1
# False counts as 0
# so this is equivalent to calculating
np.mean([1, 0, 0, ..., 1])
# Example:
# There are 10000 total elements
# 5000 elements are equal to 4
np.mean(a == 4)
# = 5000 / 10000 = 0.5, i.e. a 50% probability
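To reproduce the worked example, the sketch below builds a hypothetical array of 10000 elements in which exactly 5000 equal 4, then applies the same idea to simulated data. The die-roll simulation (`rng.integers(1, 7, ...)` with seed 42) is an illustrative assumption, not the exercise's data, and its result is only an estimate of the true probability 1/6.

```python
import numpy as np

# Hypothetical array: 5000 fours followed by 5000 other values
a = np.concatenate([np.full(5000, 4), np.full(5000, 3)])
print(a.size)            # 10000
print(np.mean(a == 4))   # 0.5

# Simulated data: 10000 rolls of a fair six-sided die
rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=10000)   # integers from 1 to 6 inclusive
estimate = np.mean(rolls == 4)           # fraction of rolls that came up 4
print(estimate)                          # close to 1/6 (about 0.1667)
```

With more samples, the estimated proportion tends to get closer to the true probability.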