How does np.mean() give us the probability?



Question

In the context of this exercise, how does np.mean() give us the probability?

Answer

The np.mean() function returns the arithmetic mean (average) of the values in an array. If we pass it a logical (Boolean) expression instead of raw numbers, the result is the fraction of values that satisfy that expression.

In the exercise example, we have np.mean(a==4). First, the comparison a==4 is evaluated element by element, which returns an array of True and False values. This assumes a is a NumPy array; comparing a plain Python list to 4 would return a single False. Then np.mean() is run on that Boolean array, and during the calculation True is treated as 1 and False as 0.
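The two steps described above can be seen separately in a minimal sketch. The array contents here are hypothetical, since the exercise's actual data is not shown, and `mask` is just an illustrative name:

```python
import numpy as np

# Hypothetical sample data; not the array from the exercise.
a = np.array([4, 3, 1, 4])

# Step 1: the element-wise comparison produces a Boolean array.
mask = (a == 4)
print(mask)              # [ True False False  True]

# Step 2: in the mean, True is treated as 1 and False as 0.
print(mask.astype(int))  # [1 0 0 1]
print(np.mean(mask))     # 0.5
```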

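Because True values count as 1, summing the array counts how many elements satisfy the logical statement, and dividing by the array length gives:

(Number of elements satisfying condition) / (Number of total elements)

When a holds the outcomes of many random trials, this fraction is the observed relative frequency of the outcome, which serves as an estimate of its probability.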
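As a quick check of this ratio, the count and the total can be computed by hand and compared with np.mean(). This is only a sketch; the array values and the names `count`, `total`, and `manual` are made up for illustration:

```python
import numpy as np

# Hypothetical sample data; not the array from the exercise.
a = np.array([4, 3, 1, 4, 2, 4, 6, 5])

count = np.sum(a == 4)   # number of elements equal to 4 -> 3
total = a.size           # total number of elements -> 8
manual = count / total

print(manual)            # 0.375
print(np.mean(a == 4))   # 0.375
```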

Example

# a = np.array([4, 3, 1, ..., 4])
np.mean(a==4)
# is equivalent to
np.mean([True, False, False, ..., True])

# In NumPy arithmetic,
# True  counts as 1
# False counts as 0

# This is then equivalent to calculating
np.mean([1, 0, 0, ..., 1])

# Example:
# There are 10000 total elements
# 5000 of them are equal to 4
np.mean(a==4)
# = 5000 / 10000 = 0.5, i.e. a 50% probability
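A runnable version of the 10000-element example might look like the following. The array is constructed by hand to match the numbers above, and the die-roll simulation is an added, hypothetical illustration (not taken from the exercise) of how the same expression estimates a probability from random samples:

```python
import numpy as np

# Hypothetical array built to match the example:
# 10000 elements, exactly 5000 of which equal 4.
a = np.concatenate([np.full(5000, 4), np.full(5000, 1)])
prob = np.mean(a == 4)
print(prob)  # 0.5

# Assumed illustration: simulate 10000 rolls of a fair six-sided die.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=10000)  # upper bound 7 is exclusive

# The fraction of rolls equal to 4 estimates P(roll == 4), which is 1/6.
estimate = np.mean(rolls == 4)
print(estimate)
```

The estimate will not be exactly 1/6, but it should get closer as the number of simulated rolls increases.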