What do the terms in Bayes' Theorem mean? A specific example


In this lesson, we are presented with Bayes’ Theorem as follows:

P(positive | review) = (P(review | positive)⋅P(positive)) / P(review)

What do these terms mean in plain language?


Let’s begin with the left-hand side: P(positive | review). This is the probability that a review is positive, given the words it contains. Computing it is our goal in these lessons.

Now let’s look at the numerator on the right-hand side: P(review | positive)⋅P(positive). As we’re told in this lesson, P(positive) is the probability that any review in our data set is positive. The term P(review | positive) is the probability of seeing the words of this review, given that we already know the review is positive. In practice, it is estimated from how often each word of the review appears in the positive reviews in our data set.
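To make the numerator concrete, here is a minimal sketch in Python. The three-review data set and the helper names `prior` and `likelihood` are made up for illustration, and the sketch assumes that the words of a review are independent given its label, so that P(review | positive) is a product of per-word frequencies. It is one simple way to estimate these terms, not necessarily the exact code used in the lesson.

```python
from collections import Counter

# Hypothetical toy data set of (review text, label) pairs, invented for illustration.
reviews = [
    ("great food great service", "positive"),
    ("loved the food", "positive"),
    ("terrible service", "negative"),
]


def prior(label, data):
    """P(label): the fraction of reviews in the data set with this label."""
    return sum(1 for _, lab in data if lab == label) / len(data)


def likelihood(review, label, data):
    """P(review | label), assuming words are independent given the label."""
    # Count every word that occurs in reviews carrying this label.
    counts = Counter(
        word for text, lab in data if lab == label for word in text.split()
    )
    total = sum(counts.values())
    prob = 1.0
    for word in review.split():
        # Relative frequency of this word among all words with this label.
        prob *= counts[word] / total
    return prob


p_positive = prior("positive", reviews)  # 2 of the 3 reviews are positive
p_review_given_positive = likelihood("great food", "positive", reviews)
numerator = p_review_given_positive * p_positive
print(p_positive, p_review_given_positive, numerator)
```

In this toy data set the positive reviews contain 7 words in total, with "great" and "food" each appearing twice, so P(review | positive) for "great food" is (2/7)⋅(2/7).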

Finally, let’s look at the denominator of the right-hand side: P(review). This can be read as the probability that the words in the review appear somewhere in our data set. Since we’re dividing by this value, we should check that it is not zero. What would it mean for P(review) to equal zero? It would tell us that the review contains words that never appear in our data set; if the per-word probabilities are multiplied together without any smoothing, a single unseen word is enough. This is a problem because, as written, our algorithm has no way to classify such a review. In that sense, it’s “good” for our program to fail here, since the failure shows us where our data set is lacking.
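The zero-denominator case can be sketched as follows. This assumes there are only two classes, positive and negative, so that P(review) can be expanded as P(review | positive)⋅P(positive) + P(review | negative)⋅P(negative). The function name `posterior_positive` and the probabilities passed to it are hypothetical values chosen for illustration, not numbers from the lesson.

```python
def posterior_positive(p_review_given_pos, p_pos, p_review_given_neg, p_neg):
    """Return P(positive | review) for a two-class problem using Bayes' Theorem."""
    # Law of total probability: P(review) sums the contribution of each class.
    p_review = p_review_given_pos * p_pos + p_review_given_neg * p_neg
    if p_review == 0:
        # The review has zero probability under both classes,
        # so there is nothing to divide by and no way to classify it.
        raise ZeroDivisionError(
            "P(review) is zero: the data set cannot explain this review"
        )
    return (p_review_given_pos * p_pos) / p_review


# A review whose words do occur in the data set (made-up probabilities).
seen = posterior_positive(0.08, 0.6, 0.01, 0.4)
print(seen)

# A review with zero probability under both classes.
failed = False
try:
    posterior_positive(0.0, 0.6, 0.0, 0.4)
except ZeroDivisionError as err:
    failed = True
    print("Cannot classify:", err)
```

Raising an explicit error, rather than returning some default class, keeps the gap in the data set visible instead of hiding it behind a guess.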