In the naive Bayes lesson, when computing P(review | positive), where
review = "This crib was amazing", the exercise states that
In other words, if we assume that the review is positive, what is the probability that the words “This”, “crib”, “was”, and “amazing” are the only words in the review? To find this, we have to assume that each word is conditionally independent.
Why must we make this assumption?
To understand why we make the naive assumption (that is, why we assume that each word is conditionally independent), let’s take a closer look at the probability
P(review | positive). Computing this probability directly is not simple. Because we’re considering every word in the review,
P(review | positive) with review = "This crib was amazing" expands, by the chain rule of probability, to
P("This" | "crib", "was", "amazing", positive) * P("crib" | "was", "amazing", positive) * P("was" | "amazing", positive) * P("amazing" | positive)
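This can get quite messy, because every factor except the last depends on other words in the review. Assuming conditional independence makes the above computation equivalent to
P("This" | positive) * P("crib" | positive) * P("was" | positive) * P("amazing" | positive)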
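This is because, as we can recall, if A and B are independent then
P(A|B) = P(A), and vice versa. In our case the independence is conditional on the class, so
P(A | B, positive) = P(A | positive), and each word’s factor no longer depends on the other words.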
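
As a minimal sketch of what the naive assumption buys us, the snippet below estimates P(review | positive) as a plain product of per-word probabilities P(word | positive). The word counts are made up for illustration and are not taken from the lesson’s dataset, and the helper names p_word_given_positive and p_review_given_positive are hypothetical. No smoothing is applied, so a word that never appears in the counts would give a probability of 0.

```python
from collections import Counter
from math import prod  # requires Python 3.8+

# Hypothetical word counts across all positive training reviews (made-up numbers).
positive_counts = Counter({"this": 4, "crib": 2, "was": 3, "amazing": 1})
total_positive_words = sum(positive_counts.values())


def p_word_given_positive(word):
    """Estimate P(word | positive) as a relative frequency (no smoothing)."""
    return positive_counts[word.lower()] / total_positive_words


def p_review_given_positive(review):
    """Naive likelihood: multiply the per-word conditional probabilities."""
    return prod(p_word_given_positive(w) for w in review.split())


likelihood = p_review_given_positive("This crib was amazing")
print(likelihood)  # roughly 0.4 * 0.2 * 0.3 * 0.1
```

Each factor only needs a count for a single word, which is why this is so much easier to estimate than a probability conditioned on every other word in the review.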
So to conclude, we make this “naive” assumption because it makes P(review | positive) much easier to compute and reason about.