Why is the "naive" independence assumption necessary for the naive Bayes classifier?



In the naive Bayes lesson, when computing P(review | positive) for review = "This crib was amazing", the exercise states:

In other words, if we assume that the review is positive, what is the probability that the words “This”, “crib”, “was”, and “amazing” are the only words in the review? To find this, we have to assume that each word is conditionally independent.

Why must we make this assumption?


To understand why we choose the naive assumption (why we assume that each word is conditionally independent), let’s take a closer look at the probability P(review | positive). Computing this probability directly is not simple. Because we’re considering every word in the review, the chain rule of probability expands P(review | positive) for review = "This crib was amazing" to

P("this" | "crib", "was", "amazing", positive) * P("crib" | "was", "amazing", positive) * ... * P("amazing" | positive)

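This gets messy quickly: each factor conditions on other words in the review, so we would have to estimate how likely a word is given every combination of the words around it. Assuming conditional independence makes the above computation equivalent to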

P("this" | positive) * P("crib" | positive) * P("was" | positive) * P("amazing" | positive)

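This follows from the definition of conditional independence: if A and B are conditionally independent given C, then P(A | B, C) = P(A | C). Here C is the positive label, so once we know the review is positive, conditioning on the other words does not change the probability of any single word.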
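
To make the simplified product concrete, here is a minimal Python sketch. The word counts, the total, and the function names are all made up for illustration and are not taken from the lesson's dataset; it also uses plain count / total estimates with no smoothing, which the lesson may handle differently.

```python
from math import prod

# Hypothetical counts of each word across all positive reviews (made-up numbers).
positive_counts = {"this": 3, "crib": 1, "was": 2, "amazing": 4}
# Hypothetical total number of words seen in positive reviews.
total_positive_words = 10


def word_likelihood(word):
    """Estimate P(word | positive) as count / total, with no smoothing."""
    return positive_counts.get(word, 0) / total_positive_words


def naive_review_likelihood(review):
    """Multiply the per-word estimates, which the naive assumption allows."""
    words = review.lower().split()
    return prod(word_likelihood(w) for w in words)


# 0.3 * 0.1 * 0.2 * 0.4, roughly 0.0024 (up to floating-point rounding)
likelihood = naive_review_likelihood("This crib was amazing")
print(likelihood)
```

Each factor needs only one count per word, so four simple lookups replace the nested conditional probabilities. Note that in this unsmoothed sketch a single unseen word drives the whole product to zero.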

So to conclude, we make this “naive” assumption because it makes the probability much easier to compute and reason about.
