How do bigram and trigram models improve naive Bayes?



In the final lesson for the naive Bayes classifier, it is suggested that using a bigram or trigram model makes the independence assumption more reasonable. Could you expand on this?


Recall that the key assumption of this model, the one that makes it “naive”, is that the words in a sentence are independent. What does this mean? It means that knowing any word in a sentence tells us nothing about the words around it. This is clearly unrealistic. Language has structure: given an arbitrary string of words, you can often predict the next word, or at least narrow it down to a small set of likely candidates. We see this every day in the auto-complete suggestions in our texts and emails.

So independence on a word-by-word basis is a pretty bad assumption, although it works well in many applications. Bigrams and trigrams give us an easy way to make the independence assumption a little more reasonable with little extra effort, since there is far more variation among short strings of words than among single words.
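As a sketch of what this buys us, here is a toy naive Bayes that counts bigram features instead of single words (the training sentences and labels below are my own invention, not from the lesson). Because “not good” and “not bad” are distinct bigram features, the model can separate phrases that a unigram model would blur together:

```python
from collections import Counter
import math

# Hypothetical toy training data: (text, sentiment label).
docs = [
    ("not good at all", "neg"),
    ("not bad at all", "pos"),
    ("good product", "pos"),
    ("bad product", "neg"),
]

def bigrams(text):
    """Split text into overlapping two-word strings."""
    w = text.split()
    return [" ".join(w[i:i + 2]) for i in range(len(w) - 1)]

# Count bigram features per class, exactly as naive Bayes counts words,
# but with bigrams playing the role of the "independent" units.
counts = {"pos": Counter(), "neg": Counter()}
for text, label in docs:
    counts[label].update(bigrams(text))

def log_prob(text, label, alpha=1.0):
    """Naive Bayes log-likelihood of text under a class, with Laplace smoothing."""
    c = counts[label]
    vocab = set().union(*counts.values())
    total = sum(c.values()) + alpha * len(vocab)
    return sum(math.log((c[b] + alpha) / total) for b in bigrams(text))

# "not good" appears only in the negative class, so the negative
# class assigns this phrase the higher likelihood.
print(log_prob("not good at all", "neg") > log_prob("not good at all", "pos"))  # → True
```

A unigram version of the same model would see identical counts of “not” and “at” and “all” in both classes, so the bigram features are doing the discriminative work here.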

So if our sentence is “This crib is great”, our original assumption was that these words are independent:

“This”, “crib”, “is”, and “great”

For the bigram model, we assume that these strings are independent:

“This crib”, “crib is”, and “is great”

and for the trigram model, we assume independence of:

“This crib is” and “crib is great”
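To make the splitting concrete, here is a small plain-Python sketch (the function name is my own) that slides a window over the sentence to produce exactly the unigram, bigram, and trigram features listed above:

```python
def ngrams(words, n):
    """Return the n-word strings obtained by sliding a window of width n over words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "This crib is great".split()
print(ngrams(words, 1))  # → ['This', 'crib', 'is', 'great']
print(ngrams(words, 2))  # → ['This crib', 'crib is', 'is great']
print(ngrams(words, 3))  # → ['This crib is', 'crib is great']
```

Note that a sentence of k words yields k − n + 1 overlapping n-grams, which is why the bigram list has three entries and the trigram list has two.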