Explaining the terms within Bayes' theorem for the naive Bayes classifier



With the naive Bayes classifier, we find the probability that a data point d is a member of a class C using Bayes’ theorem: P(C|d) = P(C)*P(d|C) / P(d). We can estimate P(C) from the labeled training data because the model is supervised. But what is P(d)? How do we find the probability of a data point?


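Because we’re using naive Bayes to classify text documents, such as reviews, let’s consider what P("I love this product!") means. Since naive Bayes is supervised, we have a database of correctly classified reviews to check “I love this product!” against. P("I love this product!") is the probability of seeing this review at all, regardless of its class, as estimated from that database. It is not literally the chance that the exact review is already stored; rather, it is higher when the database contains many reviews made of similar words. It follows from quantities we already have by summing over the classes: P(d) = P(C1)*P(d|C1) + P(C2)*P(d|C2) + ... Because P(d) is the same for every class, it does not change which class wins; it only rescales the scores so that they sum to 1. A very small value can hint that the review is unlike anything in the database, so the classification deserves less trust.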
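
To make the role of P(d) concrete, here is a minimal sketch in Python. It assumes a word-count (multinomial) naive Bayes with add-one smoothing, which the question does not specify. The four training reviews, the labels "pos" and "neg", and the helper names tokenize, fit, joint and posterior are all invented for illustration. P(d) is computed as the sum over classes of P(C)*P(d|C).

```python
from collections import Counter

# Hypothetical toy database of (review, label) pairs, invented for illustration.
TRAIN = [
    ("i love this product", "pos"),
    ("great product love it", "pos"),
    ("i hate this product", "neg"),
    ("terrible waste of money", "neg"),
]

def tokenize(text):
    # Lowercase and strip basic punctuation; a deliberately simple tokenizer.
    return [w.strip("!?.,") for w in text.lower().split() if w.strip("!?.,")]

def fit(train):
    # Count documents per class (for P(C)) and words per class (for P(d|C)).
    class_counts = Counter(label for _, label in train)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in train:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def joint(doc, c, class_counts, word_counts, vocab):
    # P(C) * P(d|C). P(d|C) is a product of per-word probabilities
    # (the "naive" independence assumption) with add-one smoothing.
    prior = class_counts[c] / sum(class_counts.values())
    total = sum(word_counts[c].values())
    likelihood = 1.0
    for w in tokenize(doc):
        likelihood *= (word_counts[c][w] + 1) / (total + len(vocab))
    return prior * likelihood

def posterior(doc, model):
    class_counts, word_counts, vocab = model
    joints = {c: joint(doc, c, class_counts, word_counts, vocab)
              for c in class_counts}
    evidence = sum(joints.values())  # P(d) = sum over C of P(C) * P(d|C)
    return evidence, {c: j / evidence for c, j in joints.items()}

model = fit(TRAIN)
evidence, post = posterior("I love this product!", model)
print("P(d) =", evidence)
print("P(C|d) =", post)
```

Dividing by P(d) only turns the per-class scores into probabilities that sum to 1. If all you need is the most likely class, you can compare P(C)*P(d|C) directly and skip P(d).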