We learn here that to use scikit-learn for naive Bayes on text, we use the CountVectorizer object. What is this object, and how does it differ from the method we used to implement naive Bayes with Python’s Counter data structure?
When we implemented naive Bayes with the Counter data structure, we first had to prepare the text by splitting the positive and negative reviews into lists of individual words. We then passed these lists to the Counter data structure, which counted how often each word occurred. Scikit-learn lets us do both steps at once with the CountVectorizer object, which combines tokenization (that is, dividing our text documents into individual words, or tokens) and occurrence counting in a single class. This makes our work easier.