FAQ: Getting Started with Natural Language Processing - Topic Models

This community-built FAQ covers the "Topic Models" exercise from the lesson "Getting Started with Natural Language Processing".

Paths and Courses
This exercise can be found in the following Codecademy content:

Natural Language Processing

FAQs on the exercise Topic Models

Why do words in the stopwords list still appear in the topics found by tf-idf LDA?

Why does every run give a new, different set of topic words in both versions?

I’m a bit lost on this exercise as well. Would it be possible for a moderator or instructional lead to break down what’s happening in the code? In previous exercises, the comments in the code have been incredibly helpful.

Hi everyone!
I am confused about how to read the trees. I can’t make much sense of them. They just look like a bunch of words and lines that follow no particular order. Are we going to be taught how to understand them later in the course?

Also, an unrelated question:
I am Jamaican. In Jamaica we speak an English dialect called Patois. If I wanted to create a chatbot best suited to my Patois-speaking Jamaican audience, would I follow these same rules, functions, and models, or would I have to do something else? Looking forward to hearing your response!

Completely agree. The exercises are meant to help us understand how the tools work, but without additional instruction, I don’t really know what’s happening.

Hello! That happens because the bag-of-words LDA topics are built from filtered_for_stops, while the tf-idf LDA topics are built from preprocessed_corpus, which has not had the stop words removed. You can also compare lines 41 and 47 of the exercise code and see that different variables are used there: bag_of_words_creator and tfidf_creator.
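To make the difference concrete, here is a minimal sketch, assuming the exercise builds its two models with scikit-learn's CountVectorizer and TfidfVectorizer. The toy corpus, stop_list, and filter_out_stop_words helper below are hypothetical stand-ins for the exercise's own chapters and stop-word code. It compares the vocabularies of the two vectorizers, then shows one possible fix: fitting the tf-idf model on filtered_for_stops as well.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus standing in for the exercise's preprocessed chapters.
preprocessed_corpus = [
    "the detective and the king walk to the house",
    "the woman sends a letter to the detective",
    "a photograph of the king is in the house",
]

# Hypothetical stop list; the exercise builds its own list of stop words.
stop_list = {"the", "a", "and", "to", "of", "is", "in"}

def filter_out_stop_words(text):
    """Hypothetical helper: drop every token that appears in stop_list."""
    return " ".join(word for word in text.split() if word not in stop_list)

filtered_for_stops = [filter_out_stop_words(doc) for doc in preprocessed_corpus]

# Bag of words is fit on the filtered text, so stop words never enter its vocabulary.
bag_of_words_creator = CountVectorizer()
bag_of_words = bag_of_words_creator.fit_transform(filtered_for_stops)

# tf-idf is fit on the unfiltered text, so stop words stay in its vocabulary
# and can therefore show up among the tf-idf LDA topic words.
tfidf_creator = TfidfVectorizer()
tfidf = tfidf_creator.fit_transform(preprocessed_corpus)

# Words the tf-idf model knows about that the bag-of-words model does not.
# ("a" is missing because the default token pattern ignores 1-character tokens.)
print(sorted(set(tfidf_creator.vocabulary_) - set(bag_of_words_creator.vocabulary_)))
# prints ['and', 'in', 'is', 'of', 'the', 'to']

# One possible fix: fit the tf-idf model on the stop-word-filtered text too.
tfidf_filtered_creator = TfidfVectorizer()
tfidf_filtered = tfidf_filtered_creator.fit_transform(filtered_for_stops)
print(sorted(set(tfidf_filtered_creator.vocabulary_) & stop_list))
# prints []
```

Feeding the same filtered text to both vectorizers means the two LDA models differ only in how words are weighted (raw counts versus tf-idf), which makes their topics easier to compare.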