Machine Learning

In the first lesson of Machine Learning, I wanted to check all the words that were used to train the system, so I entered print(text_counter.vocabulary_). The output window showed thousands of words, each with a number next to it.
I want to know: when a system is trained on a huge data set, how does that data set get generated in supervised learning?
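This sketch assumes `text_counter` in the lesson is a scikit-learn text vectorizer such as `CountVectorizer`, which the `vocabulary_` attribute suggests. In that case the numbers next to each word are column indices in the document-term count matrix, not how often each word appears. Below is a plain-Python approximation of CountVectorizer's default behaviour: lowercase the text, keep tokens of two or more word characters, and number the terms alphabetically. The helpers `build_vocabulary` and `count_matrix` and the sample sentences are hypothetical.

```python
import re

# Approximates CountVectorizer's default tokenizer: tokens of 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def build_vocabulary(documents):
    """Map each distinct (lowercased) term to a column index, assigned alphabetically."""
    terms = set()
    for doc in documents:
        terms.update(TOKEN_PATTERN.findall(doc.lower()))
    return {term: index for index, term in enumerate(sorted(terms))}


def count_matrix(documents, vocabulary):
    """Return one row per document; column vocabulary[term] holds that term's count."""
    rows = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for token in TOKEN_PATTERN.findall(doc.lower()):
            if token in vocabulary:
                row[vocabulary[token]] += 1
        rows.append(row)
    return rows


docs = ["The cat sat on the mat", "A dog chased the cat"]
vocab = build_vocabulary(docs)
print(vocab)  # {'cat': 0, 'chased': 1, 'dog': 2, 'mat': 3, 'on': 4, 'sat': 5, 'the': 6}
print(count_matrix(docs, vocab))  # [[1, 0, 0, 1, 1, 1, 2], [1, 1, 1, 0, 0, 0, 1]]
```

Note that the one-letter word "a" is dropped by the default token pattern. So with a large text corpus, a vocabulary of thousands of numbered words is expected: each distinct word simply gets its own column.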

Hey @shashankk99, welcome to the forum. :slight_smile:

By definition, when you’re using an unsupervised machine learning algorithm there is no labelled training data: the data has no “correct” outputs for the algorithm to learn from. An unsupervised algorithm is useful for processing data sets where a human analyst can’t see a pattern or relationship in the data, or for discovering patterns we might not otherwise notice.
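To illustrate working without labels, here is a minimal sketch of k-means clustering, one common unsupervised technique (it is not mentioned in the lesson and is chosen here only as an example). The algorithm receives bare points with no “correct” answers and groups them by similarity on its own. The `kmeans` helper and the sample points are hypothetical. The deterministic initialisation, which uses the first k points, is a simplification; practical implementations usually use smarter or randomised starts.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means on unlabelled 2-D points; returns (centroids, labels)."""
    # Naive deterministic start: use the first k points as the initial centroids.
    centroids = list(points[:k])
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# No labels are supplied: just raw measurements.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers 0 and 1 are arbitrary. The algorithm finds the grouping but cannot say what each group *means*. That interpretation is left to a human, which is the key difference from supervised learning.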

Training data for a supervised learning algorithm, on the other hand, could be any data set with a clearly defined relationship between inputs and outputs: for each data point there’s a clearly defined or “correct” output that the algorithm should produce. As a basic example, if you had a collection of pictures of animals that you’d labelled to indicate the animal depicted along with its distinguishing features, you could use it to train a supervised learning model to recognise animals.
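A toy version of the animal example might look like the sketch below. Each training example pairs some input features with the label a human assigned to it, and the model predicts a label for new, unlabelled input. The animals and features are made up. The learner is a 1-nearest-neighbour classifier using Jaccard similarity, chosen here only because it is short. `TRAINING_DATA`, `jaccard` and `predict` are hypothetical names, and a real image model would learn from pixel data rather than hand-written feature sets.

```python
# Hypothetical hand-labelled training data: (input features, "correct" output).
TRAINING_DATA = [
    ({"fur", "whiskers", "retractable_claws"}, "cat"),
    ({"fur", "wagging_tail", "barks"}, "dog"),
    ({"feathers", "beak", "wings"}, "bird"),
    ({"scales", "fins", "gills"}, "fish"),
]


def jaccard(a, b):
    """Similarity of two feature sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(features, training_data=TRAINING_DATA):
    """1-nearest-neighbour: return the label of the most similar labelled example."""
    _, best_label = max(training_data, key=lambda example: jaccard(features, example[0]))
    return best_label


print(predict({"feathers", "wings", "sings"}))  # bird
print(predict({"fur", "barks"}))  # dog
```

The labels in `TRAINING_DATA` are what make this supervised. Someone had to decide the “correct” answer for each example in advance, and that labelling step is usually where the effort of building a large training set goes.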