FAQ: K-Nearest Neighbors - Using sklearn


This community-built FAQ covers the “Using sklearn” exercise from the lesson “K-Nearest Neighbors”.

In the last exercise, where we use the library instead of writing our own K-nearest neighbors code, how would we store the movie names if we applied it to the same movie dataset? That exercise uses a different dataset, and it would be great to reuse the same example throughout. Many thanks!

We used dictionaries in previous exercises, but the .fit() method in this exercise (and probably the .predict() method too) doesn’t seem to accept dictionaries as arguments:

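from sklearn.neighbors import KNeighborsClassifier

classifier2 = KNeighborsClassifier(n_neighbors = 5)
movie_dic_dataset = {'title1': [.1, .2, .3], 'title2': [.4, .5, .6]}
movie_dic_labels = {'title1': 0, 'title2': 1}

# This line raises an error (a TypeError when I ran it; the exact exception may depend on the scikit-learn version)
classifier2.fit(movie_dic_dataset, movie_dic_labels)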

So we can’t pass dictionaries to the .fit() method as they are; we need to convert them to lists or arrays first:

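movie_dataset2 = []
movie_labels2 = []
for title in movie_dic_dataset:
  # Look up both dictionaries by the same key so features and labels stay aligned
  movie_dataset2.append(movie_dic_dataset[title])
  movie_labels2.append(movie_dic_labels[title])

classifier2.fit(movie_dataset2, movie_labels2)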
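
To keep the movie titles available after the conversion, one option is to build a parallel list of titles in the same loop, so that index i in every list refers to the same movie. This is only a minimal sketch: the titles, feature values, the third movie, and the query point are all made up, and n_neighbors=1 is chosen just because the toy dataset is tiny.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data in the same dictionary layout as above
movie_dic_dataset = {
    'title1': [.1, .2, .3],
    'title2': [.4, .5, .6],
    'title3': [.7, .8, .9],
}
movie_dic_labels = {'title1': 0, 'title2': 1, 'title3': 1}

# Build three parallel lists: position i refers to the same movie in each
titles, features, labels = [], [], []
for title in movie_dic_dataset:
    titles.append(title)
    features.append(movie_dic_dataset[title])
    labels.append(movie_dic_labels[title])

classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(features, labels)

# A made-up unlabeled movie to classify
unknown_movie = [[.12, .22, .28]]
predicted_label = classifier.predict(unknown_movie)[0]

# kneighbors() returns (distances, indices); the indices are row positions
# in the training data, so they can be mapped back to titles
distances, indices = classifier.kneighbors(unknown_movie)
nearest_titles = [titles[i] for i in indices[0]]

print(predicted_label, nearest_titles)
```

The classifier itself never sees the titles; they are only recovered afterwards through the shared row positions.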

If we want to keep other data such as movie titles alongside the features, pandas DataFrames fit scikit-learn better than dictionaries: a DataFrame of feature columns and a Series of labels can be passed directly to the .fit() method.

import pandas as pd

movie_df = pd.DataFrame([
  ['title1', .1, .2, .3, 0],
  ['title2', .4, .5, .6, 1]
], columns=['title', 'budget', 'run time', 'year of release', 'label'])

classifier2.fit(movie_df[['budget', 'run time', 'year of release']], movie_df['label'])
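
Building on the DataFrame idea, the titles can stay in the same table as the predictions. The sketch below is an assumption about how one might do it, not part of the exercise: the new movies, their values, the 'predicted label' column name, and n_neighbors=1 are invented for illustration.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

feature_columns = ['budget', 'run time', 'year of release']

# Hypothetical training data: title, three features, label
movie_df = pd.DataFrame([
  ['title1', .1, .2, .3, 0],
  ['title2', .4, .5, .6, 1],
  ['title3', .7, .8, .9, 1]
], columns=['title'] + feature_columns + ['label'])

classifier = KNeighborsClassifier(n_neighbors=1)
# Only the numeric feature columns go to the classifier; 'title' stays in the DataFrame
classifier.fit(movie_df[feature_columns], movie_df['label'])

# Hypothetical unlabeled movies to classify
new_movies = pd.DataFrame([
  ['new title A', .15, .2, .25],
  ['new title B', .65, .8, .9]
], columns=['title'] + feature_columns)

# predict() returns one label per row, in row order, so it can be stored as a new column
new_movies['predicted label'] = classifier.predict(new_movies[feature_columns])

print(new_movies[['title', 'predicted label']])
```

Because each prediction lands in the same row as its title, there is no separate lookup structure to keep in sync.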