FAQ: K-Nearest Neighbors - Classify Your Favorite Movie

This community-built FAQ covers the “Classify Your Favorite Movie” exercise from the lesson “K-Nearest Neighbors”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Machine Learning

FAQs on the exercise Classify Your Favorite Movie


Hi!

The budget values in movie_dataset do not seem to be normalized with respect to currency. Even an expensive movie with a budget of US$250,000,000 has a normalized budget of only around 0.02. Exploring the dataset, I found only one movie with a normalized budget above 0.5: “The Host”. This appears to be a Korean movie from 2006 with a budget of ₩11.8 billion (which would be only about US$11 million).
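To illustrate the effect described above, here is a minimal sketch of min-max normalization over a tiny list of budgets. It assumes the dataset was scaled with plain min-max, and it borrows the minimum and maximum budget reported later in this thread (218 and 12215400000). The helper name `min_max_normalize` and the three-item lists are made up for illustration and are not taken from the course.

```python
def min_max_normalize(values):
    """Scale every value into [0, 1] using the list's own min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative lists only: one budget left in won, the rest in US dollars.
budgets_mixed = [218, 250_000_000, 12_215_400_000]
# The same idea with the outlier expressed as roughly US$11 million.
budgets_usd = [218, 11_000_000, 250_000_000]

mixed = min_max_normalize(budgets_mixed)
usd = min_max_normalize(budgets_usd)

print(mixed[1])  # the US$250,000,000 movie lands near 0.02
print(usd[2])    # with a consistent currency it becomes the maximum
```

A single unconverted value stretches the range so far that every other budget is squeezed close to 0, which makes the budget feature nearly irrelevant to the distance calculation.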

Many thanks!


Hello! For the normalize_point function, is the point normalized the same way we previously normalized the dataset? That is, are the same minimum and maximum values used?

Thanks!


Hello unabletosearch!

Yes, it is just min-max normalization. I tried it on movie_dataset, which you can get by simply printing it. With some algebra and the budget, runtime, and year of two movies (from IMDb), you can calculate the following values (I hope Codecademy won’t mind):

min_budget = 218.00000013490643
max_budget = 12215499999.999994
min_runtime = 36.99999999999991
max_runtime = 330.00000000000006
min_year = 1926.999999999997
max_year = 2016.0000000000005
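For anyone curious about the math, here is a rough sketch of how the minimum and maximum can be recovered from two movies, assuming the dataset uses normalized = (raw - min) / (max - min). The function name `recover_min_max` and the two example runtimes (100 and 150 minutes) are hypothetical; the normalized values are generated from the 37 to 330 range quoted in this thread rather than read from the real dataset.

```python
def recover_min_max(raw_a, norm_a, raw_b, norm_b):
    """Solve normalized = (raw - min) / (max - min) for min and max.

    Subtracting the equations for two movies gives
    (raw_a - raw_b) = (norm_a - norm_b) * (max - min).
    """
    span = (raw_a - raw_b) / (norm_a - norm_b)  # max - min
    minimum = raw_a - norm_a * span
    return minimum, minimum + span

# Hypothetical runtimes, normalized with the range 37..330 from this thread.
norm_a = (100 - 37) / (330 - 37)
norm_b = (150 - 37) / (330 - 37)

min_runtime, max_runtime = recover_min_max(100, norm_a, 150, norm_b)
print(min_runtime, max_runtime)  # close to 37 and 330, up to rounding error
```

Floating-point rounding in this kind of calculation would explain why the values above are slightly off from whole numbers.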

Edit: Codecademy uses:
min_budget = 218
max_budget = 12215400000
min_runtime = 37
max_runtime = 330
min_year = 1927
max_year = 2016

Now you just have to min-max normalize the budget, runtime, and year of your movie separately, as covered earlier in the KNN course.
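A minimal sketch of what such a function could look like, using the minimum and maximum values listed above. This is not Codecademy's actual normalize_point, only a guess at an equivalent; the constant `FEATURE_RANGES` and the example movie (a runtime of 120 minutes in particular) are assumptions for illustration.

```python
# (min, max) per feature, in the order [budget, runtime, year],
# taken from the values quoted in this thread.
FEATURE_RANGES = [(218, 12215400000), (37, 330), (1927, 2016)]

def normalize_point(point):
    """Min-max normalize one [budget, runtime, year] point."""
    return [
        (value - lo) / (hi - lo)
        for value, (lo, hi) in zip(point, FEATURE_RANGES)
    ]

# Hypothetical movie: US$250,000,000 budget, 120 minutes, released in 2016.
my_movie = [250_000_000, 120, 2016]
normalized_movie = normalize_point(my_movie)
print(normalized_movie)
```

The important part is that the point is scaled with the dataset's minimum and maximum, not with values computed from the point itself.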

If you want to know more about the math (which is pretty simple), just ask.

Best regards,
Vince


So, I tried K-nearest neighbors with “Rogue One”, a movie that has a 7.8 IMDb rating, and the result was 0 (bad movie). Maybe the dataset isn’t big enough, or there were many bad movies with big budgets and 230-minute runtimes that year (2016).

I tried a for loop over range() from 2 to 20, and the movie was classified as good only with a k of 2 or 3. Every k after that classified it as bad.
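The experiment above can be sketched roughly like this. The `distance` and `classify` functions are simplified stand-ins for the ones built in the lesson (an assumption, not the course code), and the five-movie dataset, its labels, and the unknown point are entirely made up so the example runs on its own.

```python
def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify(unknown, dataset, labels, k):
    """Return 1 (good) if most of the k nearest neighbors are good, else 0."""
    nearest = sorted(dataset, key=lambda title: distance(dataset[title], unknown))[:k]
    num_good = sum(labels[title] for title in nearest)
    num_bad = k - num_good
    return 1 if num_good > num_bad else 0

# Hypothetical normalized [budget, runtime, year] points and labels.
toy_dataset = {
    "A": [0.02, 0.30, 0.98],
    "B": [0.02, 0.32, 0.99],
    "C": [0.01, 0.50, 0.90],
    "D": [0.03, 0.60, 0.95],
    "E": [0.00, 0.70, 0.85],
}
toy_labels = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}
unknown_movie = [0.02, 0.31, 1.00]

results = {k: classify(unknown_movie, toy_dataset, toy_labels, k) for k in range(1, 6)}
for k, label in results.items():
    print(k, label)
```

In this toy data the two closest neighbors are good, so small values of k return 1, while larger values pull in the more distant bad movies and flip the result to 0. Something similar may be happening with the real dataset.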

I’m not able to find the movies library used in these courses anywhere. How do I access this movies module?

Hello anuraagrath002,

You can go to the Machine Learning course, then Finding the Nearest Neighbors, and then to page 6/13.
In the code section, print movie_dataset and copy-paste it from the output section into a text file. You can repeat that for movie_labels.

Of course, you can then also edit the dataset and add new movies.
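If it helps, here is one possible way to turn the copied text back into a Python dictionary. It assumes the printed output is a plain dictionary literal mapping titles to [budget, runtime, year] lists; the helper name `parse_printed_dict` and the sample entries are hypothetical.

```python
import ast

def parse_printed_dict(text):
    """Parse the text of a printed dict back into a dict, without using eval()."""
    data = ast.literal_eval(text)
    if not isinstance(data, dict):
        raise ValueError("expected the text of a printed dictionary")
    return data

# Stand-in for the contents of your text file (made-up entry).
sample_text = "{'Hypothetical Movie': [0.02, 0.28, 1.0]}"
movie_dataset = parse_printed_dict(sample_text)

# Adding a new, already normalized movie (made-up values).
movie_dataset["My New Movie"] = [0.01, 0.35, 0.9]
print(len(movie_dataset))

# With a real file you could read the text first, for example:
# with open("movie_dataset.txt") as f:
#     movie_dataset = parse_printed_dict(f.read())
```

ast.literal_eval only accepts Python literals, so it is a safer choice than eval() for text pasted from somewhere else.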

Thank you, Vince. But I actually wanted it in order to use normalize_point. I’m trying to use the normalization function from before, but it seems like I get the same results every time.

I don’t know what to do.

Hey anuraagrath002,

I am only now realizing that I never answered your last question from March 2021. I took a break from coding at that time, and I am sorry that I didn’t respond. I hope you were able to fix your problem.

Hello, could you please share the normalize_point function with us? Thanks.