FAQ: K-Nearest Neighbors - Classify Your Favorite Movie

This community-built FAQ covers the "Classify Your Favorite Movie" exercise from the lesson "K-Nearest Neighbors".

The budget numbers in movie_dataset do not seem to be converted to a common currency before normalization. Even an expensive movie with a budget of US$250,000,000 has a normalized budget of only around 0.02. Exploring the dataset, there is only one movie with a normalized budget greater than 0.5, titled “The Host”. This appears to be a Korean movie from 2006 with a budget of ₩11.8 billion (which would be only about US$11 million).

Hello! For the normalize_point function, is the point normalized the same way we previously normalized the dataset? That is, are the same minimum and maximum values used?

Hello unabletosearch!

Yes, it is just min-max normalization. I tried it on movie_dataset, which you can inspect by simply printing it. With some algebra and the budget, runtime, and year of two movies (from IMDb), you can calculate the following values (I hope Codecademy won’t mind :smiley:):

min_budget = 218.00000013490643
max_budget = 12215499999.999994
min_runtime = 36.99999999999991
max_runtime = 330.00000000000006
min_year = 1926.999999999997
max_year = 2016.0000000000005

Edit: Codecademy uses:
min_budget = 218
max_budget = 12215400000
min_runtime = 37
max_runtime = 330
min_year = 1927
max_year = 2016

Now you just have to min-max normalize the budget, runtime, and year of your movie separately, as learned earlier in the KNN course.
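
As a rough illustration of that step, here is a minimal Python sketch. It assumes the point is a list in the order [budget, runtime, year] and uses the bounds from the "Edit" values above. The names BOUNDS, min_max, and normalize_point_sketch are made up for this example and are not the exercise's own code, and the sample movie is hypothetical.

```python
# Bounds copied from the "Edit" values posted in this thread.
BOUNDS = [
    (218, 12215400000),  # budget (min, max)
    (37, 330),           # runtime in minutes (min, max)
    (1927, 2016),        # release year (min, max)
]

def min_max(value, minimum, maximum):
    """Scale value relative to the given minimum and maximum (0 at min, 1 at max)."""
    return (value - minimum) / (maximum - minimum)

def normalize_point_sketch(point, bounds=BOUNDS):
    """Normalize [budget, runtime, year], each feature with its own bounds."""
    return [min_max(v, lo, hi) for v, (lo, hi) in zip(point, bounds)]

# Hypothetical movie: budget of 250,000,000, 120 minutes long, released in 2010.
print(normalize_point_sketch([250000000, 120, 2010]))
```

With these bounds, the budget of 250,000,000 comes out at roughly 0.02, which matches the observation earlier in the thread.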

If you want to know more about the math (which is pretty simple) just ask :wink:
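
For anyone curious, one way the math could work (a sketch, not necessarily the exact steps used above): min-max normalization is normalized = (raw - min) / (max - min), so two movies with known raw and normalized values for a feature give two equations in the two unknowns min and max. The function name recover_bounds and the sample runtimes of 120 and 95 minutes are made up for illustration.

```python
def recover_bounds(raw_a, norm_a, raw_b, norm_b):
    """Solve normalized = (raw - min) / (max - min) for min and max,
    given two (raw, normalized) pairs with different normalized values."""
    value_range = (raw_a - raw_b) / (norm_a - norm_b)  # equals max - min
    minimum = raw_a - norm_a * value_range
    return minimum, minimum + value_range

# Illustrative check: normalize two made-up runtimes with the known
# bounds 37 and 330, then recover those bounds from the results.
def normalize_runtime(minutes):
    return (minutes - 37) / (330 - 37)

low, high = recover_bounds(120, normalize_runtime(120), 95, normalize_runtime(95))
print(low, high)
```

The recovered values should land very close to 37 and 330. Small floating-point errors in this kind of calculation would be consistent with results like 36.99999999999991 instead of exactly 37.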

