In the context of this exercise, wouldn’t large values overtake smaller ones?
Yes, currently, if we utilize the distance formula without taking into account the type of features and scales they are on, then large values will likely overtake the smaller ones.
For instance, say that our dataset of movies included two features: budget and year of release. The value of a movie’s budget might be around the millions, say 1,000,000 dollars, while the release year is much smaller, say 2010. It is clear that the values of the budgets will overtake the values of the years, since their scales are so different.
To address this issue, later in the lesson, you will learn how to make it so that each of the values is treated similarly, using a technique called normalization.