In the context of this exercise, how can outliers affect the classifications?
Outliers can degrade the classifications, because K-Nearest Neighbors is sensitive to them.
As explained in the exercise text, even a single outlier can cause problems if the value of k is very small, like k=1, because any point whose nearest neighbor is the outlier will simply take on the outlier's label.
One reason why outliers are so impactful is that the K-Nearest Neighbors technique depends entirely on the input data: predictions are made directly from the stored training points. Outliers in the input data can shift the classification boundaries, so points that fall near them can be classified differently than expected.
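To make this concrete, here is a minimal sketch in plain Python. It is not the exercise's own code: the helper `knn_predict`, the toy points, and the labels "A" and "B" are all made up for illustration. One "B" point is placed inside the "A" cluster to act as an outlier, and a nearby query point is classified with k=1 and with k=3.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: cluster "A" near (1, 1), cluster "B" near (5, 5),
# plus a single "B"-labelled outlier sitting inside the "A" cluster.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.2), "A"), ((1.1, 1.3), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.8, 5.2), "B"),
    ((1.5, 1.5), "B"),  # the outlier
]

query = (1.4, 1.4)  # lies in the "A" region, but right next to the outlier

pred_k1 = knn_predict(train, query, k=1)  # only the outlier gets a vote
pred_k3 = knn_predict(train, query, k=3)  # the outlier is outvoted by two "A" points

print("k=1:", pred_k1)
print("k=3:", pred_k3)
```

With k=1 the query copies the outlier's label, while with k=3 the two nearby "A" points outvote it.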
To avoid these issues, it can be a good idea to remove outliers before classifying. Another option is to choose a value of k larger than 1, but not too large, because a very large k can cause underfitting. With a good value of k, the classifier can remain accurate despite possible outliers, because each prediction takes into account not only the outlier but also the surrounding neighbor points.
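The "not too large" part can be illustrated with a small sketch as well. Again, this is an assumption-laden toy example rather than the exercise's code: `knn_predict` and the data are hypothetical, and class "A" is deliberately given more points than class "B". When k equals the size of the whole training set, every query gets the majority label, no matter where it lies.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy data: five "A" points near (1, 1), three "B" points near (5, 5).
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.2), "A"),
    ((1.1, 1.3), "A"), ((0.9, 0.9), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.8, 5.2), "B"),
]

query = (5.1, 5.1)  # clearly inside the "B" cluster

# Try a small k, a moderate k, and k equal to the whole training set.
predictions = {k: knn_predict(train, query, k) for k in (1, 3, len(train))}

for k, label in predictions.items():
    print(f"k={k}: {label}")
```

Here k=1 and k=3 both label the query "B", but k=8 labels it "A" because all eight points vote and "A" has the majority. This is the underfitting risk that comes with very large values of k.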