Intro to Classification with K-Nearest Neighbors -- Which Algorithm?

I noticed when reading the scikit-learn documentation that KNeighborsClassifier selects from a variety of algorithms for its model: ‘ball_tree’, ‘kd_tree’, ‘brute’, and ‘auto’. For the Data Scientist path, in the Intro to Classification lesson, which algorithm did the model we constructed use? I’m not sure this was made clear during the lesson.

I’ve pasted the section of the documentation I’m referring to below:

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=‘auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use KDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit method.

You can determine which parameters were used when building your model by using the .get_params() method like this:

print(classifier.get_params())

In this case, the ‘auto’ algorithm was used.
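As a minimal sketch of how to check this yourself (using the iris dataset as a stand-in, not the lesson’s data), you can confirm that the constructor kept the default ‘auto’ setting, and then inspect which algorithm ‘auto’ actually resolved to after fitting. Note that the `_fit_method` attribute used below is a private scikit-learn internal, so it may change between versions:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X, y)

# get_params() reports the constructor arguments, including algorithm='auto'
print(classifier.get_params()["algorithm"])

# After fitting, the private _fit_method attribute records which concrete
# algorithm 'auto' selected for this particular data (internal API, may change)
print(classifier._fit_method)
```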


Got it, thank you! Since ‘auto’ is being used, does that mean it will select either ‘ball_tree’, ‘kd_tree’, or ‘brute’, depending on the values passed to fit?

Exactly.
You can always compare the predictions you get using the three different specified algorithms if you’d like to determine how they differ for your dataset. If you have labels for the test data (y), you can use the .score() method to get a quantitative comparison of how the algorithms perform against each other. In the intro to classification lesson, these labels are not provided.

Great, that clears that up. Thanks Erin!