What is happening in step 2 of K-Means clustering?



Question

This exercise describes step 2 of K-Means clustering. What is actually happening in this step?

Answer

Step 2 of K-Means clustering assigns each data sample to its nearest centroid, which effectively classifies the samples into clusters.

To determine the nearest centroid for a data sample, we use the Euclidean distance formula, which is an application of the Pythagorean theorem. Given a data point and a centroid in two dimensions, the distance between them is computed like this:

delta_x = data_point.x - centroid.x
delta_y = data_point.y - centroid.y

distance = sqrt(delta_x**2 + delta_y**2)

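In our code for this exercise, we compute the distance from the data point to each of the centroids and then pick the nearest one with the np.argmin() function, which returns the index of the smallest value in the list of distances. That index becomes the data point's cluster label.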
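
To make the assignment step concrete, here is a minimal sketch of how it could look for 2-D data. This is not the exercise's actual code: the helper names distance and assign_labels and the sample values are assumptions made up for illustration. Only np.sqrt() and np.argmin() are real NumPy functions.

```python
import numpy as np

def distance(a, b):
    """Euclidean distance between two 2-D points (hypothetical helper)."""
    delta_x = a[0] - b[0]
    delta_y = a[1] - b[1]
    return np.sqrt(delta_x**2 + delta_y**2)

def assign_labels(samples, centroids):
    """Return, for each sample, the index of its nearest centroid."""
    labels = np.zeros(len(samples), dtype=int)
    for i, sample in enumerate(samples):
        # Distance from this sample to every centroid
        distances = [distance(sample, centroid) for centroid in centroids]
        # np.argmin gives the position of the smallest distance
        labels[i] = np.argmin(distances)
    return labels

# Made-up data: two points near (1, 1.5) and two near (8.5, 9)
samples = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])

labels = assign_labels(samples, centroids)
print(labels)  # [0 0 1 1]
```

The first two samples are closest to centroid 0 and the last two to centroid 1, so each sample's label is simply the index of its nearest centroid. If two centroids are exactly the same distance away, np.argmin() returns the first of them.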