FAQ: K-Means Clustering - Implementing K-Means: Step 2

This community-built FAQ covers the “Implementing K-Means: Step 2” exercise from the lesson “K-Means Clustering”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Machine Learning

Distances to each centroid

for i in range(len(samples)):
  for j in range(k):
    distances[j] = distance(sepal_length_width[i], centroids[j])
  cluster = np.argmin(distances)
  labels[i] = cluster

Codecademy’s solution should probably include that second for loop over k, since everything else in the exercise is written for a variable number of centroids.
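For anyone who wants to try the loop outside the exercise, here is a minimal self-contained sketch. The `distance` helper is assumed to be a plain Euclidean distance like the one the lesson defines, `assign_labels` is a hypothetical wrapper name, and the points and centroids are made up for illustration rather than taken from the iris data.

```python
import numpy as np

def distance(a, b):
    # Assumed helper: Euclidean distance between two points.
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def assign_labels(points, centroids):
    # Hypothetical wrapper around the nested loop from the post above.
    k = len(centroids)
    labels = np.zeros(len(points))   # float64 by default
    distances = np.zeros(k)
    for i in range(len(points)):
        for j in range(k):           # works for any number of centroids
            distances[j] = distance(points[i], centroids[j])
        labels[i] = np.argmin(distances)  # index of the closest centroid
    return labels

# Made-up data, not the exercise's sepal_length_width array.
points = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [9.0, 9.0]])
centroids = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])

labels = assign_labels(points, centroids)
print(labels)  # [0. 0. 1. 2.]
```

Because the inner loop runs over `range(k)`, nothing needs to change if you pick a different number of centroids.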

Also, instead of np.zeros you can just use [0] * k for distances, or [0] * len(samples) for labels. Note that this gives a plain Python list of ints rather than a float array.
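A quick sketch of how the two initializations differ, in case it helps. The value of `n_samples` and the distances passed to `np.argmin` are made up for illustration.

```python
import numpy as np

n_samples = 5  # hypothetical sample count

list_labels = [0] * n_samples        # plain Python list of ints
zeros_labels = np.zeros(n_samples)   # NumPy array, float64 by default

# Item assignment works the same way on both containers.
list_labels[2] = np.argmin([3.0, 1.0, 2.0])
zeros_labels[2] = np.argmin([3.0, 1.0, 2.0])

# Converting the list later yields an integer array, not a float one.
as_array = np.array(list_labels)
print(as_array.dtype.kind)   # 'i' (integer)
print(zeros_labels.dtype)    # float64
```

Either works for finding the closest centroid; the difference only matters if later code or a checker expects a particular dtype.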


Hello!!

I'm begging for help; I'm feeling a bit frustrated about this result… I tried a different approach: I created a list called ‘labels’, appended each cluster to it, and then converted it to an np.array. But the format of my labels differs from the exercise's result. How can I make the labels from my approach match the expected format?

Thank you in advance for any help!

My code:

labels = []

for i in sepal_length_width:
  distances = []
  distances.append(distance(i, centroids[0]))
  distances.append(distance(i, centroids[1]))
  distances.append(distance(i, centroids[2]))
  cluster = np.argmin(distances)
  labels.append(cluster)

distances = np.array(distances)
labels = np.array(labels)

My labels format:

[0 1 1 1 0 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0
 0 1 0 0 1 1 0 0 1 0 1 0 0 2 0 2 1 2 0 0 1 2 1 1 0 2 0 0 2 0 0 2 1 0 0 2 0
 2 2 2 2 0 0 1 1 0 0 0 0 2 2 0 1 1 0 0 1 0 0 0 0 1 0 0 0 2 2 2 2 1 2 2 2 2
 2 2 1 0 0 2 2 2 2 2 0 2 2 2 2 2 0 2 2 2 2 2 2 2 2 0 2 0 2 2 2 0 2 2 2 2 2
 0 0]

<class 'numpy.ndarray'>

Solution format

[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 0. 2. 0. 2. 1. 2. 2. 2. 1. 2. 2. 2. 2. 2. 2. 1. 2. 2. 1. 2. 2. 2.
 1. 2. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 2. 2. 2. 0. 2. 1. 0. 2. 0. 1. 0. 1. 1. 1. 2. 2. 2. 1. 0. 0. 2.
 0. 2. 0. 1. 0. 0. 2. 2. 1. 0. 0. 0. 1. 1. 2. 0. 2. 1. 2. 0. 1. 0. 2. 0.
 0. 1. 1. 1. 2. 2.]
<class 'numpy.ndarray'>

I was confused by a similar issue. Apparently Step 4 requires labels to be not only a NumPy array but also of dtype float64 (which is the default dtype of np.zeros()).

labels = np.array(labels, dtype='float64')
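To see why the printed output looks different, here is a small sketch with made-up labels (not the exercise data). An integer array prints as `0 1 2`, while a float64 array prints with trailing dots, which is the only formatting difference between the two outputs quoted above.

```python
import numpy as np

# Hypothetical labels built by appending np.argmin results to a list.
labels = np.array([0, 1, 1, 2])
print(labels)        # [0 1 1 2]

# Convert an existing integer array to float64.
as_float = labels.astype(np.float64)
print(as_float)      # [0. 1. 1. 2.]

# The cluster assignments themselves are unchanged.
print(np.array_equal(labels, as_float))  # True
```

`astype` is an alternative to passing `dtype='float64'` when the array is created; both give the same result here.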