This community-built FAQ covers the “Implementing K-Means: Step 2” exercise from the lesson “K-Means Clustering”.
Paths and Courses
This exercise can be found in the following Codecademy content:
Data Science
Machine Learning
FAQs on the exercise Implementing K-Means: Step 2
# Distances to each centroid
for i in range(len(samples)):
    for j in range(k):
        distances[j] = distance(sepal_length_width[i], centroids[j])
    cluster = np.argmin(distances)
    labels[i] = cluster
Codecademy’s solution should probably include that second for loop over range(k), since everything else is written for a variable number of centroids.
Also, an alternative to using np.zeros is to just do [0] * k for distances, or [0] * len(samples) for labels.
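For anyone who wants to try the loop outside the exercise, here is a minimal self-contained sketch. The sample points and centroids are made-up toy values, and distance() is assumed to be plain Euclidean distance, so this is not necessarily the exercise's own helper.

```python
import numpy as np

def distance(a, b):
    # Assumed helper: Euclidean distance between two points
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

# Hypothetical toy data standing in for the exercise's arrays
sepal_length_width = np.array([[5.0, 3.5], [6.5, 3.0], [7.5, 3.2], [5.1, 3.4]])
centroids = np.array([[5.0, 3.4], [6.4, 3.0], [7.6, 3.1]])
k = len(centroids)

labels = np.zeros(len(sepal_length_width))  # float64 by default
distances = np.zeros(k)

for i in range(len(sepal_length_width)):
    # Distance from sample i to each of the k centroids
    for j in range(k):
        distances[j] = distance(sepal_length_width[i], centroids[j])
    # Index of the nearest centroid becomes the sample's label
    labels[i] = np.argmin(distances)

print(labels)  # [0. 1. 2. 0.]
```

Because the inner loop runs over range(k), the same code works for any number of centroids. Note that labels built with np.zeros come out as floats, while a plain [0] * len(samples) list would hold integers.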
Hello!!
I'm begging for help; I'm feeling a bit frustrated about this result… I tried a different approach: I created a list ‘labels’, appended each cluster to it, and then converted it to an np.array. But the format of my labels differs from the exercise's result. I want to know how to make the labels from my approach match the correct format.
Thank you in advance for any help!
My code:
labels = []
for i in sepal_length_width:
    distances = []
    distances.append(distance(i, centroids[0]))
    distances.append(distance(i, centroids[1]))
    distances.append(distance(i, centroids[2]))
    cluster = np.argmin(distances)
    labels.append(cluster)
distances = np.array(distances)
labels = np.array(labels)
My labels format:
[0 1 1 1 0 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0
0 1 0 0 1 1 0 0 1 0 1 0 0 2 0 2 1 2 0 0 1 2 1 1 0 2 0 0 2 0 0 2 1 0 0 2 0
2 2 2 2 0 0 1 1 0 0 0 0 2 2 0 1 1 0 0 1 0 0 0 0 1 0 0 0 2 2 2 2 1 2 2 2 2
2 2 1 0 0 2 2 2 2 2 0 2 2 2 2 2 0 2 2 2 2 2 2 2 2 0 2 0 2 2 2 0 2 2 2 2 2
0 0]
<class 'numpy.ndarray'>
Solution format
[2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
2. 2. 0. 2. 0. 2. 1. 2. 2. 2. 1. 2. 2. 2. 2. 2. 2. 1. 2. 2. 1. 2. 2. 2.
1. 2. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2.
2. 2. 2. 2. 2. 2. 0. 2. 1. 0. 2. 0. 1. 0. 1. 1. 1. 2. 2. 2. 1. 0. 0. 2.
0. 2. 0. 1. 0. 0. 2. 2. 1. 0. 0. 0. 1. 1. 2. 0. 2. 1. 2. 0. 1. 0. 2. 0.
0. 1. 1. 1. 2. 2.]
<class 'numpy.ndarray'>
I was confused by a similar issue. Apparently Step 4 requires us not only to make labels a NumPy array, but also to make its dtype float64 (which is the default dtype of np.zeros()).
labels = np.array(labels, dtype='float64')
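To see why the two printouts look different, here is a small sketch comparing the dtypes. The list of labels is a made-up example; only the NumPy calls themselves are standard.

```python
import numpy as np

labels_list = [0, 1, 2, 1]  # hypothetical cluster indices collected in a list

as_int = np.array(labels_list)                      # integer dtype inferred from the ints
as_float = np.array(labels_list, dtype='float64')   # dtype forced to float64

print(as_int)     # [0 1 2 1]
print(as_float)   # [0. 1. 2. 1.]
print(np.zeros(3).dtype)  # float64

# An existing integer array can also be converted after the fact
converted = as_int.astype('float64')
print(converted.dtype)  # float64
```

The trailing dots in the solution's output are just how NumPy prints floats, so the cluster indices are the same kind of values either way; only the dtype differs.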