In this exercise, we learned about the first step in K-Means Clustering, which is the placement of
k random centroids for the initial clusters. Is it possible for more than one of these centroids to be generated at the same point?
Yes, this is possible because the positions are chosen at random, but it is very unlikely, especially when the coordinates are drawn from a continuous range of values. You can also implement the algorithm so that a position already taken by one centroid cannot be taken by another.
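As a rough illustration of ruling out shared positions, here is a minimal sketch. It assumes the initial centroids are picked from the data points themselves rather than from anywhere in the plot area, which is only one possible way to initialize. The function name `random_centroids` and the sample points are made up for this example.

```python
import random

def random_centroids(points, k, seed=None):
    """Pick k initial centroids that are guaranteed to sit at distinct positions."""
    rng = random.Random(seed)
    # Drop repeated coordinates first, so two centroids can never share a position.
    unique_points = sorted(set(points))
    # sample() draws without replacement and raises ValueError if k is too large.
    return rng.sample(unique_points, k)

# Hypothetical 2D data; note the repeated point (1.0, 2.0).
points = [(1.0, 2.0), (1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0), (9.0, 11.0)]
centroids = random_centroids(points, k=3, seed=0)
print(centroids)
```

Because the sampling is done without replacement on de-duplicated positions, the three centroids always start at three different locations.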
In the rare case that this does happen, the consequences are usually minor. The entire K-Means process is usually run multiple times, and each run places the initial centroids at different locations. The run with the lowest error is then kept, so a single unlucky starting configuration has little effect on the final result.
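The idea of running K-Means several times and keeping the best run can be sketched as follows. This is a simplified pure-Python version written for illustration, not the code from the exercise. It assumes "error" means the sum of squared distances from each point to its nearest centroid, and the names `run_kmeans` and `best_of_restarts` are made up here.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def run_kmeans(points, k, rng, iterations=20):
    """One K-Means run from a random start; returns (centroids, error)."""
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    # Error: sum of squared distances from each point to its nearest centroid.
    error = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, error

def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-Means several times from different random starts and keep the lowest-error run."""
    rng = random.Random(seed)
    runs = [run_kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])

# Hypothetical data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best_centroids, best_error = best_of_restarts(points, k=2)
print(best_centroids, best_error)
```

Each restart begins from a different random placement, so even if one run starts badly, another run is likely to do better and will be the one that is kept.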
In the same exercise we’re identifying centroids randomly after visualising the data. Since we’re looking at the scatter plot and a couple of clusters are already emerging, why aren’t we using this visual info to determine three centroids manually as our starting points?
The exercise just wants to help you picture how the algorithm generates random centroids when it runs. We do not use the visual information to pick three centroids manually as starting points because the whole K-Means process is carried out by a computer, and it cannot see the three clusters the way we do.
Also, the three clusters that we think we see might be wrong, especially in cases where there are many clusters. We cannot place the centroids reliably without the computer's calculation power.
Hope this helps.