[K-Means Clustering] Which point should be the elbow point in my k vs inertia graph?

I’m currently working on unsupervised k-means clustering with the OKCupid dataset. After 5 hours of waiting, I have produced a graph of the number of clusters k vs. inertia for k in the interval [0, 200].
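
For reference, here is a minimal, dependency-free sketch of how a k vs. inertia curve is produced: run k-means once per k and record the inertia (the sum of squared distances from each point to its nearest centroid). It uses a plain Lloyd's algorithm with a fixed random seed; `sq_dist`, `kmeans_inertia` and `inertia_curve` are hypothetical helper names, not a library API. In scikit-learn the same quantity is exposed as the `inertia_` attribute of a fitted `KMeans` model. Note that k has to start at 1, since k-means needs at least one cluster.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, n_iter=100, seed=0):
    """Run a basic Lloyd's k-means and return its inertia.

    Inertia = sum of squared distances from each point to its nearest centroid.
    Requires 1 <= k <= len(points).
    """
    rng = random.Random(seed)
    # Initialize centroids as k distinct data points chosen at random (fixed seed for reproducibility).
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            [sum(coords) / len(members) for coords in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def inertia_curve(points, k_values):
    """Map each k to its inertia; k must start at 1 (k-means needs at least one cluster)."""
    return {k: kmeans_inertia(points, k) for k in k_values}


# Usage: two well-separated pairs of points.
data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(inertia_curve(data, range(1, 5)))
```

Because every k needs its own full k-means run, evaluating every k up to 200 is slow; trying a coarser grid first (say every 5th or 10th k) and then refining around a promising region is one way to cut the wait.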

[Figure: OKCupid k vs. inertia graph, unannotated]
[Figure: OKCupid k vs. inertia graph, annotated with a line]
From the Codecademy lessons, I know that I am supposed to choose the elbow point, which is the point where the graph starts to become linear. If I follow that rule strictly, the elbow seems to be around k = 100.
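
One way to make "where the graph starts to become linear" less subjective is a chord-distance heuristic (similar in spirit to the Kneedle method): scale both axes to [0, 1], draw a straight line from the first to the last point of the curve, and pick the k whose point lies farthest from that line. The sketch below is a hypothetical helper, `elbow_by_chord`, not a library function; it assumes `ks` is sorted in ascending order and that the inertia values are not all equal. Its answer depends on the range of k that was plotted: stopping at k = 100 instead of k = 200 moves the chord and can move the elbow.

```python
import math


def elbow_by_chord(ks, inertias):
    """Return the k whose point lies farthest from the straight line joining the
    first and last points of the k-vs-inertia curve, after scaling both axes to [0, 1]."""
    # Normalize both axes so the answer doesn't depend on the units of k or inertia.
    k_lo, k_hi = ks[0], ks[-1]
    v_lo, v_hi = min(inertias), max(inertias)
    xs = [(k - k_lo) / (k_hi - k_lo) for k in ks]
    ys = [(v - v_lo) / (v_hi - v_lo) for v in inertias]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    chord_len = math.hypot(x1 - x0, y1 - y0)
    # Perpendicular distance from each point to the chord through (x0, y0) and (x1, y1).
    dists = [abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / chord_len
             for x, y in zip(xs, ys)]
    return ks[dists.index(max(dists))]


# Usage with made-up inertia values (not from the OKCupid or iris data):
ks = [1, 2, 3, 4, 5, 6]
inertias = [100, 50, 20, 16, 13, 10]
print(elbow_by_chord(ks, inertias))  # → 3
```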

However, for k-means clustering on the iris flower dataset, the known correct number of clusters is 3, corresponding to the three species, which is the point just before the graph becomes linear.

[Figure: iris k vs. inertia graph, annotated with a line]

So I’m wondering: should I choose k = 75, k = 100, or somewhere around or between those values?

P.S. I also just realized that the elbow point might be the inflection point, which we learn in calculus is found where the second derivative of a function is zero, f''(x) = 0.
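
Strictly speaking, an inflection point is where the second derivative changes sign, not just where it is zero. A k vs. inertia curve usually drops quickly and then flattens out (decreasing and convex), so its second derivative tends to stay positive and there is no inflection point to find; the elbow is better described as the point where the curve bends most sharply. On an evenly spaced grid of k, a rough discrete stand-in for the second derivative is the second difference f(k-1) - 2f(k) + f(k+1). The sketch below picks the k where that value is largest; `second_differences` and `elbow_by_second_difference` are hypothetical helpers, and the inertia values are made up.

```python
def second_differences(ks, inertias):
    """Second finite difference f(k-1) - 2*f(k) + f(k+1) at each interior k.

    Assumes ks is evenly spaced; on that grid this approximates f''(k) times step**2.
    """
    return {ks[i]: inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
            for i in range(1, len(ks) - 1)}


def elbow_by_second_difference(ks, inertias):
    """Pick the k where the slope of the inertia curve changes most abruptly."""
    diffs = second_differences(ks, inertias)
    return max(diffs, key=diffs.get)


# Usage with made-up inertia values (not from the OKCupid or iris data):
ks = [1, 2, 3, 4, 5, 6]
inertias = [100, 50, 20, 16, 13, 10]
print(second_differences(ks, inertias))  # → {2: 20, 3: 26, 4: 1, 5: 0}
print(elbow_by_second_difference(ks, inertias))  # → 3
```

In this example the second differences never go negative, so there is no sign change (no inflection point), yet the elbow at k = 3 still stands out as the largest value. On real k-means output the curve is noisy (each fit starts from random centroids), so smoothing it or using a coarser grid may be needed before this gives a stable answer.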


I can’t answer your question, but I’m about to enroll in Pre-Calc II, so I’m pretty excited to get into calculus and see what you’re talking about with the root of the second derivative of a function and such.

The elbow method doesn’t always work well for choosing the correct number of clusters (for example, when the data aren’t very clustered). This is why various other methods have been developed for choosing the number of clusters (such as the silhouette method). In the case of the OKCupid graph, I don’t believe there is a single “right” answer that you could arrive at through visual inspection. Try a few different values of k based on the graph and compare the resulting clusters. Which one makes the most sense?
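
The silhouette method scores a clustering by how much closer each point is to its own cluster than to the nearest other cluster, from -1 (probably in the wrong cluster) to +1 (well separated); a common way to use it is to compute the average score for several values of k and prefer the k with the highest average. scikit-learn provides this as `sklearn.metrics.silhouette_score`; the dependency-free sketch below, a hypothetical `mean_silhouette` helper, just shows the formula.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette coefficient s(i) = (b - a) / max(a, b), where
    a = mean distance from point i to the other members of its own cluster and
    b = smallest mean distance from point i to the members of any other cluster."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least 2 clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a point alone in its cluster
            continue
        # The distance from p to itself is 0, so dividing by len(own) - 1 averages over the others.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Usage: the same four points, labeled two different ways.
data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(mean_silhouette(data, [0, 0, 1, 1]))  # sensible clusters: about 0.93
print(mean_silhouette(data, [0, 1, 0, 1]))  # mixed-up clusters: about -0.46
```

This needs all pairwise distances, so the cost grows with the square of the number of rows; on a dataset the size of OKCupid's, scoring a random subsample (scikit-learn's `silhouette_score` has a `sample_size` parameter for this) is likely more practical.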

Hope this helps
