I'm currently working on unsupervised k-means clustering with the OKCupid dataset. After 5 hours of waiting, I have produced a graph of inertia against the number of clusters k for the interval k = [0, 200].
OKCupid Unannotated graph:
OKCupid Annotated with line:
From Codecademy lessons, I know that I am supposed to choose the elbow point, which is the point where the graph starts to become linear. If I follow it strictly, that point seems to be around k = 100.
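One way to make "where the graph starts to become linear" less of an eyeball judgment is to draw the straight line from the first to the last point of the curve and pick the k whose point lies farthest from it. Below is a minimal sketch of that heuristic, not the Codecademy method. The function name `elbow_by_chord_distance` is hypothetical, and the inertia values in the usage lines are made-up toy numbers, not results from OKCupid or iris.

```python
import math


def elbow_by_chord_distance(ks, inertias):
    """Return the k whose (k, inertia) point lies farthest from the
    straight line joining the first and last points of the curve."""
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs")

    k_span = ks[-1] - ks[0]
    inertia_span = inertias[0] - inertias[-1]
    if k_span == 0 or inertia_span == 0:
        raise ValueError("curve is degenerate; no elbow to find")

    # Rescale both axes to [0, 1] so that neither one dominates the distance.
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(v - inertias[-1]) / inertia_span for v in inertias]

    # After rescaling, the chord runs from (0, 1) to (1, 0): x + y - 1 = 0.
    distances = [abs(x + y - 1) / math.sqrt(2) for x, y in zip(xs, ys)]
    best = max(range(len(ks)), key=distances.__getitem__)
    return ks[best]


# Toy values for illustration only (assumed, not measured).
toy_ks = [1, 2, 3, 4, 5, 6]
toy_inertias = [100, 40, 10, 8, 6, 4]
print(elbow_by_chord_distance(toy_ks, toy_inertias))  # prints 3
```

The result depends on the range of k that was plotted, since the chord's endpoints move with it, so it is best treated as a suggestion to check against the graph rather than a definitive answer.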
However, for k-means clustering with the iris flower dataset, the known correct number of clusters is 3, corresponding to the three species, and k = 3 is the point just before the graph becomes linear.
Iris K means evaluation annotated with line:
So, I'm wondering: should I choose k = 75, k = 100, or somewhere around or between those values?
P.S. I also just realized that the elbow point looks like it is just the inflection point, which we learn in calculus is a root of the second derivative of a function, i.e. a point x where f''(x) = 0.
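Since the inertia curve is a list of numbers rather than a formula, the second derivative can only be approximated, for example with second differences. Below is a rough sketch, assuming the k values are evenly spaced; the function names are hypothetical and the numbers are made-up toy values. An inertia curve tends to be decreasing and bowed upward, so its second differences usually stay non-negative and may never cross zero. For that reason the sketch looks for where the second difference is largest (the sharpest bend) rather than for a root.

```python
def second_differences(inertias):
    """Discrete analogue of f'': f(i-1) - 2*f(i) + f(i+1) at each interior point."""
    return [
        inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        for i in range(1, len(inertias) - 1)
    ]


def sharpest_bend(ks, inertias):
    """Return the k with the largest second difference (assumes evenly spaced ks)."""
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs")
    d2 = second_differences(inertias)
    best = max(range(len(d2)), key=d2.__getitem__)
    # d2[0] belongs to the second point of the curve, hence the +1 offset.
    return ks[best + 1]


# Toy values for illustration only (assumed, not measured).
toy_ks = [1, 2, 3, 4, 5, 6]
toy_inertias = [100, 60, 20, 15, 11, 8]
print(second_differences(toy_inertias))     # prints [0, 35, 1, 1]
print(sharpest_bend(toy_ks, toy_inertias))  # prints 3
```

Second differences amplify noise, so on a real curve with 200 points it may help to smooth the inertia values or sample k more coarsely before applying this.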