In the context of this exercise, what does the visualization of the data tell us?
If we plot our current iris_data on a scatter plot, we can see that many data points cluster around certain parts of the graph. Setting the alpha value of the .scatter() function to 0.5 makes each point semi-transparent, so overlapping points appear darker, and we can see several of these darker regions concentrated in particular parts of the plot.
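The plot described above can be sketched roughly as follows. This is a minimal example, not necessarily the exercise's exact code: it assumes iris_data comes from scikit-learn's load_iris() and that the first two columns (sepal length and sepal width) are the ones being plotted. The function name plot_iris_scatter is hypothetical.

```python
import matplotlib.pyplot as plt
from sklearn import datasets


def plot_iris_scatter(alpha=0.5):
    """Scatter two iris features with semi-transparent markers."""
    # Assumption: iris_data is the feature matrix from scikit-learn's iris dataset.
    iris = datasets.load_iris()
    iris_data = iris.data  # 150 samples, 4 features

    # Assumption: the first two columns are the ones plotted.
    x = iris_data[:, 0]  # sepal length (cm)
    y = iris_data[:, 1]  # sepal width (cm)

    fig, ax = plt.subplots()
    # alpha=0.5 makes each marker half transparent, so stacked points look darker.
    points = ax.scatter(x, y, alpha=alpha)
    ax.set_xlabel("sepal length (cm)")
    ax.set_ylabel("sepal width (cm)")
    return points


points = plot_iris_scatter()
```

Calling plt.show() afterwards displays the figure. Lowering alpha further makes dense regions stand out even more against isolated points.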
These concentrations tell us that some data points have similar characteristics, and that the points within each concentration most likely belong to the same species of Iris plant.
Given that there are 3 species of Iris plants in our dataset, at a glance, you might be able to divide the plot into 3 clusters.
However, without knowing in advance that there were exactly 3 species, we might just as easily have assumed two clusters, or more than three.
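One way to see why the number of clusters is not obvious from the plot alone is to color each point by its known species. The sketch below again assumes scikit-learn's load_iris() and the first two feature columns. The function name plot_iris_by_species is hypothetical, and the labels in iris.target are used here only for illustration, since a clustering algorithm would not have access to them.

```python
import matplotlib.pyplot as plt
from sklearn import datasets


def plot_iris_by_species(alpha=0.5):
    """Scatter two iris features, one color per known species."""
    iris = datasets.load_iris()
    x = iris.data[:, 0]  # sepal length (cm), assumed x-axis
    y = iris.data[:, 1]  # sepal width (cm), assumed y-axis

    fig, ax = plt.subplots()
    # Draw each species separately so matplotlib assigns it its own color.
    for label, name in enumerate(iris.target_names):
        mask = iris.target == label
        ax.scatter(x[mask], y[mask], alpha=alpha, label=name)
    ax.set_xlabel("sepal length (cm)")
    ax.set_ylabel("sepal width (cm)")
    ax.legend()
    return ax, list(iris.target_names)


ax, species_names = plot_iris_by_species()
```

With these two features, one species (setosa) separates fairly cleanly while the other two overlap considerably, which is why someone looking at the uncolored plot could reasonably guess two clusters instead of three.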