In the context of this exercise, is there an ideal bin width for histograms?
When it comes to bin widths, there is no “one size fits all” rule for picking the perfect bin width for every dataset. The right choice depends heavily on the data itself and on the range of values you want each bin to cover.
Ideally, your bin width should give a meaningful representation of your data. For example, if your data records the ages of people who answered a poll, bins 10 years wide (10-20, 20-30, …) can make a lot of sense.
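As a rough sketch of how 10-wide age bins sort the data, the plain-Python function below counts values into bins defined by a list of edges. The function name `histogram_counts` and the `ages` list are hypothetical (made-up data for illustration). Each bin includes its lower edge and excludes its upper edge, except the last bin, which also includes its upper edge. This is the same convention NumPy and matplotlib use.

```python
from bisect import bisect_right


def histogram_counts(values, edges):
    """Count values into bins defined by sorted edges (hypothetical helper).

    Bin i covers [edges[i], edges[i+1]), except the last bin, which also
    includes edges[-1]. Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the binned range
        if v == edges[-1]:
            i = len(counts) - 1  # right edge belongs to the last bin
        else:
            i = bisect_right(edges, v) - 1  # index of the bin whose lower edge <= v
        counts[i] += 1
    return counts


# Hypothetical ages of poll respondents (made-up data)
ages = [18, 22, 25, 31, 34, 38, 41, 45, 47, 52, 58, 63, 67, 71, 84]

# Bins 10 years wide: 10-20, 20-30, ..., 80-90
edges = list(range(10, 100, 10))
for low, high, n in zip(edges, edges[1:], histogram_counts(ages, edges)):
    print(f"{low}-{high}: {n}")
```

In matplotlib you would typically pass the same edges straight to the plotting call, e.g. `plt.hist(ages, bins=range(10, 100, 10))`.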
Choosing too few or too many bins can make the data harder to understand.
Choosing too few bins oversimplifies the distribution and hides its shape. For example, if we used only 2 bins for the entire graph, such as the age ranges 10-50 and 50-90, we would end up with bars that don’t tell us much.
Choosing too many bins can leave many bins empty and make the overall shape look jagged rather than smooth. This is like choosing age ranges of 10-12, 12-14, 14-16, …, 88-90, which is probably not necessary.
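To see the too-few/too-many trade-off in numbers, here is a small sketch that bins the same set of made-up ages with widths of 40, 10, and 2 years over the range 10-90. The helper `fixed_width_counts` is hypothetical and assumes the range divides evenly by the width.

```python
def fixed_width_counts(values, start, stop, width):
    """Count values into equal-width bins covering [start, stop] (hypothetical helper).

    Bin i covers [start + i*width, start + (i+1)*width); a value equal to
    `stop` is placed in the last bin. Assumes (stop - start) % width == 0.
    """
    n_bins = (stop - start) // width
    counts = [0] * n_bins
    for v in values:
        if start <= v <= stop:
            # Clamp so that v == stop lands in the last bin
            i = min((v - start) // width, n_bins - 1)
            counts[i] += 1
    return counts


# Hypothetical ages of poll respondents (made-up data)
ages = [18, 22, 25, 31, 34, 38, 41, 45, 47, 52, 58, 63, 67, 71, 84]

for width in (40, 10, 2):
    counts = fixed_width_counts(ages, 10, 90, width)
    print(f"width {width}: {len(counts)} bins, {counts.count(0)} empty, counts={counts}")
```

With these 15 made-up ages, a width of 40 gives just two bars (9 and 6), while a width of 2 gives 40 bins, 25 of them empty. A width of 10 sits in between and still shows the rise and fall of the distribution.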
There are formulas that can suggest a more principled bin count or width for a dataset, such as Sturges’ formula, Doane’s formula, or the Freedman-Diaconis rule, but these are beyond the scope of the course.
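For the curious, here is a minimal sketch of two of these rules, using only the standard library. Sturges’ formula suggests a number of bins, k = ⌈log₂ n⌉ + 1. The Freedman-Diaconis rule suggests a bin width, h = 2 · IQR / n^(1/3), where IQR is the interquartile range. The function names are hypothetical. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, and other quartile conventions give slightly different widths. Doane’s formula is not shown here.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' formula: suggested number of bins for n observations."""
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: suggested bin width 2 * IQR / n**(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(values) ** (1 / 3)


# Hypothetical ages of poll respondents (made-up data)
ages = [18, 22, 25, 31, 34, 38, 41, 45, 47, 52, 58, 63, 67, 71, 84]

k = sturges_bins(len(ages))
h = freedman_diaconis_width(ages)
fd_bins = math.ceil((max(ages) - min(ages)) / h)
print(f"Sturges: {k} bins")
print(f"Freedman-Diaconis: width {h:.2f}, about {fd_bins} bins")
```

If you use NumPy, `numpy.histogram_bin_edges` (and `numpy.histogram`) accept `bins="sturges"`, `bins="doane"`, or `bins="fd"` to apply these rules for you.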