How are the frequency values calculated for the histogram?



In the context of this exercise, how are the frequency values calculated for the histogram?


The histogram you made in this exercise is normalized, meaning that the total area under the histogram adds up to 1. With this in mind, we can work out how the frequencies, or y-axis values, were calculated as follows.

First, we need to know the width, or range of values, covered by each individual “bin” of the histogram. The bin width is the maximum value of the dataset minus the minimum value, divided by the number of bins we chose:

bin_width = (max_value - min_value) / number_of_bins
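As a minimal sketch of the bin width formula, the snippet below computes the width and the resulting bin edges in plain Python. The `data` list and the choice of 4 bins are made-up values for illustration, not the dataset from the exercise.

```python
# Hypothetical sample data and bin count, chosen only for illustration.
data = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 7.0, 9.0]
number_of_bins = 4

min_value = min(data)
max_value = max(data)

# Width of each bin: the full range of the data split into equal parts.
bin_width = (max_value - min_value) / number_of_bins

# The bins are bounded by number_of_bins + 1 evenly spaced edges.
bin_edges = [min_value + i * bin_width for i in range(number_of_bins + 1)]

print(bin_width)   # 2.0
print(bin_edges)   # [1.0, 3.0, 5.0, 7.0, 9.0]
```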

Next, we need the total area of all the bins to sum to 1, since this is a normalized histogram. The area of each bin is its width multiplied by its height. The height cannot simply be the raw count of values in the bin, so we introduce a scaling “ratio” that every count is multiplied by, chosen so that the total area comes out to exactly 1. We will solve for this ratio later on.

This gives the following equation, where count_1, count_2, and so on are the number of values that fall in each bin.

1 = (bin_width * ratio * count_1) + (bin_width * ratio * count_2) + ...

We can simplify this to

1 = bin_width * ratio * (count_1 + count_2 + ...)

and then shorten it to this,

1 = bin_width * ratio * N

where N is the total number of elements in the dataset.

Finally, we solve for the ratio by dividing both sides by bin_width and then by N.

1 = bin_width * ratio * N

1 / bin_width = ratio * N

1 / (bin_width * N) = ratio
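To make the result concrete, here is a small sketch that plugs numbers into the formula and checks that the area equation holds. The values of `bin_width` and `N` are assumed for illustration only.

```python
# Assumed example values: a bin width of 2.0 and a dataset of 8 elements.
bin_width = 2.0
N = 8

# ratio = 1 / (bin_width * N), as derived above.
ratio = 1 / (bin_width * N)

# Sanity check: substituting back should give a total area of 1.
total_area = bin_width * ratio * N

print(ratio)       # 0.0625
print(total_area)  # 1.0
```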

Now that we have the ratio, we can obtain the frequencies for each bin. The frequency, or y-axis value, of a bin is the number of items in that bin (its count) multiplied by the ratio:

frequency = items_in_bin * ratio
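Putting the steps together, the following sketch computes the normalized frequencies by hand and confirms that the bar areas sum to 1. The function name `normalized_histogram` and the sample data are hypothetical. The binning rule used here (each bin includes its left edge, and the last bin also includes the maximum value) is an assumption of this sketch; plotting libraries may differ in edge cases. It also assumes the data contains at least two distinct values, so the bin width is not zero.

```python
def normalized_histogram(data, number_of_bins):
    """Return (bin_width, counts, frequencies) for a normalized histogram."""
    min_value = min(data)
    max_value = max(data)
    bin_width = (max_value - min_value) / number_of_bins

    # Count how many values fall into each bin.
    counts = [0] * number_of_bins
    for x in data:
        index = int((x - min_value) / bin_width)
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(index, number_of_bins - 1)
        counts[index] += 1

    # Scaling factor that makes the total area equal to 1.
    ratio = 1 / (bin_width * len(data))

    # frequency = items_in_bin * ratio
    frequencies = [count * ratio for count in counts]
    return bin_width, counts, frequencies


# Hypothetical sample data, chosen only for illustration.
data = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 7.0, 9.0]
bin_width, counts, frequencies = normalized_histogram(data, 4)

print(counts)       # [3, 2, 1, 2]
print(frequencies)  # [0.1875, 0.125, 0.0625, 0.125]

# Each bar's area is bin_width * frequency; together they sum to 1.
total_area = sum(bin_width * f for f in frequencies)
print(total_area)   # 1.0
```

Note that the frequencies themselves do not sum to 1 (here they sum to 0.5). It is the areas, frequency times bin width, that do.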