How are the frequency values calculated for the histogram?

Question

In the context of this exercise, how are the frequency values calculated for the histogram?

Answer

The histogram that you made in this exercise is normalized, meaning that the total area under the histogram adds up to 1. With this in mind, we can work out how the frequencies, or y-axis values, were calculated as follows.

First, we need to know the width, or range of values, of each individual “bin” of the histogram. The bin width is the maximum value of the dataset minus the minimum value, divided by the number of bins we chose:

bin_width = (max_value - min_value) / number_of_bins
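
As a quick illustration of the bin width formula, here is a minimal Python sketch. The dataset and the choice of 4 bins are made up for this example and do not come from the exercise.

```python
# Hypothetical sample data and bin count, chosen only for illustration.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
number_of_bins = 4

# bin_width = (max_value - min_value) / number_of_bins
bin_width = (max(data) - min(data)) / number_of_bins

print(bin_width)  # 2.0
```

With these values, each bin covers a range of 2: 1 to 3, 3 to 5, 5 to 7, and 7 to 9.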

Next, we need to ensure that the areas of all the bins sum to 1, since this is a normalized histogram. To do that, we introduce a scaling factor, called the “ratio” here: the height of each bin is its count multiplied by this ratio, so the area of each bin is bin_width * ratio * count. We will solve for the ratio further down.

The equation can be seen as follows, where the count variables are the number of values that fall in each bin.

1 = (bin_width * ratio * count_1) + (bin_width * ratio * count_2) + ...

We can simplify this to

1 = bin_width * ratio * (count_1 + count_2 + ...)

and then shorten it to this,

1 = bin_width * ratio * N

where N is the total number of elements in the dataset.

Finally, we must obtain this ratio value.

1 = bin_width * ratio * N

1 / bin_width = ratio * N

1 / (bin_width * N) = ratio
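
To make the ratio concrete, here is a small Python sketch that plugs numbers into the derivation. The dataset and bin count are hypothetical, picked only so the arithmetic is easy to follow.

```python
# Hypothetical sample data and bin count, chosen only for illustration.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
number_of_bins = 4

bin_width = (max(data) - min(data)) / number_of_bins  # (9 - 1) / 4 = 2.0
N = len(data)                                         # 10 values in total

# ratio = 1 / (bin_width * N)
ratio = 1 / (bin_width * N)                           # 1 / 20 = 0.05

# Sanity check: bin_width * ratio * N should come out to 1 (up to rounding).
total_area = bin_width * ratio * N
print(ratio, total_area)
```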

Now that we have the ratio, we can obtain the frequency for each bin. To get the frequency, or y-axis value, of a bin, multiply the number of values that fall in that bin (items_in_bin, the same quantity as count_1, count_2, ... above) by the ratio:

frequency = items_in_bin * ratio
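
Here is a rough end-to-end sketch in plain Python that counts the items in each bin and then applies the ratio. The data and bin count are hypothetical, and the way values are assigned to bins (equal-width bins, with the maximum value placed in the last bin) is an assumption made for this example rather than something stated in the exercise.

```python
# Hypothetical sample data and bin count, chosen only for illustration.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
number_of_bins = 4

min_value, max_value = min(data), max(data)
bin_width = (max_value - min_value) / number_of_bins
ratio = 1 / (bin_width * len(data))

# Count how many values fall in each bin (this is items_in_bin).
# Assumption: the maximum value belongs to the last bin.
items_in_bin = [0] * number_of_bins
for value in data:
    index = int((value - min_value) / bin_width)
    index = min(index, number_of_bins - 1)
    items_in_bin[index] += 1

# frequency = items_in_bin * ratio, computed for every bin.
frequencies = [count * ratio for count in items_in_bin]

# The bar areas (height * width) should add up to 1, up to rounding.
total_area = sum(f * bin_width for f in frequencies)

print(items_in_bin)  # [3, 5, 1, 1]
print(frequencies)   # roughly [0.15, 0.25, 0.05, 0.05]
print(total_area)    # roughly 1.0
```

So items_in_bin is just the raw count for one bar, and the ratio rescales every count by the same amount so that the bars together cover an area of 1.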

Hi team,

I have a small question regarding the frequency equation stated above. What does items_in_bin refer to? When I read this post from top to bottom, I get lost when I reach this particular equation.

An explanation with a simple example would be very much appreciated.

Thanks,
Jimmy

We will need a link to the exercise for context. Please post it in a reply.

Hi there,

I don’t think this question is linked to any exercise. This is just a general question about finding the frequency value (y-axis) on histogram charts.

I guess one way to find out is to track down that exercise and see what it’s about. That could answer your question just in the doing.

Hi there,

The link is actually attached to the original post (first line) by jephos249.

I’m not the one that needs to follow up.

Well, I guess I will just have to wait for other Codecademy moderators or users to answer my question.

You’ve already said it was a general question but related it directly to a course. How is anyone who hasn’t taken the course going to be able to answer your question, especially if you haven’t taken the course, either? This is a non-question.

What does items_in_bin specify in the formula frequency = items_in_bin * ratio?
Can anybody explain the formula step by step with an example, if possible?

Thanks in advance!!

items_in_bin

It’s the number of values of a dataset that go into each bucket (bar) of a histogram. Basically, these numbers determine the height of each individual bar of the histogram on the y-axis; hence they show the frequency of occurrences of certain values in datasets.

ratio

is the ratio of 1 to the total area of the histogram. Why 1? Because when histograms are normalized, the total area of each one is scaled to 1: the total area is divided by itself, resulting in 1.

For example, if the total area of one of our histograms is 50, then we should divide it by 50 to get 1.
If the total area of the second histogram is 30, then we divide it by 30 to get 1. When both histograms’ total areas are reduced to 1 (meaning normalized), we can finally compare the number of occurrences of certain dataset values, because they are now all located within the same “coordinate system”, within 1.

In this coordinate system, if there were 5 numbers with a value of 3 in the dataset whose histogram has a total area of 50, then we should divide 5 by 50 to get the fraction of the whole (literally a fraction of 1) that these 5 numbers occupy in the dataset. It’s equal to 0.1. (This treats each bar as having a width of 1, so that the total area equals the number of values in the dataset.) We do the same for the second histogram. Let’s say we have 15 numbers with a value of 3 in the second dataset; then we should divide 15 by 30, which is equal to 0.5.

We can already see that 0.5 > 0.1, but we can also multiply these by 100 to get percentages, which gives us 10% and 50%. These percentages are the heights of 2 separate bars (out of many) from the 1st histogram and the 2nd respectively; behind them are hidden our items_in_bin values.
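
The two-histogram comparison can be sketched in Python like this. The helper name normalized_height is made up for this example, and a bin width of 1 is assumed so that each histogram’s total area equals its number of values (50 and 30).

```python
def normalized_height(items_in_bin, total_count, bin_width=1.0):
    """Height of one bar in a normalized histogram (hypothetical helper)."""
    ratio = 1 / (bin_width * total_count)
    return items_in_bin * ratio

# 5 values of 3 in a dataset of 50, and 15 values of 3 in a dataset of 30.
first_height = normalized_height(5, 50)    # about 0.1, i.e. 10%
second_height = normalized_height(15, 30)  # about 0.5, i.e. 50%

print(first_height, second_height)
print(second_height > first_height)  # True
```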

See if this helps: How are the frequency values calculated for the histogram? - #11 by fox.trot