How is the "five-number-summary" helpful to understand a dataset?


In the context of this exercise, how is the “five-number-summary” helpful to understand a dataset?


The five-number summary consists of these values for a dataset:

minimum
1st quartile
median
3rd quartile
maximum

With these values, we are able to get a stronger understanding of the dataset as a whole.
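As a rough illustration, the five values can be computed with Python's standard library. This is only a sketch: the function name `five_number_summary` and the sample `scores` are made up for this example, and the choice of `method="inclusive"` is an assumption, since different quartile conventions can give slightly different results.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical sample data, not taken from the exercise.
scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]
print(five_number_summary(scores))  # (2, 4.0, 7.0, 9.0, 12)
```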

With this summary, we can compute the interquartile range, or IQR, which is the 3rd quartile minus the 1st quartile. This value tells us how spread out the middle 50% of the data is. If the IQR is small relative to the scale of the data, the central values are tightly clustered; conversely, if it is large, they are widely spread.
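To make "small" and "large" concrete, here is a minimal sketch comparing two made-up datasets that share the same median but differ in spread. The helper `iqr` and both lists are hypothetical, and the quartiles again assume the `"inclusive"` method.

```python
import statistics

def iqr(values):
    """Interquartile range: 3rd quartile minus 1st quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Both hypothetical datasets have a median of 50.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

print(iqr(tight))  # 2.0
print(iqr(wide))   # 40.0
```

The IQR is best read by comparison, either against another dataset measured in the same units or against the overall range of the same dataset.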

The five-number-summary can also help identify outliers: the minimum and maximum show the full range of the data, and a minimum or maximum that lies far from the quartiles suggests an outlier.
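One common convention, which the post above does not mention and is assumed here only for illustration, flags any value more than 1.5 × IQR below the 1st quartile or above the 3rd quartile. The function name `iqr_outliers`, the parameter `k`, and the sample data are all hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]

# Hypothetical data: the maximum sits far above the 3rd quartile.
data = [2, 4, 4, 5, 7, 8, 9, 10, 40]
print(iqr_outliers(data))  # [40]
```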

The summary is often more reliable than the mean alone, especially when the data is skewed, because the mean is pulled toward outliers, while the median and quartiles are much less affected by them.
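A small sketch of this effect, using invented numbers: adding one extreme value moves the mean a long way but barely shifts the median. The helper `mean_and_median` and the `salaries` list are assumptions made for this example.

```python
import statistics

def mean_and_median(values):
    """Return the mean and the median of a list of numbers."""
    return statistics.mean(values), statistics.median(values)

# Hypothetical salaries (in thousands), then the same list plus one outlier.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]

print(mean_and_median(salaries))      # mean 35, median 35
print(mean_and_median(with_outlier))  # mean jumps to about 95.8, median is 36.5
```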

We can also compare the five-number-summaries of different datasets to see how similar or different they are.


Low, large, in relation to what? To the median?

If this is the case, would it be right to alternatively say something like "the closer the IQR is to the median, the less the variance is…"?

I believe it is low or large (or minimum or maximum) relative to the ordered dataset.

The dataset is sorted before any evaluation.