In the context of this exercise, how is the “five-number-summary” helpful to understand a dataset?
The five-number summary consists of these values for a dataset: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.
Together, these values show at a glance where the data is centered, how spread out it is, and what range it covers.
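As a rough illustration, the summary can be computed with Python's standard `statistics` module. This is only a sketch: the function name `five_number_summary` and the sample data are made up for the example, and the "inclusive" quartile convention is an assumption (other conventions give slightly different quartiles).

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of at least two numbers."""
    data = sorted(values)
    # Assumption: the "inclusive" quartile method; other conventions
    # can give slightly different values for Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical sample data, chosen only for illustration.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = five_number_summary(sample)
print(summary)  # → (2, 4.0, 7.0, 10.0, 15)
```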
With this summary, we can compute the interquartile range (IQR), which is the difference between the third quartile and the first quartile. This value tells us how spread out the middle 50% of the data is. If the IQR is small, the central values are tightly clustered; conversely, if it is large, they are widely dispersed.
The five-number summary can also help identify outliers: the minimum and maximum give the full range of the data, and a minimum or maximum that lies far from the quartiles suggests that an outlier is present.
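One common way to make "far from the quartiles" concrete is the 1.5 × IQR rule, which flags values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below assumes that rule and the "inclusive" quartile convention; the exercise itself does not prescribe either, and the function name `iqr_outliers` and the data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (iqr, outliers) using fences at Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles; the middle value (the median) is not needed here.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep every value that falls outside the two fences.
    outliers = [v for v in values if v < lower or v > upper]
    return iqr, outliers

# Hypothetical readings with one suspiciously large value.
readings = [2, 4, 4, 5, 7, 9, 10, 12, 40]
iqr, outliers = iqr_outliers(readings)
print(iqr, outliers)  # → 6.0 [40]
```

Here Q1 is 4 and Q3 is 10, so the fences sit at −5 and 19, and only the value 40 falls outside them.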
The summary is often more reliable than the mean alone, especially when the data is skewed, because the mean is pulled toward extreme values, while the median and the quartiles are much less affected by them.
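A small experiment can illustrate this difference in sensitivity. The numbers below are invented for the example; the point is only to compare how far the mean and the median move when a single extreme value is added.

```python
import statistics

# Hypothetical values (for example, salaries in thousands).
salaries = [30, 32, 35, 36, 38, 40, 42]
with_outlier = salaries + [400]  # add one extreme value

# How much each measure of center shifts after adding the outlier.
mean_shift = statistics.mean(with_outlier) - statistics.mean(salaries)
median_shift = statistics.median(with_outlier) - statistics.median(salaries)

print(round(mean_shift, 2), median_shift)
```

In this made-up case the mean rises by more than 45 while the median rises by only 1.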
We can also compare the five-number summaries of different datasets to see how similar or different their centers and spreads are.