How can we determine if a value is an outlier in a dataset?



Sometimes, we can tell that a value is an outlier in a dataset just by seeing that it appears much larger or smaller than the other values.

However, eyeballing the data is subjective and might not be the most reliable way to spot an outlier. To check whether a value is truly an outlier, we can calculate a range of values such that any value falling outside that range is considered an outlier.

This post will not go into detail on quartiles, but it will provide the calculation, assuming those values have already been determined.

The range of values is calculated like so:
[first_quartile - 1.5 * IQR, third_quartile + 1.5 * IQR]

where IQR stands for “Interquartile Range”, which is the difference between the third and first quartiles (third_quartile - first_quartile). Any value less than the lower bound of this range, or greater than the upper bound, is considered an outlier.
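The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's own code: it assumes the quartiles are computed with the standard library's `statistics.quantiles` using `method="inclusive"`. The post does not specify a quartile convention, and other conventions can shift Q1 and Q3 (and therefore the bounds) slightly. The helper names `iqr_fences` and `find_outliers` are hypothetical.

```python
import statistics


def iqr_fences(data):
    """Return (lower, upper) bounds using the 1.5 * IQR rule.

    Assumption: quartiles follow statistics.quantiles(method="inclusive");
    other quartile conventions may give slightly different bounds.
    """
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: third quartile minus first quartile
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data):
    """Return the values that fall strictly outside the IQR fences."""
    lower, upper = iqr_fences(data)
    return [x for x in data if x < lower or x > upper]


values = [2, 4, 4, 5, 6, 7, 8, 40]
print(iqr_fences(values))     # (-0.875, 12.125)
print(find_outliers(values))  # [40]
```

Here Q1 = 4.0 and Q3 = 7.25, so the IQR is 3.25 and 1.5 * IQR is 4.875. Only 40 lies outside [-0.875, 12.125], which matches the visual impression that it is much larger than the rest of the data.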

Quartiles will be covered in more detail later in the lesson, and you can then use them to determine outliers in other datasets.