### Question

In the context of this exercise, how can we tell if a value best represents the majority of the data?

### Answer

One possible way to do this is as follows:

First, remove the outliers from the dataset.

Next, calculate the mean of the remaining values. Without the outliers pulling it, this mean is a more accurate reflection of where the majority of the data sits.

Finally, check which of the two candidate values is closer to this mean. Whichever is closer can be thought to better represent the majority of the data.
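The steps above could be sketched roughly as follows. This is only an illustration: the answer does not say how to identify outliers, so the 1.5 × IQR rule used here is an assumption, the `times` values are made up, and the two candidate values are assumed to be the mean and median of the full dataset.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def closer_to_trimmed_mean(values):
    """Return which of the full-data mean or median is closer to the outlier-free mean."""
    trimmed_mean = statistics.mean(remove_outliers_iqr(values))
    candidates = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
    closest = min(candidates, key=lambda name: abs(candidates[name] - trimmed_mean))
    return closest, trimmed_mean

# Hypothetical minutes spent on a website, with one extreme visit.
times = [1, 2, 2, 2, 3, 3, 4, 60]
closest, trimmed_mean = closer_to_trimmed_mean(times)
print(closest, round(trimmed_mean, 2))
```

With this made-up data the single large value drags the full-data mean far from the outlier-free mean, so the median ends up closer.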


Outside the context of this exercise, would it be appropriate in some cases to simply use the mean calculated after removing the outliers as the value that represents the majority of the data? Or would that not work, because I would technically be working with a different dataset?


In a real-life scenario, it really comes down to what you are asking the data to show you. Generally speaking, neither the median nor the mean says much unless it is accompanied by the interquartile range or the variance, respectively, to show how the data is distributed around that value. I am not a statistician, but I am a researcher in STEM and use them a lot: you never see a mean or median reported without the variance, standard deviation, or SEM (for the mean) or the IQR (for the median).
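As a rough sketch of what reporting a center together with its spread might look like, here is one way to do it with Python's standard `statistics` module. The `summarize` helper and the sample values are hypothetical, and the quartiles use the module's default method, which can differ slightly from other tools.

```python
import math
import statistics

def summarize(values):
    """Pair each measure of center with a matching measure of spread."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    stdev = statistics.stdev(values)  # sample standard deviation
    return {
        "mean": statistics.mean(values),
        "stdev": stdev,
        "sem": stdev / math.sqrt(len(values)),  # standard error of the mean
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

# Made-up sample with one extreme value.
summary = summarize([1, 2, 2, 2, 3, 3, 4, 60])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A large standard deviation next to the mean, compared with a small IQR next to the median, is itself a hint that a few extreme values are present.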


The mean gives the average of the data. If your dataset contains outliers, the mean will definitely be affected, so using the mean to analyze a dataset with outliers will not give a good picture of it. The outliers should be removed before the mean is calculated. The median, however, is not affected much by outliers: once the data is sorted, the median is the central value, while the outliers stay at the extreme ends of the dataset.
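A small illustration of that difference, using made-up numbers: adding one extreme value to a short list moves the mean a long way but barely moves the median.

```python
import statistics

base = [1, 2, 2, 3, 4]      # hypothetical values without an outlier
with_outlier = base + [60]  # the same values plus one extreme value

# How far each measure moves once the outlier is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(base)
median_shift = statistics.median(with_outlier) - statistics.median(base)

print(f"mean shifts by {mean_shift:.1f}, median shifts by {median_shift:.1f}")
```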


I want to point out that the setup for the question says the company wants to know the average amount of time someone spends on the website. To answer that specific question, you would use the mean. If a customer or client asks a specific question, it is up to you to answer that question, and the median is not the answer here, whether or not it is arguably the better measure.


Yes, 2.0 is the middle value, so the other 50% of the data lies above it. I also think the mean gives a good answer to the question, even with the series of outliers.