In the context of this lesson and NumPy, how is knowing statistics helpful in data analysis?
Statistics helps us understand datasets better. Using statistics, we can infer, with some degree of certainty, how likely something is to occur, and also learn other properties of the data.
Statistics is used in many large software companies for data analysis. For example, social networks compute statistics from data on user habits, such as click-through rate, to help determine which advertisements or posts to show in a user's feed. Online e-commerce sites apply statistics to user data to infer what a customer might buy next based on their purchase history.
Knowing the statistical properties of a dataset can tell us some important information about the data.
The mean gives us a good idea of the central value of the dataset, although it can be pulled toward extreme values. Percentiles tell us the values below which certain percentages of the data fall.
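As a rough illustration of the mean and percentiles, here is a minimal sketch using `np.mean` and `np.percentile`. The `scores` array is made-up sample data, not something from the lesson.

```python
import numpy as np

# Hypothetical sample data, invented for illustration only
scores = np.array([55, 60, 65, 70, 75, 80, 85, 90, 95])

# Mean: the average of all values
mean_score = np.mean(scores)

# Percentiles: the value below which a given percentage of the data falls
p25 = np.percentile(scores, 25)     # 25th percentile (first quartile)
median = np.percentile(scores, 50)  # 50th percentile (the median)

print(mean_score, p25, median)
```

Here the mean and the median happen to coincide because the sample values are evenly spaced; with skewed data they would usually differ.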
The Interquartile Range helps us see how spread out the data is, based on the middle 50% of the data (between the 25th and 75th percentiles). Outliers are values that lie far from the rest of the data, and they can help us spot possible errors in our observations or tests. And last but not least, the Standard Deviation tells us how far the data is spread out from the mean.
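To make the spread measures concrete, here is a small sketch that computes the interquartile range and standard deviation with NumPy and flags possible outliers. The `data` array is invented for illustration, and the 1.5 × IQR cutoff is a common rule of thumb assumed here, not something the lesson prescribes.

```python
import numpy as np

# Hypothetical measurements; 40 is deliberately far from the other values
data = np.array([10, 12, 13, 14, 15, 16, 17, 18, 40])

# Interquartile range: the distance between the 25th and 75th percentiles
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Assumed rule of thumb: flag values more than 1.5 * IQR beyond the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

# Standard deviation: how far values are spread out from the mean
std = np.std(data)

print(iqr, outliers, std)
```

Note that a single extreme value such as 40 inflates the standard deviation, while the interquartile range is unaffected because it only looks at the middle 50% of the data.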