FAQ: Introduction to Data Science - Statistics

This community-built FAQ covers the “Statistics” exercise from the lesson “Introduction to Data Science”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Code Foundations

FAQs on the exercise Statistics

In the context of this exercise, can someone please explain percentiles and the chart given under “Happy (Un)Equal Pay Day”?

Hi gursewak_marahar,

The chart compares the income of men and women by earnings percentile (high earners at the 90th percentile versus low earners at the 10th).

Percentiles divide the sorted data into 100 equal parts. The 90th percentile, for example, is the value below which about 90% of the observations fall.
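
To make the percentile idea concrete, here is a minimal sketch using Python's standard `statistics` module. The `wages` list is made up purely for illustration and is not taken from the lesson's chart; the choice of the "inclusive" interpolation method is also just an assumption, since other percentile conventions give slightly different values.

```python
import statistics

# Hypothetical hourly wages, invented for illustration (not the lesson's data).
wages = [9, 11, 12, 14, 15, 17, 19, 22, 26, 31, 38, 55]

# With n=100, quantiles() returns the 99 cut points that split the
# sorted data into 100 equal-probability parts, i.e. the percentiles.
cuts = statistics.quantiles(wages, n=100, method="inclusive")

p10 = cuts[9]    # 10th percentile: roughly 10% of values fall below this
p50 = cuts[49]   # 50th percentile, which is the median
p90 = cuts[89]   # 90th percentile: roughly 90% of values fall below this

print("10th:", p10, "50th:", p50, "90th:", p90)
```

Comparing `p10` and `p90` for two groups is, roughly, what a chart of low earners versus high earners is doing.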

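Changing the mean shifts the Gaussian curve left or right without changing its shape.
Changing the standard deviation changes the height and width of the Gaussian curve: a larger standard deviation makes the curve shorter and wider, and a smaller one makes it taller and narrower.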
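
A quick way to check this yourself is a small sketch with `statistics.NormalDist` from the standard library. The specific means and standard deviations below are arbitrary values chosen for illustration.

```python
from statistics import NormalDist

base = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=2, sigma=1)  # larger mean, same standard deviation
wider = NormalDist(mu=0, sigma=2)    # same mean, larger standard deviation

# The curve is tallest at its mean, so evaluate the density there.
peak_base = base.pdf(base.mean)
peak_shifted = shifted.pdf(shifted.mean)
peak_wider = wider.pdf(wider.mean)

# The point below which 97.5% of values fall, as a rough measure of width.
edge_base = base.inv_cdf(0.975)
edge_wider = wider.inv_cdf(0.975)

print("peak heights:", peak_base, peak_shifted, peak_wider)
print("upper 97.5% points:", edge_base, edge_wider)
```

The shifted curve has the same peak height as the base curve, only at a different location, while doubling the standard deviation halves the peak and doubles the distance from the mean to the 97.5% point.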

I want to comment on the importance of organizing data when it comes to the gender pay gap chart presented in this section. I’ve seen things like this before, and I think there are likely errors in data organization and interpretation: the chart may not be comparing apples to apples.

For anyone who would like some insight into the data presented regarding the gender pay gap, I suggest looking into the work of Warren Farrell, who is a gender studies major and has done some amazing work in support of second wave feminism. He had daughters of his own, and talk about the gender pay gap made him concerned for his girls’ future, so he started digging. What he found was interesting.

What you start to find is that, as you get more and more specialized, the gap gets smaller and smaller. For example, it’s often noted that male doctors get paid more than female doctors, and at first glance one would say “look! that’s the same work and there is a 15-20% gap! outrage!” But comparing doctors to doctors isn’t always apples to apples. An optometrist gets paid more than a general practitioner, and when you look at the data, a higher percentage of women who are doctors are general practitioners when compared to men. As things get more and more specialized (an optometrist can also be a highly specialized retinal surgeon rather than just a general optometrist), a higher percentage of men than women occupy increasingly specialized fields. Once you compare within the same specialty, the gap in pay really starts to disappear.

Farrell explored about 13 reasons why men earn more (which, by the way, does not usually lead to better lives for men in the aggregate), including:

- willingness to work longer hours in their late 20s, 30s and 40s
- willingness to take hazardous jobs that women will not
- women having a better focus on a balanced lifestyle
- women’s preference for types of work that pay less (teaching, caregiving)
- …and more

So these kinds of subtleties are important when trying to get more accurate interpretations of the data. When it comes to this chart I would be very interested to see how the data was cleaned and organized.
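
The apples-to-apples point can be sketched with a toy calculation. Everything below is invented for illustration: the groups "A" and "B", the specialties, and the pay figures are hypothetical and say nothing about real salaries. The sketch only shows how an overall average can differ between two groups even when pay within each specialty is identical, simply because the groups are spread differently across specialties.

```python
from statistics import mean

# Entirely made-up records: (group, specialty, pay). Not real salary data.
records = [
    ("A", "generalist", 100), ("A", "generalist", 100),
    ("A", "generalist", 100), ("A", "specialist", 200),
    ("B", "generalist", 100), ("B", "specialist", 200),
    ("B", "specialist", 200), ("B", "specialist", 200),
]

def average_pay(group, specialty=None):
    """Average pay for a group, optionally restricted to one specialty."""
    pays = [
        pay for g, s, pay in records
        if g == group and (specialty is None or s == specialty)
    ]
    return mean(pays)

# Gap when everyone in a group is lumped together.
overall_gap = average_pay("B") - average_pay("A")

# Gaps when comparing like with like.
generalist_gap = average_pay("B", "generalist") - average_pay("A", "generalist")
specialist_gap = average_pay("B", "specialist") - average_pay("A", "specialist")

print("overall gap:", overall_gap)
print("generalist gap:", generalist_gap)
print("specialist gap:", specialist_gap)
```

Whether real data behaves this way depends on how it was collected and grouped, which is why the cleaning and organization steps matter.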

We also ought to be careful how we present incomplete representations of data to women as I fear it is doing them no good and might cultivate a victim mentality. Cathie Wood, a role model of mine and likely (guessing) the most important person in investing in the 21st century, is big on women “self selecting”. In the ARK fund talent is self selecting and she mentions in the video below how she wishes more women would self select. I love Cathie Wood btw, she is such an inspiration!


I wish the course designers had considered this when they made this course. The intro section and this section frame the entire course as being focused on certain pieces of information that are based on falsehoods and misrepresentations. Three of the book recommendations are discredited works on why data science perpetuates mythical inequalities, and if this course uses these arguments as a basis for its inferential analysis, it becomes useless in teaching that concept. Inferential analysis requires an understanding of the big picture. Selecting specific data points and a pre-ordained conclusion and working from that basis (which is what each of these resources does) is the exact opposite of performing competent scientific or statistical analysis.

In fact, this very example shows how not to perform inferential analysis. It uses a study by 538 that has been proven to use faulty data and works backwards from its intended conclusion, dishonestly trying to argue that the study is a key example of good data science.


Yes. It was very frustrating that these falsehoods were injected.


Can somebody suggest good material for understanding the difference between descriptive and inferential analysis, and how the mean and standard deviation affect a dataset?


I really enjoyed reading your response on this topic. I think that not enough people take the time to think critically about what they are really seeing when they look at data. We should not believe everything that an author says in a news article just because they cite data. We need to take the time to look into how that data was gathered and compiled. As you pointed out, it can really change the results.
