Hello Everyone!
I have finished the US Medical Project and am looking for feedback. Here is the link to my gist: US Medical Insurance Project - Python Fundamentals · GitHub
The project took me about 4 hours.
Here is the feedback I am looking for:
My biggest issue with the project was thinking like a data scientist/analyst. Scoping the project was the difficult part for me.
I kept my analysis of the project simple because I did not want to go down the rabbit hole of trying to overdo it.
Please let me know y’all’s thoughts!
Thanks in advance.
Congrats on completing the project! This is a data set that you’ll return to as you go on in the course and learn more DA skills (or, you could do your own separate analysis too, to try out new coding skills).
One thing…
For the 3rd calculation about who has more children, it might also help if you added some more info. Perhaps the total number of men & women (which is 676 and 662, respectively) as well as the total number of kids in the dataset.
Also, I think there might be a miscalculation b/c I came up with different numbers.
#this is using pandas
insurance["sex"].value_counts()
male 676
female 662
#the numbers of kids/rows, further:
insurance["children"].value_counts()
0 574
1 324
2 240
3 157
4 25
5 18
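#764 people have at least one child (324 + 240 + 157 + 25 + 18)
#counting each child, that is 1,465 kids total (1*324 + 2*240 + 3*157 + 4*25 + 5*18)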
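As a sanity check on the tally above, the counts can be summed directly. This is a minimal sketch that hard-codes the value_counts() output from this post instead of loading the CSV; the variable names are just illustrative.

```python
# Number of rows for each value of "children", copied from the
# value_counts() output above (not recomputed from the CSV).
children_counts = {0: 574, 1: 324, 2: 240, 3: 157, 4: 25, 5: 18}

# Every row in the dataset is one person.
total_people = sum(children_counts.values())

# People with at least one child: every bucket except 0.
people_with_children = sum(n for kids, n in children_counts.items() if kids > 0)

# Total children: weight each bucket by its number of kids.
total_children = sum(kids * n for kids, n in children_counts.items())

print(total_people)          # 1338
print(people_with_children)  # 764
print(total_children)        # 1465
```

So 764 is the number of people who have children, while the number of children themselves is 1,465.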
If I break out the insurance data into separate data frames, by women & men:
women_only = insurance[insurance['sex'] == 'female']
women_only['children'].value_counts()
0 289
1 158
2 119
3 77
4 11
5 8
#women who have children: 373
#Men:
men_only = insurance[insurance['sex'] == 'male']
men_only["children"].value_counts()
0 285
1 166
2 121
3 80
4 14
5 10
#Men who have children: 391
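To compare men and women on total children rather than on how many of them are parents, the per-sex counts above can be weighted the same way. A rough sketch, hard-coding the numbers printed above; the helper name `total_children` is made up for illustration.

```python
# Rows per number of children, copied from the two value_counts()
# outputs above (women_only and men_only).
women = {0: 289, 1: 158, 2: 119, 3: 77, 4: 11, 5: 8}
men = {0: 285, 1: 166, 2: 121, 3: 80, 4: 14, 5: 10}

def total_children(counts):
    """Sum of children across all rows: kids per row times number of rows."""
    return sum(kids * n for kids, n in counts.items())

women_total = total_children(women)
men_total = total_children(men)

# Average children per person, since the two groups differ in size.
women_avg = women_total / sum(women.values())
men_avg = men_total / sum(men.values())

print(women_total, round(women_avg, 2))  # 711 1.07
print(men_total, round(men_avg, 2))      # 754 1.12
```

On these numbers men have slightly more children both in total and per person, though the gap is small.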
More (b/c I was curious about the number of children per region for women vs. men):
#women, children, region breakdown:
women_only[['children', 'region']].groupby('region').count()
children
region
northeast 161
northwest 164
southeast 175
southwest 162
#men, region, children breakdown:
men_only[['children', 'region']].groupby('region').count()
children
region
northeast 163
northwest 161
southeast 189
southwest 163
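#note: .count() counts rows, so these are the numbers of women and men per region, not the numbers of children.
#so, we can see that the SE has the most people for both men and women; swap .count() for .sum() to total the children per region instead.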
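One thing to watch with the region breakdown: after a groupby, .count() returns the number of rows in each group, while .sum() adds up the values. A small sketch of the difference, using a made-up five-row frame (the `toy` data is hypothetical, not the insurance dataset):

```python
import pandas as pd

# Hypothetical toy data, NOT the real insurance dataset.
toy = pd.DataFrame({
    "region": ["southeast", "southeast", "southeast", "northwest", "northwest"],
    "children": [0, 0, 1, 3, 2],
})

# count(): how many rows (people) fall in each region.
rows_per_region = toy.groupby("region")["children"].count()

# sum(): how many children those people have in total.
kids_per_region = toy.groupby("region")["children"].sum()

print(rows_per_region)
print(kids_per_region)
```

In this toy frame the southeast has the most people but the northwest has the most children, so the two questions can have different answers. Running the .sum() version on women_only and men_only would show which region actually has the most kids.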