US Medical Project - Python Fundamentals

Hello Everyone!

I have finished the US Medical Project and am looking for feedback. Here is the link to my gist: US Medical Insurance Project - Python Fundamentals · GitHub

The project took me about 4 hours.

Here is the feedback I am looking for:

  • Python Coding Skills - Is my code legible? Are my variables’ names good? Any other comment on this would be nice.

  • Data Analysis - Did the project goals make sense? How was the analysis of the data?

My biggest issue with the project was thinking like a “Data Scientist/Analyst.” Scoping the project was the difficult part for me.

I kept my analysis of the project simple because I did not want to go down the rabbit hole of trying to overdo it.

Please let me know y’all’s thoughts!

Thanks in advance.

Congrats on completing the project! This is a data set that you’ll return to as you go on in the course and learn more DA skills (or, you could do your own separate analysis too, to try out new coding skills).

One thing…
For the 3rd calculation about who has more children, it might help to add some more info, such as the total number of men & women (676 and 662, respectively) and the total number of kids in the dataset.

Also, I think there might be a miscalculation b/c I came up with different numbers.

#this is using pandas

insurance["sex"].value_counts()

male      676
female    662


#the number of rows for each number of kids:

insurance["children"].value_counts()

0    574
1    324
2    240
3    157
4     25
5     18
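#764 people have at least one child (324 + 240 + 157 + 25 + 18)
#total kids: 1*324 + 2*240 + 3*157 + 4*25 + 5*18 = 1465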
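A quick way to double-check those totals, as a rough sketch: value_counts() tells you how many rows have each number of children, so the number of parents is the sum of the non-zero rows, while the number of kids is each row count multiplied by its number of children. The Series below is typed in by hand from the output above (the name children_counts is just for illustration), so it runs without the CSV.

```python
import pandas as pd

# Counts copied by hand from the value_counts() output above:
# index = number of children, value = number of rows with that many children.
children_counts = pd.Series({0: 574, 1: 324, 2: 240, 3: 157, 4: 25, 5: 18})

# People with at least one child: every row except the 0-children rows.
people_with_children = int(children_counts.drop(0).sum())

# Total kids: weight each row count by its number of children.
total_children = sum(n_children * n_rows for n_children, n_rows in children_counts.items())

print("people with children:", people_with_children)
print("total children:", total_children)
```

On the full data frame, the same two numbers should come from (insurance["children"] > 0).sum() and insurance["children"].sum().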

If I break out the insurance data into separate data frames, by women & men:

women_only = insurance[insurance['sex'] == 'female']

women_only['children'].value_counts()
0    289
1    158
2    119
3     77
4     11
5      8
#women who have children: 373

#Men:
men_only = insurance[insurance['sex'] == 'male']

men_only["children"].value_counts()
0    285
1    166
2    121
3     80
4     14
5     10

#Men who have children: 391
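To put the two groups side by side, here is a minimal sketch that rebuilds one row per person from the two value_counts() outputs above and summarizes them with a single groupby. The names counts_by_sex, people, and summary are made up for this example, and the rebuilt frame only has the sex and children columns, not the rest of the dataset.

```python
import pandas as pd

# value_counts() results copied by hand from above:
# {number of children: number of rows}, per sex.
counts_by_sex = {
    "female": {0: 289, 1: 158, 2: 119, 3: 77, 4: 11, 5: 8},
    "male": {0: 285, 1: 166, 2: 121, 3: 80, 4: 14, 5: 10},
}

# Rebuild one (sex, children) row per person from the counts.
rows = [
    (sex, n_children)
    for sex, counts in counts_by_sex.items()
    for n_children, n_rows in counts.items()
    for _ in range(n_rows)
]
people = pd.DataFrame(rows, columns=["sex", "children"])

# One summary row per sex: group size, parents, total kids, average kids.
summary = people.groupby("sex")["children"].agg(
    people="count",
    parents=lambda s: int((s > 0).sum()),
    total_children="sum",
    mean_children="mean",
)
print(summary)
```

With these counts, the women have 711 children in total and the men have 754. The men's group is also slightly larger, so the average per person is probably the fairer comparison.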

More (b/c I was curious about the number of children per region that women vs. men had):

#women, children, region breakdown:

women_only[['children', 'region']].groupby('region').count()
           children
region
northeast       161
northwest       164
southeast       175
southwest       162

#men, region, children breakdown:
men_only[['children', 'region']].groupby('region').count()

           children
region
northeast       163
northwest       161
southeast       189
southwest       163

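#note: count() counts rows, not kids, so these tables show how many women and men are in each region (they add up to 662 and 676).
#so, we can see that the SE has the most people for both men and women; to get the number of children per region, use .sum() instead of .count().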
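One thing to watch with the two region tables: groupby(...).count() returns the number of non-null rows in each group, while groupby(...).sum() adds up the values. Here is a small sketch of the difference on a few made-up rows (the toy frame is hypothetical, not the real insurance data):

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the insurance dataset.
toy = pd.DataFrame({
    "region": ["southeast", "southeast", "southeast", "northwest", "northwest"],
    "children": [0, 0, 1, 3, 2],
})

# "count" = how many rows per region; "sum" = how many children per region.
by_region = toy.groupby("region")["children"].agg(rows="count", total_children="sum")
print(by_region)
```

In the toy data the southeast has the most rows but the fewest children, so the two aggregations can rank regions differently. On the real frames, women_only.groupby('region')['children'].sum() would give the number of kids per region.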