My completed work. Please reply with your feedback and with your project so I can give you feedback also!

HasMotam/medical_insurance_portfolio_project: The U.S. Medical Insurance Portfolio Project that is part of the ML/AI Engineering Foundations and Data Science skill paths on Codecademy. (github.com)

Congrats on completing the project.

One question, though, about the conclusion drawn from the smoker counts broken out by sex, namely that proportionally more women smoke than men…

There are 274 total smokers in the data set.

insurance['smoker'].value_counts()

>>no     1064
yes     274

#and the numbers of men & women:

insurance['sex'].value_counts()

>>male      676
female    662

#if I isolate the smokers and put them in their own df:

smokers = insurance[insurance['smoker'] == 'yes']

smokers.head()

>>  age     sex    bmi  children smoker     region     charges
0    19  female  27.90         0    yes  southwest  16884.9240
11   62  female  26.29         0    yes  southeast  27808.7251
14   27    male  42.13         0    yes  southeast  39611.7577
19   30    male  35.30         0    yes  southwest  36837.4670
23   34  female  31.92         1    yes  northeast  37701.8768

and then,

smokers['sex'].value_counts()

>>male      159
female    115

How can this statement, ‘The statement that, proportionally, more women smoke than men is: True’, be true? Or am I missing something?

159/676 = 23.5% (men),
115/662 = 17.4% (women)
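
For anyone who wants to check those proportions in pandas rather than by hand, here is a minimal sketch. It assumes nothing about the project's own code: the `insurance` frame below is a stand-in rebuilt from the counts quoted in this thread (159 of 676 men and 115 of 662 women smoke), and `smoker_rate` is just an illustrative name.

```python
import pandas as pd

# Stand-in for the real insurance DataFrame, rebuilt from the
# sex/smoker counts quoted in this thread (not loaded from the CSV).
counts = {
    ("male", "yes"): 159,
    ("male", "no"): 676 - 159,
    ("female", "yes"): 115,
    ("female", "no"): 662 - 115,
}
rows = [
    {"sex": sex, "smoker": smoker}
    for (sex, smoker), n in counts.items()
    for _ in range(n)
]
insurance = pd.DataFrame(rows)

# The mean of a boolean column is the share of True values,
# so grouping by sex gives the smoker rate within each sex.
smoker_rate = (insurance["smoker"] == "yes").groupby(insurance["sex"]).mean()
print((smoker_rate * 100).round(1))
```

With the real data, the same `groupby(...).mean()` line should work as long as the `sex` and `smoker` columns look like the ones shown above.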

Hi,

You’re absolutely correct. My for loop didn’t add to the count, so I was always adding the number of smoking women. I’ve changed the code now and the GitHub repo should reflect that. Thanks for that!

So it turns out that, proportionally, more men smoke than women.
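
The original loop isn't shown in this thread, so the following is only a sketch of one way to keep a separate running tally per sex in plain Python. The `records` list and the `count_smokers_by_sex` helper are hypothetical and are not taken from the repo.

```python
# Hypothetical records; only the 'sex' and 'smoker' fields matter here.
records = [
    {"sex": "female", "smoker": "yes"},
    {"sex": "male", "smoker": "yes"},
    {"sex": "male", "smoker": "no"},
    {"sex": "male", "smoker": "yes"},
    {"sex": "female", "smoker": "no"},
]


def count_smokers_by_sex(rows):
    """Return {sex: number of smokers}, with one running tally per sex."""
    tallies = {}
    for row in rows:
        if row["smoker"] == "yes":
            # Increment the tally for this row's own sex only.
            tallies[row["sex"]] = tallies.get(row["sex"], 0) + 1
    return tallies


smoker_counts = count_smokers_by_sex(records)
print(smoker_counts)
```

Keying the tally on each row's own `sex` value means one group's count can't accidentally be added to the other's.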

Best Regards,
Hasan
