Congrats on completing the project.
One question, though, about the conclusion on the number of smokers broken out by sex, namely that proportionally more women smoke than men…
There are 274 total smokers in the data set.
insurance['smoker'].value_counts()
>>no 1064
yes 274
#and the numbers of men & women:
insurance['sex'].value_counts()
>>male 676
female 662
#if I isolate the smokers and put them in their own df:
smokers = insurance[insurance['smoker'] == 'yes']
smokers.head()
>>  age     sex    bmi  children  smoker     region     charges
0    19  female  27.90         0     yes  southwest  16884.9240
11   62  female  26.29         0     yes  southeast  27808.7251
14   27    male  42.13         0     yes  southeast  39611.7577
19   30    male  35.30         0     yes  southwest  36837.4670
23   34  female  31.92         1     yes  northeast  37701.8768
and then,
smokers['sex'].value_counts()
>>male 159
female 115
How can the result ‘The statement that, proportionally, more women smoke than men is: True’ be true? Or am I missing something?
159/676 = 23.5% (men),
115/662 = 17.4% (women)
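The same shares can be checked in pandas in one step. This is only a minimal sketch: `insurance_demo` and `smoker_share` are made-up names, and the frame is rebuilt from the counts quoted above rather than loaded from the real file, so it only has the `sex` and `smoker` columns.

```python
import pandas as pd

# Rebuild only the sex/smoker columns from the counts quoted in this thread
# (assumption: stand-in for the real insurance DataFrame, which has more columns).
counts = {
    ("male", "yes"): 159,
    ("male", "no"): 676 - 159,
    ("female", "yes"): 115,
    ("female", "no"): 662 - 115,
}
rows = [
    {"sex": sex, "smoker": smoker}
    for (sex, smoker), n in counts.items()
    for _ in range(n)
]
insurance_demo = pd.DataFrame(rows)

# Share of smokers within each sex: normalize="index" makes each row sum to 1.
smoker_share = pd.crosstab(
    insurance_demo["sex"], insurance_demo["smoker"], normalize="index"
)
print(smoker_share.round(3))
```

With the real data, the same `pd.crosstab(insurance['sex'], insurance['smoker'], normalize='index')` call should work as long as the column names match the ones shown above.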
Hi,
You’re absolutely correct. My for loop didn’t add to the count, so I was always adding the number of smoking women. I’ve changed the code now, and the GitHub repo should reflect that. Thanks for that!
So it turns out that, proportionally, there are more smoking men than smoking women.
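For anyone who runs into the same kind of counting bug, a loop that keeps a separate counter per sex could look roughly like the sketch below. It is illustrative only and not the code in the repo; `smoker_proportions` and the sample lists are hypothetical.

```python
# Hypothetical sketch, not the code from the repo: count smokers per sex
# with a loop, keeping a separate counter for each sex.
def smoker_proportions(sexes, smokers):
    totals = {}   # number of people seen per sex
    smoking = {}  # number of smokers seen per sex
    for sex, smoker in zip(sexes, smokers):
        totals[sex] = totals.get(sex, 0) + 1
        if smoker == "yes":
            smoking[sex] = smoking.get(sex, 0) + 1
    # Proportion of smokers within each sex.
    return {sex: smoking.get(sex, 0) / totals[sex] for sex in totals}

# Tiny made-up sample, only to show the call.
sexes = ["female", "male", "male", "female", "male"]
smokers = ["yes", "yes", "no", "no", "yes"]
proportions = smoker_proportions(sexes, smokers)
print(proportions)
```

With the real data the call would be something like `smoker_proportions(insurance['sex'], insurance['smoker'])`, assuming those column names.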
Best Regards,
Hasan