Congrats on completing the project.
One question, though, about your conclusion on the number of smokers broken out by sex, i.e. that there are more women who smoke…
There are 274 total smokers in the data set.
```python
insurance['smoker'].value_counts()
# no     1064
# yes     274

# and the numbers of men & women:
insurance['sex'].value_counts()
# male      676
# female    662

# if I isolate the smokers and put them in their own df:
smokers = insurance.loc[insurance['smoker'] == 'yes']
smokers.head()
#     age     sex    bmi  children smoker     region      charges
# 0    19  female  27.90         0    yes  southwest  16884.9240
# 11   62  female  26.29         0    yes  southeast  27808.7251
# 14   27    male  42.13         0    yes  southeast  39611.7577
# 19   30    male  35.30         0    yes  southwest  36837.4670
# 23   34  female  31.92         1    yes  northeast  37701.8768

smokers['sex'].value_counts()
# male      159
# female    115
```
How can this statement, ‘The statement that, proportionally, more women smoke than men is: True’, be true? Or am I missing something?
159/676 = 23.5% (men)
115/662 = 17.4% (women)
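The same proportions can be computed directly in pandas instead of by hand. This is a minimal sketch: the frame below is a stand-in rebuilt only from the counts quoted above (676 men, 159 of whom smoke; 662 women, 115 of whom smoke), since the real `insurance` data isn't reproduced here. With the actual data you would use `insurance` as loaded in the project. `pd.crosstab(..., normalize='index')` divides each row by its own total, so the `'yes'` column is the share of each sex that smokes.

```python
import pandas as pd

# Stand-in for the `insurance` DataFrame, built only from the counts above:
# 676 men (159 smokers) and 662 women (115 smokers).
sexes = ['male'] * 676 + ['female'] * 662
smoker = (['yes'] * 159 + ['no'] * (676 - 159)
          + ['yes'] * 115 + ['no'] * (662 - 115))
insurance = pd.DataFrame({'sex': sexes, 'smoker': smoker})

# Row-normalised crosstab: each row sums to 1, so the 'yes' column
# is the fraction of each sex that smokes.
rates = pd.crosstab(insurance['sex'], insurance['smoker'], normalize='index')
smoking_rate = rates['yes']
print(smoking_rate.round(3))

# Equivalent groupby form: the mean of a boolean Series is a proportion.
rate_by_sex = (insurance['smoker'] == 'yes').groupby(insurance['sex']).mean()
print(rate_by_sex.round(3))
```

Comparing rates rather than raw counts is what matters here: raw counts can mislead whenever the two groups differ in size.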
You’re absolutely correct. My for loop wasn’t adding to the right count, so I was always adding the number of smoking women. I’ve fixed the code, and the GitHub repo should now reflect that. Thanks for that!
So it turns out that, proportionally, there are more smoking men than smoking women.
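The original for loop isn't shown in this thread, so the following is only a hypothetical sketch of a counting loop that avoids this kind of bug: each row increments the tally for *its own* sex, and each rate is divided by that sex's total. `smoking_rates` is an illustrative helper name, not something from the project, and the sample lists reuse only the counts quoted above.

```python
from collections import Counter


def smoking_rates(sexes, smokers):
    """Return {sex: fraction of that sex who smoke} using an explicit loop.

    Hypothetical helper: `sexes` and `smokers` are parallel sequences,
    e.g. insurance['sex'] and insurance['smoker'].
    """
    totals = Counter()
    smoking = Counter()
    for sex, smoker in zip(sexes, smokers):
        totals[sex] += 1          # every row counts toward its own sex's total
        if smoker == 'yes':
            smoking[sex] += 1     # increment the tally for this row's sex only
    return {sex: smoking[sex] / totals[sex] for sex in totals}


# Same counts as in the thread: 676 men (159 smokers), 662 women (115 smokers).
sexes = ['male'] * 676 + ['female'] * 662
smokers = ['yes'] * 159 + ['no'] * 517 + ['yes'] * 115 + ['no'] * 547
rates = smoking_rates(sexes, smokers)
print({sex: round(rate, 3) for sex, rate in rates.items()})
```

In pandas itself the loop is unnecessary; `value_counts` or a groupby does the same tally, but spelling it out makes it easy to see which counter each row should update.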