US Medical Insurance Data: Connection between BMI and Smoking?

Any feedback would be great. I realize it's quite verbose in terms of the amount of code I used, and I will proofread it for spelling errors at some point. I'm also looking into breaking it down into classes to reduce the amount of code required. That said, I'm quite happy with the results; this was more about solidifying some of the skills I've learned from Codecademy so far.

Thanken ye all…

A thought… Rather than writing out the code to get the stats for each region (mean, std, median), you could first break out each region by passing a boolean mask (as an array, via .values) to .iloc[], like so:

southwest = df.iloc[(df['region']=='southwest').values]

And then just use the .describe() method on the bmi column:

southwest['bmi'].describe()

which results in:

count    325.000000
mean      30.596615
std        5.691836
min       17.400000
25%       26.900000
50%       30.300000
75%       34.600000
max       47.600000
Name: bmi, dtype: float64

See the pandas documentation for .describe().
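If it helps, here is a rough sketch of the same idea taken one step further: groupby() can produce the describe() table for every region in one call, so you don't have to slice each region by hand. The tiny DataFrame below is made up purely so the snippet runs on its own; with the real data you would use your own df, and I'm assuming the column names are region and bmi as in the post above.

```python
import pandas as pd

# Toy stand-in for the insurance data; these rows are invented for illustration.
df = pd.DataFrame({
    "region": ["southwest", "southwest", "southeast", "southeast", "northwest"],
    "bmi": [27.9, 33.0, 33.77, 22.705, 28.88],
})

# Boolean-mask selection with .loc; selects the same rows as the .iloc[...values] form.
southwest = df.loc[df["region"] == "southwest"]
southwest_stats = southwest["bmi"].describe()

# One call for all regions: one row per region, one column per statistic.
region_stats = df.groupby("region")["bmi"].describe()
print(region_stats[["count", "mean", "std", "50%"]])
```

The 50% column is the median, so this covers the mean, std, and median you were computing by hand.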


IMO you don't need to create classes. Pandas is quite powerful in its own right (especially alongside scipy.stats and the math library). If you can write a function, you can run a two-tailed t-test for statistical significance and use Cohen's d to gauge the strength of the relationship between the variables.
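Something along these lines, as a minimal sketch rather than the one right way to do it. The two BMI lists are invented placeholders (swap in the smoker and non-smoker bmi columns from your DataFrame), and cohens_d is just a helper name I made up; it uses the pooled standard deviation version of the formula.

```python
import math
from scipy import stats

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Sample variances (n - 1 in the denominator).
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Placeholder values, NOT taken from the insurance dataset.
smoker_bmi = [27.9, 31.2, 35.3, 29.8, 33.1]
nonsmoker_bmi = [26.4, 30.1, 34.0, 28.7, 32.5]

# ttest_ind is two-tailed by default; pass equal_var=False for Welch's t-test.
t_stat, p_value = stats.ttest_ind(smoker_bmi, nonsmoker_bmi)
effect_size = cohens_d(smoker_bmi, nonsmoker_bmi)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, d = {effect_size:.3f}")
```

The p-value tells you whether the difference in mean BMI is likely to be noise, and d tells you how big the difference is in standard-deviation units, which matters because with over a thousand rows even a tiny difference can come out significant.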
