Please take a look at my Portfolio Project: US Medical Insurance Costs

Here is the link so you can check my code out~

Please be critical and don’t hold back~

Thanks in advance :smiley:

Some considerations/thoughts:

  • Good use of comments, intro, conclusions. One could follow along pretty easily. The code checks out too.

  • Glad you used the median for the charges column. Because there are outliers on the high end, the mean (13270.42) is pulled well above the median (9382.03).
    ex:

df['charges'].describe().round(2)

count     1338.00
mean     13270.42
std      12110.01
min       1121.87
25%       4740.29
50%       9382.03
75%      16639.91
max      63770.43
  • I like the granularity in several sections. Including the breakdown of age groups and smokers.

  • I don’t really understand the section titled, “Ultimate checklist for insurance”. Are these people unique or something?

  • In the Findings section: you can’t solidly state that there is a correlation between smoking and higher charges. It definitely appears that way, but to make that claim a statistical test should also be run. If you graph it, sure, there’s a positive, upward trend, but to fully back it up we need a test of significance. (You have one binary categorical variable, smoker vs. non-smoker, and one quantitative variable, charges, so a two-tailed, two-sample t-test would help.) You could re-word it to say that there appears to be a correlation between smoking and higher charges. (That might also be due to health issues connected to smoking: heart disease, higher blood pressure, cancer, pulmonary issues, etc.)

  • Also in the Findings section, be mindful of using words like “assume”, as in “We can also assume that people that smoke are less likely to sign up for insurance.” You can’t assume that.

I mean sure, in general when people sign up for insurance (in the US), one of the questions they get asked is whether they smoke, and smokers’ premiums tend to be higher than non-smokers’.
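For the test of significance mentioned above, here is a minimal sketch of what it could look like. It assumes pandas and SciPy are installed, and that the DataFrame has a `smoker` column holding "yes"/"no" next to `charges` (the column name and its values are my assumption, not something I checked in your notebook). The helper `compare_charges_by_smoker` is a made-up name, and the toy numbers are invented just so the snippet runs on its own. I used Welch's variant of the two-sample t-test because the two groups probably don't have equal variances.

```python
import pandas as pd
from scipy import stats


def compare_charges_by_smoker(df, alpha=0.05):
    """Two-sided Welch t-test on charges: smokers vs. non-smokers."""
    # Assumed layout: 'smoker' is "yes"/"no", 'charges' is numeric.
    smokers = df.loc[df["smoker"] == "yes", "charges"]
    non_smokers = df.loc[df["smoker"] == "no", "charges"]

    # equal_var=False selects Welch's t-test (no equal-variance assumption);
    # the default alternative is two-sided.
    t_stat, p_value = stats.ttest_ind(smokers, non_smokers, equal_var=False)

    return {
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


# Invented toy data, NOT the real insurance dataset.
toy = pd.DataFrame({
    "smoker": ["yes"] * 5 + ["no"] * 6,
    "charges": [32000, 28000, 41000, 36000, 30000,
                4000, 9000, 6500, 12000, 5200, 8000],
})

result = compare_charges_by_smoker(toy)
print(result)
```

On the real data you would pass your own `df` instead of `toy`. A small p-value says the difference in mean charges is unlikely to be chance alone; it still doesn't tell you why smokers are charged more.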

Good work!


The ultimate checklist is just cherry picking certain groups
(ex. a 25 year old man that smokes and has 3 children)
and looking at the results.

I found it silly and decided to include it.

Later on I added the opposite section so that MAYBE something pops up.

Most of the time you will end up with NOTHING, because the input/database is the way it is.
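A rough sketch of why that happens: every extra condition shrinks the group, so a very specific profile often matches one row or none. The rows below are invented for illustration, the column names other than `charges` are assumed, and `pick_group` is a hypothetical helper rather than the code from the project.

```python
import pandas as pd

# Invented rows, NOT the real dataset; column names besides 'charges' are assumed.
toy = pd.DataFrame({
    "age": [25, 25, 40, 19, 25, 52],
    "sex": ["male", "female", "male", "male", "male", "female"],
    "smoker": ["yes", "no", "yes", "no", "no", "yes"],
    "children": [3, 0, 1, 0, 3, 2],
    "charges": [33000.0, 2500.0, 39000.0, 1700.0, 4200.0, 45000.0],
})


def pick_group(df, **conditions):
    """Keep only the rows where every given column equals the given value."""
    mask = pd.Series(True, index=df.index)
    for column, value in conditions.items():
        mask &= df[column] == value
    return df[mask]


broad = pick_group(toy, smoker="yes")
narrow = pick_group(toy, age=25, sex="male", smoker="yes", children=3)
empty = pick_group(toy, age=25, sex="female", smoker="yes", children=3)

print(len(broad), len(narrow), len(empty))  # 3 1 0
```

With one condition there are still a few rows to compare; with four there is a single person or nobody, which is not enough to say anything about the group.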

I agree with your statements that I can’t with absolute certainty declare those things.

Again, looking at THIS database and comparing the numbers, the pattern is constant.

And in GENERAL, with my INTERNAL BIAS, I figured smokers will probably avoid insurance because it costs them more.

Thanks for your comment, I really appreciate it :D!

I will reword the findings section~
