U.S. Medical Insurance Costs (Data Science Foundations)

Hi, I just finished the U.S. Medical Insurance Costs project. Could you let me know whether my code is efficient and readable? Here’s the link:
https://github.com/skuyliving1/U.S-Medical-Insurance/blob/main/U.S%20Medical%20Insurance%20Cost%20(Last%20Data%20Science%20Foundations).ipynb

  • If there’s a way to limit the output instead of printing a wall of dictionary text, that might be worth looking into. Could you show just the first 5 records instead? It looks cleaner, and the viewer doesn’t have to keep scrolling.
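A minimal sketch of that idea, assuming the records were loaded into a list of dictionaries (for example with `csv.DictReader`). The helper name `preview` and the placeholder records are hypothetical. `itertools.islice` takes only the first few items, so nothing past them gets printed. With a pandas DataFrame, `df.head()` does the same job.

```python
from itertools import islice


def preview(records, n=5):
    """Print and return only the first n records instead of the whole list."""
    first = list(islice(records, n))
    for row in first:
        print(row)
    return first


# Hypothetical placeholder records; in the project these would come from
# csv.DictReader(open("insurance.csv")).
records = [{"row": i, "charges": 1000.0 + i} for i in range(1338)]
shown = preview(records)  # prints 5 dicts instead of all 1338
```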

  • You might want to look at the median rather than the mean for the charges column. A few very large charges (outliers) pull the mean upward.
    See:

```python
df['charges'].mean().round(2)
# 13270.42

df['charges'].median().round(2)
# 9382.03

# or:
df['charges'].describe().round(2)
# count     1338.00
# mean     13270.42
# std      12110.01
# min       1121.87
# 25%       4740.29
# 50%       9382.03
# 75%      16639.91
# max      63770.43
```
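The same comparison also works without pandas. Here is a minimal sketch using the standard-library `statistics` module on a small, made-up list of charges (not the project’s data). It shows how one large value drags the mean upward while the median barely moves:

```python
import statistics

# Made-up charges: four typical values plus one large outlier.
charges = [2000.0, 3000.0, 4000.0, 5000.0, 60000.0]

mean_charge = round(statistics.mean(charges), 2)
median_charge = round(statistics.median(charges), 2)
print(mean_charge)    # 14800.0
print(median_charge)  # 4000.0
```

Without the 60000.0 outlier, the mean and median of the remaining four values would both be 3500.0.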
  • In the smokers’ costs and non-smokers’ costs section, is that the sum total of all the costs? Would it be better to compare the means of the two groups instead?
```python
df[['smoker', 'charges']].groupby('smoker').mean().round(2)
#         charges
# smoker
# no      8434.27
# yes    32050.23
```
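If the project keeps the data as a list of dictionaries rather than a DataFrame, you can still get per-group means by keeping a running total and count for each group. This is a sketch. `mean_by` is a hypothetical helper, and the rows are invented in the shape `csv.DictReader` produces, where every value is a string. The same helper also works for other columns, such as `'region'` or `'sex'`.

```python
from collections import defaultdict


def mean_by(records, key, value="charges"):
    """Average `value` for each distinct value of `key` (e.g. 'smoker', 'region')."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in records:
        totals[row[key]] += float(row[value])  # csv values arrive as strings
        counts[row[key]] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


# Made-up rows, not the project's data.
rows = [
    {"smoker": "yes", "region": "southwest", "charges": "30000"},
    {"smoker": "yes", "region": "northeast", "charges": "34000"},
    {"smoker": "no", "region": "southwest", "charges": "8000"},
    {"smoker": "no", "region": "northeast", "charges": "9000"},
]
print(mean_by(rows, "smoker"))  # {'yes': 32000.0, 'no': 8500.0}
print(mean_by(rows, "region"))
```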
  • In the Conclusions section, it’s not a good idea to make medical recommendations unless you’re a doctor. It’s true that smokers tend to pay more for insurance because of the health risks of smoking (which is also why insurers ask whether you smoke when you sign up). But other factors might also influence the charges here, such as region or whether the person is male or female. Right now we only see a possible correlation between smoking and charges. You’d need some significance testing to see how each factor actually affects charges.
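Here is one possible form of significance testing: a minimal permutation test for the difference in mean charges between two groups, written with the standard library only. This technique is swapped in for illustration; a t-test such as `scipy.stats.ttest_ind` is another common choice. The function name `permutation_test` and the charge values are hypothetical. The test shuffles the group labels many times and counts how often a difference at least as large as the observed one appears by chance. A small p-value suggests the gap is unlikely to be noise. The test compares one factor at a time, so on its own it doesn’t account for age, BMI or region.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for the difference in means of two groups.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up charges, not the project's data.
smokers = [28000.0, 31000.0, 35000.0, 39000.0, 42000.0, 30000.0]
non_smokers = [6000.0, 7500.0, 9000.0, 8200.0, 11000.0, 5400.0]
diff, p = permutation_test(smokers, non_smokers)
print(round(diff, 2), p)
```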

Hello lisaa, thanks for your honest review… I’m glad you took a look at my project 🙌🏻 I’ll try to correct my code 🫶🏻
