Hi, I just finished the U.S. Medical Insurance Costs project. Could you let me know whether my code is efficient and readable? Here’s the link:
https://github.com/skuyliving1/U.S-Medical-Insurance/blob/main/U.S%20Medical%20Insurance%20Cost%20(Last%20Data%20Science%20Foundations).ipynb
- If there’s a way to limit the output, rather than printing a wall of dictionary text, that might be something to look into. Is it possible to limit it to the first 5 records instead? It looks a bit cleaner, and the viewer doesn’t have to keep scrolling.
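For instance, if the records live in a plain dictionary, something along these lines would print only the first five. This is just a sketch: `records` and `preview` are made-up names and the values are placeholders, not your actual data. If you load the CSV into pandas instead, `df.head()` shows the first five rows by default.

```python
from itertools import islice

# Placeholder records standing in for the notebook's data; the values are invented.
records = {
    i: {"age": 20 + i, "smoker": "yes" if i % 3 == 0 else "no", "charges": 1000.0 * (i + 1)}
    for i in range(20)
}


def preview(mapping, n=5):
    """Return the first n (key, value) pairs of a dict, in insertion order."""
    return dict(islice(mapping.items(), n))


first_five = preview(records)
for key, row in first_five.items():
    print(key, row)

# Tell the reader how much was left out instead of printing it all.
print(f"... {len(records) - len(first_five)} more records not shown")
```

`islice` stops after `n` items, so the full dictionary is never copied or printed.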
- You might want to look at the median rather than the mean for the charges column. There are high outliers that pull the mean upward.
See:
df['charges'].mean().round(2)
13270.42
df['charges'].median().round(2)
9382.03
#or:
df['charges'].describe().round(2)
count 1338.00
mean 13270.42
std 12110.01
min 1121.87
25% 4740.29
50% 9382.03
75% 16639.91
max 63770.43
- In the smokers’ costs and non-smokers’ costs section, is that the sum total of all the costs? Would it be better to look at the difference in the means instead?
df[["smoker", 'charges']].groupby('smoker').mean().round(2)
charges
smoker
no 8434.27
yes 32050.23
- In the Conclusions section, it’s not a good idea to make medical recommendations unless you’re a doctor. It’s true that smokers tend to pay more for insurance because of the health risks of smoking. (That is also why insurers ask whether you smoke when you sign up.) But here, other factors might influence the charges too. Maybe region affects costs, or whether the person is male or female, etc. Right now, we just see a possible correlation between smoking and charges. You’d have to do some significance testing to tell whether the difference in charges is more than chance.
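As a rough idea of what significance testing could look like, here is a sketch of a permutation test for the difference in mean charges between two groups, using only the standard library. The function name `permutation_test` and the numbers are made up for illustration; they are not from the dataset. It also only compares smokers with non-smokers and does not account for region, sex, or age, so treat it as a starting point rather than a conclusion.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: under the null hypothesis, group membership is arbitrary.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Invented charges for illustration only, NOT taken from the dataset.
smoker_charges = [31000.0, 28500.0, 35200.0, 40100.0, 29900.0, 33300.0]
non_smoker_charges = [8200.0, 7600.0, 9100.0, 12000.0, 6400.0, 8800.0, 10100.0]

diff, p = permutation_test(smoker_charges, non_smoker_charges, n_permutations=2000)
print(f"difference in means: {diff:.2f}, p-value: {p:.4f}")
```

A small p-value suggests the gap between the groups is unlikely to come from random labeling alone. It still would not show that smoking causes the higher charges.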
Hello lisaa, thanks for your honest review… I’m glad you took a look at my project 🙌🏻 I’ll try to correct my code 🫶🏻