US Medical Insurance Portfolio Project for Review

Hello Everyone!

This is the first portfolio project I have completed on Codecademy in my Data Scientist Career Path. As someone who is completely new to this domain, I would appreciate your feedback on my analysis and on ways I can improve. It would be great if you could point out any errors or recommend additional steps that could help me build a more robust analysis.

Thanks,
Palash

GitHub - palash-maske95/Portfolio-Projects

Congrats on completing the project.

  • It’s easy to follow: the comments let the reader know what you intend with the analysis and what each function returns.

  • You have a clear understanding of functions and extracting information from the dataset.

  • You might want to look at the median of charges rather than the mean. If you plot the data, you can see that there are some outliers, which pull the mean upward.

insurance['charges'].mean().round(2)
>> 13270.42

insurance['charges'].median().round(2)
>> 9382.03

Or,
insurance['charges'].describe()
>>
count     1338.000000
mean     13270.422265
std      12110.011237
min       1121.873900
25%       4740.287150
50%       9382.033000
75%      16639.912515
max      63770.428010
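
To make the outlier point concrete, here is a minimal sketch that compares the mean and median and counts high outliers with the common 1.5 × IQR rule. The numbers are made up for illustration and are not taken from the insurance dataset; on the real data you would pass insurance['charges'] instead of the hypothetical `charges` Series.

```python
import pandas as pd

# Made-up charges for illustration only; not values from the insurance dataset.
charges = pd.Series(
    [1200.0, 3500.0, 4800.0, 7200.0, 9400.0, 11000.0, 13500.0, 48000.0, 62000.0]
)

mean_charge = charges.mean()
median_charge = charges.median()

# Tukey's rule: flag values more than 1.5 * IQR above the third quartile.
q1, q3 = charges.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
high_outliers = charges[charges > upper_fence]

print(f"mean={mean_charge:.2f}, median={median_charge:.2f}")
print(f"upper fence={upper_fence:.2f}, values above it: {len(high_outliers)}")
```

A few very large values are enough to drag the mean well above the median, which is why the median is usually the safer summary for skewed data like this.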
  • Looking at the median charges, it’s actually the Northeast that has the highest median, at $10,057.65:
insurance[['region', 'charges']].groupby('region').median().round(2)

>>
region       charges
northeast   10057.65
northwest    8965.80
southeast    9294.13
southwest    8798.59
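
One way to see the two rankings side by side is to aggregate the mean and the median in a single call. This is only a sketch on a tiny made-up sample (the `sample` DataFrame is hypothetical); with the project data you would group the `insurance` DataFrame the same way.

```python
import pandas as pd

# Tiny made-up sample; the real project would use the `insurance` DataFrame.
sample = pd.DataFrame({
    "region": ["northeast", "northeast", "northeast",
               "southeast", "southeast", "southeast"],
    "charges": [9000.0, 10000.0, 11000.0, 4000.0, 8000.0, 45000.0],
})

# Mean, median, and group size per region in one table.
summary = (
    sample.groupby("region")["charges"]
    .agg(["mean", "median", "count"])
    .round(2)
)
print(summary)
```

In this toy sample a single large southeast charge gives that region the higher mean but the lower median, which is the same kind of flip the real numbers show.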
  • (In your conclusions) Be careful about making statements like: "…the average insurance cost is higher in the southeast region. The main lever for this can be attributed to a slightly higher number of smokers as well as slightly higher bmi in the region."
    While it’s true that the number of smokers is higher in the Southeast and that charges there lean toward the higher end, you can’t really claim there is a relationship. Charges could be higher because of other variables (environmental, economic, or maybe insurance companies charge higher premiums in areas with a lower median income, etc.). Also, BMI isn’t an accurate health indicator on its own: it is just weight divided by height squared, so it can’t distinguish muscle from fat. You’d have to do some hypothesis testing to see whether there are significant associations here.
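
As one possible starting point for that hypothesis testing, here is a rough sketch of a chi-square test of independence between region and smoking status using `scipy.stats.chi2_contingency`. The counts are hypothetical, made up for illustration; on the real data you could build the table with pd.crosstab(insurance['region'], insurance['smoker']), assuming those are the column names.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical smoker counts per region, made up for illustration.
counts = pd.DataFrame(
    {"smoker": [30, 28, 40, 27], "non_smoker": [120, 122, 110, 123]},
    index=["northeast", "northwest", "southeast", "southwest"],
)

# Null hypothesis: smoking status is independent of region.
chi2, p_value, dof, expected = chi2_contingency(counts)

print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3f}")
if p_value < 0.05:
    print("Smoking rates differ across regions more than chance would explain.")
else:
    print("No significant evidence that smoking rates differ by region.")
```

Even a significant result here would only show an association between region and smoking. It would not show that smoking is what drives the regional difference in charges.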

Good work. Keep at it.


Thank you so much for reviewing my project.

Your feedback and recommendations are valid, and I will keep them in mind for my next project. I will work on improving these points.

I appreciate the time and effort you took to explain your points. :)
