Portfolio Project: U.S. Medical Insurance Costs
My name is Luis Moreno. I need a review on my first ever portfolio project on Data Science.
It wasn’t difficult but it took me some days to complete it at satisfaction. An even I could keep extracting insights from it, I achieved what I wanted to learn which is What drivers on data pulls up more the insurance prince.
Thanks in advance for the reviews.
Could you push the notebook to a GitHub repo? The file is on your local machine and people cannot view it as it is now.
OK. Thanks for the recommendation. Done.
Congrats on completing the project! Seems like you have a good understanding of how to use Python to sift through the data.
A few observations:
Avoid using words like “normal” and “old” in any analysis. It shows bias. Neutral language is best. (I’d suggest not using the word “insane” either with regards to insurance costs. “Higher” is a better term). Objective vs. subjective language.
Interesting how the age categories are defined. (Technically, one is an adult at the age of 18.) What is the median age in the data? (39). Perhaps a range of ages is better than a label of “young”, “old”, etc.
When calculating correlation, it might be a good idea to describe, briefly, what the correlation coefficient is and the strength of the relationship between variables (for people who aren’t familiar with it). You already do a good job of describing what your thought processes are as you sift thru the data.
Data analysts are storytellers of data, who uncover–without bias–what’s in the data. Know your audience too. I’m not saying over-explain everything, but, don’t assume anyone who’s reading your analyses knows everything about EDA or stats.
Also, extra reading bonus points: research BMI and how many in the medical community don’t see it as an accurate indicator of one’s overall heath. It’s a made up number that insurance companies use to charge people higher rates.