Data, data, data! My first portfolio project. Wdyt?

Please see below my submission for the Medical Insurance Costs project in the Data Science/ML/AI Skills Path course. Let me know what you think about my scope, and how the data was analysed. The defined goal was probably a bit lame/obvious, so would be amazing to read any other more interesting ideas.

Can’t wait for the next course chapters, where I’m hoping to learn how to make predictions on datasets like this!

Thanks for your time.

Well, it’s your first project here and I don’t think the defined goals are lame (but I get where you’re coming from). :slight_smile:
I’m glad that people still submit projects here. I just wish more people would review them. Feedback is useful. That said, the following are just my thoughts & things to possibly consider.

Thoughts/suggestions:

  • U.S. insurance companies’ goal is to make money; they don’t care if people are healthy or not. So, I would nix the motivational messaging part about adopting a healthier lifestyle (which is subjective) in the readme file. I would also disregard BMI b/c analysis of it tends to be loaded with bias and subjective language. A bit of research would reveal that it’s not an accurate measure of one’s overall health (it doesn’t take into account muscle mass, family history, bone density, race or sex differences, etc.). It’s a number that insurance companies use to charge people higher premiums. What people pay really depends on where they live & the amount of insurance co. competition in that city, county, state, or region. For more, see this article, which refers to the ACA for example: Where You Live Determines How Much You Pay For Health Insurance - KFF Health News

  • It’s good you included a readme file. Most people forget this part.

  • You clearly have a solid understanding of how to use the csv library and write functions.

  • Before you jump into averages/means, perhaps take a look at the spread of the data: min, max, and median. There are some outliers in the data which pull the mean upward.
    Ex:

df['charges'].describe().round(2)
charges
count	1338.00
mean	13270.42
std	12110.01
min	1121.87
25%	4740.29
50%	9382.03
75%	16639.91
max	63770.43
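To make the outlier point concrete, here is a minimal sketch of comparing mean and median and flagging high values with the common 1.5 × IQR rule. The function name `spread_summary` and the sample values are made up for illustration; they are not rows from the actual dataset, and the IQR rule is just one reasonable convention, not the only way to define an outlier.

```python
import pandas as pd


def spread_summary(values: pd.Series) -> dict:
    """Return the mean, the median, and any values above the 1.5 * IQR upper fence."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    upper_fence = q3 + 1.5 * (q3 - q1)
    return {
        "mean": values.mean(),
        "median": values.median(),
        "upper_fence": upper_fence,
        "outliers": values[values > upper_fence].tolist(),
    }


# Hypothetical sample values for illustration only (NOT the real insurance data).
sample = pd.Series(
    [1100.0, 2200.0, 4700.0, 9400.0, 12000.0, 16600.0, 63800.0],
    name="charges",
)
summary = spread_summary(sample)
print(summary)
```

On the real data you could pass df['charges'] instead of the sample. The describe() output above already hints at the same pattern: the mean (13270.42) sits well above the median (9382.03), which is what a handful of very large charges would do.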

  • I like the granularity: you broke down the total number of records in each age group and by no. of children, & how much they’ve paid.

  • Be mindful of using subjective terms like “normal”, “overweight”, “obese”, “low”. I would not look at the BMI variable b/c it’s a made-up number that isn’t an indicator of one’s health & analysis of it tends to lead to value judgments rather than objectivity.

  • Your analysis just kind of drops off at the end. Maybe restate your conclusions and add possible next steps/ideas.

Good work! :woman_technologist:

Thank you so much for the detailed feedback, super helpful! :smile:
