U.S. Medical Insurance Costs Portfolio Project! Feel free to give some feedback

Hi everyone,

here is the link to my version of the Portfolio project about U.S. Medical Insurance Costs.

I decided to handle it from a real “Data scientist point of view”, trying to reach some useful conclusions from the analysis, and not just like a “way to hone Python skills”.

It took me a couple of days…and it was fun, once you define how to approach the data (i.e.: what you want to search in a reasonable and methodical way, which I find to be the hardest part of the project)

Let me know what you think, especially concerning Python coding, in order to understand how I can optimize it!

Happy coding to everyone

1 Like

Good work. Seems like you understand how to navigate around in a data set, write functions, etc.

I have my own issues with this project/data set, mostly how people make lots of assumptions (not you, here, but I’ve seen some really fat-phobic commentary and sweeping generalizations with this particular project).

In regards to BMI, be mindful of making presumptive statements like, “We can conclude that the best way to lower health insurance costs is to quit smoking and keep in shape, lowering the bmi to the normal range as much as possible.

It would be useful to do some research on how BMI is a controversial number and not an accurate measure of one’s overall health, rather, insurance companies use it to calculate premiums.

I’d also look at larger factors like the US accounts for 40% of the total global health spending.

20% of the total US GDP is spent on health care(!)…and health care costs always outpace inflation.

It’s not that we as a whole are using health care more, but that it costs more. Just some things to keep in mind when looking at data sets like this.

Hi! Thank you for your reply, that’s very insightful.

What you point out about BMI is absolutely true, I am the first one skeptical about it (how come a bodybuilder can be considered obese? :thinking:)…
Furthermore, I have no real perception on how complex the healthcare situation is in the U.S., nor I expect to learn it from a Codecademy exercise.

For sure I know that my conclusions were super naive…it was just a way to make some final considerations regarding the output of the analysis I made, in order to keep the “Data scientist-like” style of the project until the end.
The conclusions I drew have no presumption of being “real”…they’re there just for “exercise reasons”

If I would have drawn serious conclusions about smoking and bmi contribution on health insurance costs, I would have at least made the opposite analysis as well, i.e: prove that lower insurance costs are associated with no-smoking and low-bmi conditions…and probably other more analyses would have been needed (not taking in consideration all the general factors concerning the health care costs in the US you were mentioning)…but to be honest, it would have taken too much time for a Python project on using lists and dictionaries…so I kept it simple with “dummy but plausible” conclusions.

Probably I should have stated in the topic that my project is from a “Data scientist-like point of view”…
but the fact that it brought to a real discussion about Health insurance sounds very cool to me…it means that the analysis seems a real “data scientist-like” analysis and not just a “Phyton exercise”

Anyway I really thank you for your observation…in the real world of data analysis taking into consideration every aspect of the problem is crucial, and what you stated in your message is absolutely correct!
I’ll try to be more realistic in future projects :wink:


1 Like

Yep, I’m sure this data set is an introductory eye opener to anyone not based in the U.S. or to anyone who is used to national health care.

I always think that data doesn’t exist in a vacuum. There are other potential variables that might affect what we’re seeing too.

Happy coding!

1 Like

Here is the link to my project.

I would be more than happy to hear your reviews. I am open to recommendations and thoughts on the findings.
I will add more explaining texts let me say beforehand.



Unfortunately I can’t access your U.S. Medical Insurance project…when I open the link the github webpage says “404 this is not the webpage you are looking for”…


Oh hi!
Sorry, it was not public by mistake, but it is now!
Thank you, Alberto!


Coming from a medical background and now studying coding for the first time, I really enjoyed going through your work.

For me coding should help and not complicate things and this is what I found in your work.

1 Like

Hey @obahube!

Thank you for sharing your project! Even though you didn’t want to focus on Python per se, your creativity in solving some challenges in “simple” (i.e. non-pandas) Python is impressive. I also really liked your idea to sort variables into buckets and how you used thresholds for analysis. I would like to offer a few recommendations.

Asking for Specific Feedback

According to the codecademy article Giving and receiving code review you should always ask

for feedback on a specific question, usually in regard to code quality, readability, correctness, or security.

However, I only mention this for completeness’ sake and because it is always helpful for the review ever to have a specific question to answer. Even though codecademy states this in the article above, the general Projects section here in the forum seems to be a more general feedback channel.

Critically Scrutinizing the Source Material/Dataset

The author of the dataset does not adequately provide a source for the data. Therefore, it is impossible to verify the integrity and accuracy of the contained data, nor is it possible to scrutinize the quality of data collection. Even though this is “just” an exercise, the source of the data should always be scrutinized and discussed. In my opinion this is part of the exercise, just like analysing the data is.

Some Notes on Python

Since you asked what we think especially concerning Python, I wrote down some notes for you.

A Note on Variable Naming

In my opinion you could benefit from creating more descriptive, and more precise variable names. For example, a popular way to iterate over a list in Python is:

for item in list:
    # Do stuff

However, if you (again, for example) name your list of sexes “sex”, this is not really possible.

for sex in sex:
    # Do stuff

… doesn’t really work and is very confusing. A simple fix would be to rename “sex” to “sex_list”.

A Note on Python f-Strings

Since Python 3.6 Python supports f-strings for string formatting. f-strings are more powerful and more elegant than string concatenation. One could do:

male_count = 8982
print(f'The male count is: {male_count}. The male count for times is {male_count * 4}.')

Which would print:

The male count is: 8982. The male count for times is 35928.

If this interests you, I can recommend this article from realpython.com.

Some Tiny Details Because I Just Can’t Help Myself

Thank you for an interesting analysis!