Review my project: U.S. Medical Insurance Data

Hello fellow learners!

This was my first portfolio project and I would greatly appreciate any feedback!

You can review it at:
U.S. Medical Insurance Data

The ‘.ipynb’ file uploaded to GitHub contains everything in one place: first, a breakdown of the analyzed ‘.csv’ file; second, what I expected to observe in the data; and third, the conclusions I drew from it. After that, the code appears in the order I wrote it. I added comments to the best of my ability.

The main objective was to analyze the dataset with the knowledge I gained from the Codecademy Data Analyst path. It took me a few days, working little by little when I had time. I had minimal programming knowledge before I started this course. That is changing at a slow but steady pace, which I am pleased with.

I am interested to hear your feedback about:

  • are the visuals comprehensive, easy to read and good looking?
  • are the conclusions logical?
  • am I missing something?
  • am I being biased anywhere?

I am fully aware that my code isn’t the most efficient or elegant, but with limited knowledge at this point in time, the only goal was to ‘do the job’!

Can’t wait to hear from you!

Congrats on completing the project. You put a lot of work into it.
It seems like you have a solid understanding of functions and of looping through the data to extract pertinent info. Good use of comments in the notebook too. This helps people who aren’t familiar with the code or the dataset better understand what you’re sifting through.

There are some interesting comparisons drawn from the data. It’s a project that you will revisit on the DS path. As you learn more, you can use different libraries (Pandas, SciPy, math, Seaborn, etc.) to see whether there are any significant differences between the means of the variables (for example, whether costs differ between women and men, or whether charges differ between regions).
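
To make that a bit more concrete, here is a rough sketch of one way to check whether two group means differ by more than chance would suggest. It uses a permutation test written with the standard library only, which is a swapped-in stand-in rather than what the course teaches; with SciPy, `scipy.stats.ttest_ind` would be the more usual route. The function name `permutation_test` and the charge values are made up for illustration and do not come from the insurance dataset.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value: the share of random re-splits of the
    pooled data whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical charges, for illustration only (NOT from the dataset).
charges_women = [12500.0, 9800.0, 14300.0, 11050.0, 13200.0, 10400.0]
charges_men = [13900.0, 15100.0, 12700.0, 16050.0, 14400.0, 13350.0]

p_value = permutation_test(charges_women, charges_men)
print(f"approximate p-value: {p_value:.3f}")
```

A small p-value would suggest the gap between the two means is unlikely to be random noise; a large one means the data cannot tell the groups apart. The same function could be applied to charges split by region, one pair of regions at a time.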

These are just my thoughts. Others might have differing opinions.

Some ideas:

  • Add more info to the readme file (including where the dataset came from). Everything that’s at the top of your notebook could perhaps go in either the readme file or a slide presentation (if you’re so inclined), and the conclusions you make should be at the end of the notebook or presentation.

  • The “Expectations” section is kind of mislabeled, in that it’s more in line with assumptions about the data (which one can have, but remember that they can inject bias into exploring any data set).

  • Avoid using language like “There should be…” as well.
    Data scientists and data analysts should strive to be objective when exploring data sets. If you’re doing a presentation, in the conclusions you can say something like, “I thought I might find this…but the data said this…”.

  • Be mindful of how you label ages in the visualizations. Try to avoid using terms like “Young” and “Senior”. It’s better to just give the age range of the population in the sample in the charts. The same goes for BMI categorizations. You can use number ranges rather than loaded adjectives.

  • Something else to consider: Selecting a color palette for a visualization is also important. There’s some interesting literature out there about the science behind it and how people interpret color based on different characteristics. Color theory is a fascinating topic too.
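
On the age-label idea above, a minimal sketch of what numeric range labels could look like in plain Python. The helper name `age_range_label` and the bin edges are my own assumptions for illustration, not anything taken from your notebook; once you get to Pandas, `pd.cut` does a similar job on a whole column.

```python
def age_range_label(age, edges=(18, 30, 40, 50, 65)):
    """Return a neutral numeric label such as '30-39' for an age.

    `edges` holds the lower bound of each bin; these values are
    illustrative assumptions and should be adapted to the sample.
    """
    if age < edges[0]:
        return f"<{edges[0]}"
    for lower, upper in zip(edges, edges[1:]):
        if lower <= age < upper:
            return f"{lower}-{upper - 1}"
    return f"{edges[-1]}+"


# Hypothetical ages, for illustration only.
ages = [19, 33, 45, 64, 28]
labels = [age_range_label(a) for a in ages]
print(labels)  # ['18-29', '30-39', '40-49', '50-64', '18-29']
```

The same approach would work for BMI: pick the numeric cut-offs, then show the ranges on the chart instead of adjectives.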

Good work!

Thank you for taking the time to review my project! I plan on revising the project to rewrite my code in a more efficient way in the near future.

Regarding your ideas, I wanted to have everything in one place for simplicity’s sake. I assumed that my future employer would only be interested in the analysis, particularly the visualizations, and not how I did it (with the expectation that I would demonstrate my data analysis skills earlier). However, I completely agree with adding where the dataset came from. Realistically, a .ppt presentation may be the best format (at least based on my current knowledge).

I understand your valid point about the assumptions. While I do have some degree of medical knowledge, my experience is based solely on the European healthcare system, so my assumptions may be wildly inaccurate.

As English is not my primary language, I certainly plan on improving in that area.

Regarding naming conventions, I have considered them, but I used what I thought was best. I assume that once I am employed, my superiors or my predecessors’ work will define the naming conventions.

Lastly, I will look into your suggestion.

Cheers,
Ivan