U.S. Medical Insurance Costs portfolio project - smoking is correlated with high medical charges

Hello, Codecademy community! Nice to “meet you.”

This project felt a lot more manageable after the extended period of time I spent learning how to do the Hurricanes project. Both were quite interesting. It took about a day to do this project, and I considered it an appropriate difficulty for practicing Python skills. My primary takeaways from working with this dataset are the following:

  • The average patient age is 39.
  • Representation across genders and the four regions was pretty balanced.
  • Charges for women were about 10% less than those for men.
  • Smokers faced charges about 3.5 times those of non-smokers.
  • The proportion of parents who smoked decreased as the number of children increased.
  • The southeast region had the highest average charge, the highest average BMI and the highest number of smokers.
  • I thought there would be more variation between some of the groups, but the results were actually kind of boring and similar across several of the comparisons.

Here is the link to my portfolio project:

Things I would like to improve about my code:

  • Print only a small sample of each list, instead of the entire list of 1338 patients, while I verify that the code does what I want.
  • Adapt functions so that I can apply them to all of the lists that need to be converted to integers or floats.
  • Learn how to implement classes and class methods in this context.
  • Learn whether there is a way to simplify the code and make it cleaner.
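
For the first two points, here is a minimal sketch of what I have in mind. The function names `convert_all` and `preview` are hypothetical (they are not in my project yet), and the values in the lists are made up for illustration, not taken from the dataset.

```python
def convert_all(values, cast):
    """Return a new list with `cast` (for example int or float) applied to every item."""
    return [cast(value) for value in values]


def preview(values, n=5):
    """Return only the first n items so a check does not print the whole list."""
    return values[:n]


# Made-up string values standing in for columns read from the CSV file.
ages_raw = ["25", "40", "55", "31", "62", "19"]
charges_raw = ["20000.5", "6000.0", "11000.25", "4500.0", "13000.0", "1700.75"]

# One helper covers every list, whichever numeric type it needs.
ages = convert_all(ages_raw, int)
charges = convert_all(charges_raw, float)

# Print a short sample instead of all 1338 entries.
print(preview(ages, 3))
print(preview(charges, 3))
```

Passing the conversion type in as an argument means one function can handle the age, BMI, children and charges lists, instead of a separate function for each.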

While I compared a lot of the data, there is more that could be evaluated, and I’m looking forward to seeing what others have discovered and how.

First of all, you did a great job combing through the data and organizing variables and lists with proper names.
As a reader of your analysis, I found it quite hard to find your conclusions because of the massive print outputs in the first part.
Your comparisons of the different categories make sense. If you want to reach more and better conclusions, try using the median and interquartile range and look at histograms; then you will see patterns and other interesting features emerge.
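
To make that concrete, here is a rough sketch using only the standard library. The helper names `summarize` and `bin_counts` are just placeholders, and the sample values are made up rather than taken from the dataset. Note that `statistics.quantiles` needs Python 3.8 or newer.

```python
from statistics import median, quantiles


def summarize(values):
    """Return the median, the quartiles and the interquartile range of a list."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return {"median": median(values), "q1": q1, "q3": q3, "iqr": q3 - q1}


def bin_counts(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = {}
    for value in values:
        start = int(value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return counts


# Made-up charges with one large outlier, for illustration only.
sample_charges = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 40000]

summary = summarize(sample_charges)
print(summary)

# A crude text histogram: one '#' per value in each bin.
for start, count in sorted(bin_counts(sample_charges, 5000).items()):
    print(f"{start:>6}-{start + 5000:<6} {'#' * count}")
```

The single outlier pulls the mean well above most of the values, while the median and interquartile range barely move, which is why they can be more telling for something like charges.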

Agreed… the print outputs were far too long.
Thank you for the suggestion of using median and interquartile ranges to examine the data.

Hello there
Great project you have there. I really enjoyed going through it. The functions you wrote to test the data were very clean and easy to understand, in my opinion. You included a very brief #explanation above every function, and the conclusions were always written in a very straightforward way, which made understanding the functions even easier.
Good work dividing the data into different lists. You also created a dictionary of every patient in a very neat way: you first found the total number of patients and then used the index of each patient as the value for the key ‘Patient’.

I found this to be very smart.
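
For anyone reading along, the idea is roughly the following. This is my own sketch of the approach, not the original code, and the list names, the extra keys and the values are all made up for illustration.

```python
# Made-up columns standing in for the lists built from the dataset.
ages = [25, 40, 55]
smokers = ["yes", "no", "no"]
charges = [20000.5, 6000.0, 11000.25]

# Find the number of patients first, then use each index as the 'Patient' value.
num_patients = len(ages)
patients = []
for index in range(num_patients):
    patients.append({
        "Patient": index,
        "Age": ages[index],
        "Smoker": smokers[index],
        "Charges": charges[index],
    })

print(patients[0])
```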
Really, this was a great project, aside from the long scrolling, but you already wrote that in the future you will be using samples instead of printing the whole thing, which is great. Keep it up!