U.S.-Medical-Insurance-Costs Review. Much Appreciated!

Hello, thanks for taking the time to review my project.

Congrats on completing the project.

  • The notebook is well laid out and easy to follow: the goals are listed at the top, and the comments are helpful as you sort through the data.

  • Just curious: what is the reasoning behind including the total of all the charges? Is it to show the ridiculous amounts that people are charged here in the U.S.? (kind of joking, but not. Healthcare costs are outrageous).

  • I think your function for finding the number of people in each region needs to be tweaked. The region with the most people is the Southeast.
    See:

insurance["region"].value_counts()

>>southeast    364
southwest    325
northwest    325
northeast    324
Name: region, dtype: int64
  • I think the function for calculating the median charges for each region is off by a bit (maybe because you used math.floor()?).
    See:
insurance[['region', 'charges']].groupby('region').median().round(2)

>>	       charges
region	
northeast	10057.65
northwest	8965.80
southeast	9294.13
southwest	8798.59
  • Would it be better to find the median charges for smokers vs. non-smokers rather than the total? The total depends on how many people are in each group, so the median gives us a better comparison, no? Just something to think about.
insurance[['smoker', 'charges']].groupby('smoker').median().round(2)

>>     charges
smoker	
no	   7345.41
yes	 34456.35
  • You might consider adding a conclusions section at the end of the notebook–just to wrap up your findings by bullet points or something.
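
In case it helps with the counting and median points above, here is a minimal plain-Python sketch. It assumes your notebook works with plain lists rather than pandas, which I can't confirm. The rows are made-up sample values (not taken from the dataset) and all variable names are hypothetical. It shows three things: counting people per region, comparing totals with medians per smoker group, and how math.floor() drops the cents from a median.

```python
import math
from collections import Counter, defaultdict
from statistics import median

# Hypothetical sample rows: (region, smoker, charges). Not real data.
rows = [
    ("southeast", "yes", 30000.75),
    ("southeast", "no", 4000.40),
    ("southeast", "no", 5000.90),
    ("southwest", "no", 3000.60),
    ("northwest", "no", 6000.20),
    ("northeast", "no", 12000.10),
]

# Count people per region; most_common(1) returns the largest group.
region_counts = Counter(region for region, _, _ in rows)
largest_region, largest_count = region_counts.most_common(1)[0]

# Group charges by smoker status.
charges_by_smoker = defaultdict(list)
for _, smoker, charges in rows:
    charges_by_smoker[smoker].append(charges)

# Totals grow with group size; medians describe a typical person.
totals = {k: round(sum(v), 2) for k, v in charges_by_smoker.items()}
medians = {k: round(median(v), 2) for k, v in charges_by_smoker.items()}

# Flooring a median discards the cents; rounding to 2 places keeps them.
southeast_charges = [c for r, _, c in rows if r == "southeast"]
exact_median = round(median(southeast_charges), 2)
floored_median = math.floor(median(southeast_charges))

print(largest_region, largest_count)
print(totals, medians)
print(exact_median, floored_median)
```

In this toy sample the two smoker groups have almost the same total, because there are five non-smokers and only one smoker, while the medians are far apart. That is the reason I'd lean toward the median for the comparison.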

Good work.