Hello, thanks for taking the time to review my project.
Congrats on completing the project.
- The notebook is well laid out and easy to follow: the goals are listed at the top, and the comments are helpful as you sort through the data.
- Just curious: what is the reasoning behind including the total of all the charges? Is it to show the ridiculous amounts that people are charged here in the U.S.? (Kind of joking, but not really. Healthcare costs are outrageous.)
- I think your function for finding the number of people in each region needs to be tweaked. The region with the most people is the Southeast.
See:
insurance["region"].value_counts()
>> southeast    364
   southwest    325
   northwest    325
   northeast    324
   Name: region, dtype: int64
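If you'd rather stay in plain Python than switch to pandas, here is a minimal sketch of one way to cross-check a hand-written counting function. The `regions` list is made-up sample data standing in for the real column, so the counts below are not the real ones.

```python
from collections import Counter

# Made-up sample values standing in for the real "region" column
regions = [
    "southeast", "southwest", "southeast", "northwest",
    "northeast", "southeast", "southwest",
]

# Counter tallies how many times each value appears
counts = Counter(regions)

print(counts.most_common(1))  # [('southeast', 3)]
print(counts["southwest"])    # 2
```

If your own function and a tally like this disagree on the same input, the function is probably where the bug is.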
- I think the function for calculating the median charges for each region is off by a bit (maybe because you used math.floor()?).
See:
insurance[['region', 'charges']].groupby('region').median().round(2)
>>            charges
   region
   northeast  10057.65
   northwest   8965.80
   southeast   9294.13
   southwest   8798.59
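I haven't traced exactly how math.floor() is used in your function, so this is only a guess at one way it could shift a median: picking the element at index floor(n / 2) without averaging the two middle values when the list has an even length. The function name `floor_index_median` and the numbers are hypothetical, just for illustration.

```python
import math
import statistics


def floor_index_median(values):
    """Hypothetical hand-rolled median: takes the element at index floor(n / 2)."""
    ordered = sorted(values)
    return ordered[math.floor(len(ordered) / 2)]


# Made-up charges with an even number of entries
charges = [1000.0, 2000.0, 3000.0, 10000.0]

print(floor_index_median(charges))  # 3000.0 (upper of the two middle values)
print(statistics.median(charges))   # 2500.0 (average of the two middle values)
```

With an odd number of entries the two approaches agree, so a bug like this only shows up for groups with an even count.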
- Would it be better to find the median charges for smokers vs. non-smokers rather than the total? The median gives us a better comparison, no? Just something to think about.
insurance[['smoker', 'charges']].groupby('smoker').median().round(2)
>>         charges
   smoker
   no       7345.41
   yes     34456.35
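To show what I mean about totals, here is a small sketch with made-up rows (not the real dataset). A total grows with the number of people in a group, so a larger group can have the bigger total even when each person in it pays much less.

```python
from statistics import median

# Made-up (smoker, charges) rows; six non-smokers and two smokers
rows = [
    ("no", 2000.0), ("no", 3000.0), ("no", 4000.0),
    ("no", 5000.0), ("no", 6000.0), ("no", 7000.0),
    ("yes", 12000.0), ("yes", 14000.0),
]

# Group the charges by smoker status
by_group = {}
for smoker, charge in rows:
    by_group.setdefault(smoker, []).append(charge)

totals = {group: sum(values) for group, values in by_group.items()}
medians = {group: median(values) for group, values in by_group.items()}

print(totals)   # {'no': 27000.0, 'yes': 26000.0}
print(medians)  # {'no': 4500.0, 'yes': 13000.0}
```

Here the totals make the two groups look about the same, while the medians show the typical smoker in this toy data paying almost three times as much.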
- You might consider adding a conclusions section at the end of the notebook, just to wrap up your findings in bullet points or something.
Good work.