Group Project: U.S. Medical Insurance Costs

Hi guys, I'm 27% of the way through the Machine Learning course, and this is my project:

U.S. Medical Insurance Costs

Git: GitHub - WilliamGaleanoM/python-portfolio-project-starter-files: Code to Codecademy project
I would like your comments on it.

This is module 6, done without the pandas library :pray:

Congrats on completing the project.

A few thoughts:

  • you have a solid grasp on how to write functions to glean insights from the data set.

  • you clearly describe your goals at the top of the notebook.

  • don’t forget to cite where the dataset came from in the readme file or at the top of the notebook.

  • rather than computing only the mean cost of insurance, it might be better to find the median. There are outliers in the data set, and they pull the mean upward.

Ex:

df.describe()
>>             age          bmi     children       charges
count  1338.000000  1338.000000  1338.000000   1338.000000
mean     39.207025    30.663397     1.094918  13270.422265
std      14.049960     6.098187     1.205493  12110.011237
min      18.000000    15.960000     0.000000   1121.873900
25%      27.000000    26.296250     0.000000   4740.287150
50%      39.000000    30.400000     1.000000   9382.033000
75%      51.000000    34.693750     2.000000  16639.912515
max      64.000000    53.130000     5.000000  63770.428010

#or,

df['charges'].mean().round(2)
>>13270.42

df['charges'].median().round(2)
>> 9382.03
  • that said, it might be good to show both the mean and the median of the charges column in the notebook.

  • with medians, the difference in costs between smokers and non-smokers is a bit smaller ($27,110.94):

df[['smoker', 'charges']].groupby('smoker').median().round(2)
>>       charges
smoker
no       7345.41
yes     34456.35
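
Since the project deliberately avoids pandas, here is a rough sketch of how the same mean/median comparison could be done with only the standard library. The sample rows are made up for illustration (they are not from the real dataset), and the helper names `load_rows`, `charges_by_smoker`, and `summarize` are hypothetical, not taken from the notebook.

```python
import csv
import io
from statistics import mean, median

# Made-up sample rows for illustration only; NOT taken from the real dataset.
SAMPLE_CSV = """smoker,charges
no,2000.00
no,4000.00
no,6000.00
yes,20000.00
yes,30000.00
yes,70000.00
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def charges_by_smoker(rows):
    """Group the charges values by the value of the smoker column."""
    groups = {}
    for row in rows:
        groups.setdefault(row["smoker"], []).append(float(row["charges"]))
    return groups


def summarize(values):
    """Return the mean and median of a list of numbers, rounded to 2 places."""
    return {"mean": round(mean(values), 2), "median": round(median(values), 2)}


rows = load_rows(SAMPLE_CSV)

# Overall mean vs. median: the single large charge pulls the mean upward.
overall = summarize([float(r["charges"]) for r in rows])

# Same summary per smoker group, plus the gap between the group medians.
per_group = {key: summarize(vals) for key, vals in charges_by_smoker(rows).items()}
median_gap = round(per_group["yes"]["median"] - per_group["no"]["median"], 2)

print(overall)
print(per_group)
print(median_gap)
```

To run it against the real data, the text passed to `load_rows` could be read from the project's CSV file instead of `SAMPLE_CSV`, assuming that file has `smoker` and `charges` columns as the output above suggests.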

Short and to the point. Good work. :woman_technologist:
