First Project U.S.-Medical-Insurance-Costs

Hi all, my name is Wira from Malaysia. I am currently taking the Data Scientist: Machine Learning career path and am about 27% of the way through. Could you review my U.S. Medical Insurance Costs code? Here is the link: GitHub - WiraAFauzi/U.S.-Medical-Insurance-Costs

Thank you

A project like this might need a bit more explanation and some visuals. By that I mean the notebook should read like a story (intro, middle, conclusions) so that anyone looking at it can follow along with your analysis. This is an EDA and descriptive project, so it might be a good idea to revisit the concepts in the Principles of Data Literacy section, specifically Analyzing Data and Thinking about Data. Later on you can apply the inferential and possibly causal analysis sections to this dataset as well.

Suggestions:

  • Add a brief intro to the data set and the questions you are going to explore.

  • It’d be great to see the output of the functions…

  • I see you put the output of the functions in the readme file. They would be better placed in the notebook itself, either in the cell below the function or in the concluding portion.

  • It might be better to also look at the median of the charges rather than just the mean. There are high-charge outliers in the data that pull the overall mean up (mean 13270.42 vs. median 9382.03 in the output below).
    Example:

df['charges'].describe().round(2)

count     1338.00
mean     13270.42
std      12110.01
min       1121.87
25%       4740.29
50%       9382.03
75%      16639.91
max      63770.43

df["smoker"].value_counts()
no     1064
yes     274

df[["smoker", 'charges']].groupby('smoker').mean().round(2)
         charges
smoker
no       8434.27
yes     32050.23

#vs:

df[["smoker", 'charges']].groupby('smoker').median().round(2)

         charges
smoker
no       7345.41
yes     34456.35

df["region"].value_counts()
southeast    364
southwest    325
northwest    325
northeast    324

#further:

df.groupby(['region', 'sex', 'smoker'])['charges'].median().round(2)

region     sex     smoker
northeast  female  no         8681.14
                   yes       22331.57
           male    no         8334.46
                   yes       33993.37
northwest  female  no         7731.86
                   yes       28950.47
           male    no         6687.44
                   yes       26109.33
southeast  female  no         7046.72
                   yes       35017.72
           male    no         6395.95
                   yes       38282.75
southwest  female  no         7348.14
                   yes       34166.27
           male    no         7318.96
                   yes       35585.58


#stuff like that
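
If it helps, here is a rough sketch of one way to put the mean and the median next to each other per group instead of running two separate groupbys. The small DataFrame is made-up toy data purely for illustration (it is not the insurance dataset), and the `mean_minus_median` column name is just something I picked; in your notebook you would use your own `df` loaded from the CSV.

```python
import pandas as pd

# Toy data, made up for illustration only (NOT the real insurance dataset).
df = pd.DataFrame({
    "smoker": ["no", "no", "no", "no", "yes", "yes", "yes"],
    "charges": [2000.0, 3000.0, 4000.0, 31000.0, 20000.0, 30000.0, 40000.0],
})

# Mean, median and group size side by side for each smoker group.
summary = df.groupby("smoker")["charges"].agg(["mean", "median", "count"])

# A large positive gap suggests high values are pulling the mean above the median.
summary["mean_minus_median"] = summary["mean"] - summary["median"]

print(summary.round(2))
```

In the toy "no" group, a single large charge lifts the mean well above the median, while the evenly spread "yes" group shows no gap. Showing a table like this right under the function in the notebook makes the effect of outliers easy to see.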