Hi all, my name is Wira from Malaysia. I am currently taking the Data Scientist: Machine Learning career path course and am about 27% of the way through it. Could you review my U.S. Medical Insurance Costs code? Here is the link: GitHub - WiraAFauzi/U.S.-Medical-Insurance-Costs
Thank you
A project like this might need a bit more explanation and visuals. By that, I mean the notebook should read like a story with an intro, a middle, and conclusions, so that anyone looking at it can follow along with your analysis. This is an EDA and descriptive project, so it might be a good idea to revisit the concepts in the Principles of Data Literacy section, specifically Analyzing Data and Thinking about Data. Later on you can apply the inferential and possibly causal analysis sections to this dataset as well.
Suggestions:
- Brief intro of the data set and the questions you are going to explore.
- It’d be great to see the output of the functions…
- I see you put the output of the functions in the readme file. They would be better placed in the notebook itself, either in the cell below the function or in the concluding portion.
- It might be better to also look at the median of the charges, rather than just the mean. There are outliers in the data that pull the mean.
ex:
df['charges'].describe().round(2)
count 1338.00
mean 13270.42
std 12110.01
min 1121.87
25% 4740.29
50% 9382.03
75% 16639.91
max 63770.43
df["smoker"].value_counts()
no 1064
yes 274
df[["smoker", 'charges']].groupby('smoker').mean().round(2)
charges
smoker
no 8434.27
yes 32050.23
#vs:
df[["smoker", 'charges']].groupby('smoker').median().round(2)
charges
smoker
no 7345.41
yes 34456.35
df["region"].value_counts()
southeast 364
southwest 325
northwest 325
northeast 324
#further:
df.groupby(['region', 'sex', 'smoker'])['charges'].median().round(2)
region     sex     smoker
northeast  female  no         8681.14
                   yes       22331.57
           male    no         8334.46
                   yes       33993.37
northwest  female  no         7731.86
                   yes       28950.47
           male    no         6687.44
                   yes       26109.33
southeast  female  no         7046.72
                   yes       35017.72
           male    no         6395.95
                   yes       38282.75
southwest  female  no         7348.14
                   yes       34166.27
           male    no         7318.96
                   yes       35585.58
#stuff like that
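If it helps to see why the median is worth checking, here is a minimal sketch of the mean being pulled by a single extreme value. The `toy` frame and its numbers are invented for illustration and are not rows from the insurance dataset; only the column names `smoker` and `charges` are borrowed from it.

```python
import pandas as pd

# Toy data, made up for illustration only (NOT taken from the insurance dataset).
toy = pd.DataFrame({
    "smoker": ["no", "no", "no", "no", "yes", "yes", "yes"],
    "charges": [2000.0, 3000.0, 4000.0, 31000.0, 20000.0, 30000.0, 34000.0],
})

# Mean and median side by side for each group.
summary = toy.groupby("smoker")["charges"].agg(["mean", "median"]).round(2)

# A positive gap means high values pull the mean above the median;
# a negative gap means low values pull it below.
summary["mean_minus_median"] = summary["mean"] - summary["median"]

print(summary)
```

In this toy frame the single 31000 value lifts the non-smoker mean to 10000 while the median stays at 3500. Running the same `agg(["mean", "median"])` call on your own `df` would put both statistics in one table in the notebook.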