Hi! I just finished this project as a part of the Data and Programming Foundations for AI Skill Path. This took me about 3 hours to complete. Since I am not familiar with any Machine Learning algorithms yet, I applied my knowledge of Python Fundamentals taught in the skill path and attempted some data analysis tasks.
I would love to update this project with things like predictive modelling and data visualization once I learn these concepts in other Codecademy lessons.
Looks awesome!
I like that you used key:value pairs to store the data per age group; I didn’t really use this in mine.
A minor thing that I would change would be in the next part:
print(average_cost_by_smoker_status)
{'smoker': 32050.232, 'non smoker': 8434.268}
Here I would have printed a $ sign and a rounded value to two decimals.
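A minimal sketch of that formatting, assuming the dictionary holds plain floats as in the printed output above. The helper name `format_dollars` is made up for illustration; the underlying numbers are left untouched and only the printed text changes.

```python
# Values copied from the printed dictionary above.
average_cost_by_smoker_status = {"smoker": 32050.232, "non smoker": 8434.268}


def format_dollars(amount):
    """Return the amount as text with a $ sign, thousands separators and two decimals.

    Hypothetical helper, not part of the original project.
    """
    return f"${amount:,.2f}"


# Format only when printing; the dictionary still stores floats.
for status, cost in average_cost_by_smoker_status.items():
    print(f"{status}: {format_dollars(cost)}")
```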
The rest of the code looks nice, and although we follow different paths, most, if not all, of the stuff that I learned in the data science path is included in your project.
Rather than looking at the average costs, perhaps look at the median instead. There are outliers in the data that pull the mean upward. (You can see that if you run some descriptive stats on that column: min, max, mean, median.)
ex:
#I'm using Pandas here
df['charges'].describe().round(2)
count 1338.00
mean 13270.42
std 12110.01
min 1121.87
25% 4740.29
50% 9382.03
75% 16639.91
max 63770.43
df[["sex", "charges"]].groupby("sex").median().round(2)
        charges
sex
female  9412.96
male    9369.62
#by region:
df.groupby(['region', 'sex'])['charges'].median().round(2)
region     sex
northeast  female    10197.77
           male       9957.72
northwest  female     9614.07
           male       8413.46
southeast  female     8582.30
           male       9504.31
southwest  female     8530.84
           male       9391.35
#smoker v. non
df.groupby(['region', 'sex', 'smoker'])['charges'].median().round(2)
region     sex     smoker
northeast  female  no         8681.14
                   yes       22331.57
           male    no         8334.46
                   yes       33993.37
northwest  female  no         7731.86
                   yes       28950.47
           male    no         6687.44
                   yes       26109.33
southeast  female  no         7046.72
                   yes       35017.72
           male    no         6395.95
                   yes       38282.75
southwest  female  no         7348.14
                   yes       34166.27
           male    no         7318.96
                   yes       35585.58
#etc
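If Pandas isn't in the toolbox yet, the same mean-versus-median comparison can be sketched with Python's built-in `statistics` module. The charges below are made-up numbers chosen for illustration, not values from the insurance dataset; one large value plays the role of the outlier.

```python
from statistics import mean, median

# Made-up charges for illustration only; the last value is a deliberate outlier.
sample_charges = [1200.0, 2500.0, 4700.0, 9400.0, 63000.0]

# The single large charge drags the mean far above the typical value,
# while the median stays at the middle observation.
print("mean:  ", round(mean(sample_charges), 2))
print("median:", round(median(sample_charges), 2))
```

With these sample values the mean comes out more than three times the median, which is the same kind of gap the describe() output shows on a smaller scale.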
Also, FWIW, I wouldn’t add a $, b/c you’d have to strip it if you were to do any calculations.
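A rough sketch of why, using the two averages from the original post. The variable names here are just for illustration.

```python
# Averages from the original post, stored as plain floats.
costs = {"smoker": 32050.232, "non smoker": 8434.268}

# With numeric values, arithmetic works directly.
difference = costs["smoker"] - costs["non smoker"]
print(round(difference, 2))

# If the value had been stored as text with a "$", it would need
# cleaning up before any calculation could use it.
labelled = "$32,050.23"
recovered = float(labelled.replace("$", "").replace(",", ""))
print(recovered)
```

So the $ is probably best added at print time only, with the stored values left as numbers.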