The project itself wasn’t hard to “assemble”, but it was a different story once I realized I had to come up with my own requirements.
I already had some experience with NumPy and Pandas from a previous freeCodeCamp course, so I figured why not use them, since the project said they could be used.
It took about 2 hours because of the brainstorming and because I didn’t have any practical experience with the libraries, but it was definitely fun.
Git repo (AsciiDoc file): codecademy_portfolio_projects/us-medical-insurance-costs.asciidoc at main · Tofan-afk/codecademy_portfolio_projects · GitHub
Thank you for taking the time.
A few considerations:
- Add a brief intro at the top of the notebook, including the citation for the data. Pretend you’re presenting the project to someone who knows nothing about the data set. You’re telling a story, so you’d want to have an intro, then analyze the data w/ some comments, and then a conclusion and possible next steps.
- It might be better to look at the median rather than the mean of charges in the data, because there are outliers that pull the mean up.
You can get some basic descriptive stats using methods from the Pandas library, like the `.describe()` method on the df, or `.value_counts()` on a specific column. Don’t forget about `.groupby()` as well. Just a suggestion.
Ex:
```python
import pandas as pd

# Assumes the project's CSV is saved as insurance.csv next to the notebook.
df = pd.read_csv("insurance.csv")

df.describe()
#                age          bmi     children       charges
# count  1338.000000  1338.000000  1338.000000   1338.000000
# mean     39.207025    30.663397     1.094918  13270.422265
# std      14.049960     6.098187     1.205493  12110.011237
# min      18.000000    15.960000     0.000000   1121.873900
# 25%      27.000000    26.296250     0.000000   4740.287150
# 50%      39.000000    30.400000     1.000000   9382.033000
# 75%      51.000000    34.693750     2.000000  16639.912515
# max      64.000000    53.130000     5.000000  63770.428010

# or
df['charges'].describe().round(2)
# count     1338.00
# mean     13270.42
# std      12110.01
# min       1121.87
# 25%       4740.29
# 50%       9382.03
# 75%      16639.91
# max      63770.43

# or:
df['charges'].mean().round(2)    # 13270.42
df['charges'].median().round(2)  # 9382.03

df["smoker"].value_counts()
# no     1064
# yes     274

# etc.
```
- Are you trying to show some sort of correlation with the functions for bmi & charges and children & charges? There are no hypothesis tests or significance tests here.
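The `.groupby()` suggestion isn’t shown in the example above, so here is a minimal sketch of it, together with a quick check of how a few large values pull the mean away from the median. It uses a tiny hypothetical DataFrame with made-up numbers in place of the real CSV (only the column names `smoker` and `charges` come from the thread), and it flags outliers with the common 1.5 × IQR rule, which is one convention among several rather than something the project prescribes.

```python
import pandas as pd

# Hypothetical toy data shaped like the insurance CSV (smoker, charges).
# These numbers are made up for illustration, NOT taken from the data set.
df = pd.DataFrame({
    "smoker": ["no", "no", "no", "no", "yes", "yes", "no", "yes"],
    "charges": [1500.0, 2000.0, 2500.0, 3000.0, 20000.0, 25000.0, 3500.0, 60000.0],
})

# Median vs. mean: a few very large charges drag the mean far above the median.
print(df["charges"].mean(), df["charges"].median())

# Flag outliers with the 1.5 * IQR rule (an assumed convention, not a requirement).
q1, q3 = df["charges"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[df["charges"] > q3 + 1.5 * iqr]
print(outliers)

# Descriptive stats per group with .groupby(): count, median, and mean per smoker status.
summary = df.groupby("smoker")["charges"].agg(["count", "median", "mean"]).round(2)
print(summary)
```

In the notebook you’d drop the toy frame and run the same calls on the `df` loaded from the CSV; comparing the per-group median and mean side by side makes the effect of the outliers easy to see.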
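On the last point: a correlation coefficient alone doesn’t tell you whether the relationship could plausibly be chance. One way to attach a significance test using only NumPy and Pandas is a permutation test, sketched here under some assumptions: the bmi/charges values are a hypothetical toy sample, `permutation_p_value` is an illustrative helper name, and the permutation test is swapped in as one option (`scipy.stats.pearsonr`, which also returns a p-value, would be a common alternative).

```python
import numpy as np
import pandas as pd

# Hypothetical toy values for illustration only (NOT from the data set).
df = pd.DataFrame({
    "bmi":     [22.0, 25.5, 27.1, 29.8, 31.2, 33.0, 35.4, 38.9],
    "charges": [3100.0, 4200.0, 3900.0, 6100.0, 5800.0, 7400.0, 9000.0, 8700.0],
})

# Pearson correlation coefficient between the two columns.
r = df["bmi"].corr(df["charges"])


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Hypothetical helper: two-sided permutation test for Pearson's r.

    Shuffling y breaks any real link with x; the p-value is the share of
    shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y)  # shuffled copy; y itself is unchanged
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= observed:
            count += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (count + 1) / (n_permutations + 1)


p = permutation_p_value(df["bmi"], df["charges"])
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

A small p-value here says the observed correlation would be unusual if bmi and charges were unrelated; it still says nothing about causation, and with the real data you’d run the same check on `children` vs. `charges` too.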
Good start! Keep at it. 
That’s actually really cool. Thank you for the suggestions; it’s been a long time since I had the opportunity to actually apply my Pandas skills.