US Medical Insurance Costs (stuck)

I was able to import and break down the data, and I can do basic analysis on a single column, but I’m struggling to figure out how to do analysis across columns.

I may not even be doing this project correctly… but the first three sections all work!

Here is where I start screwing up:

I’m trying to compute the difference in average cost between smokers and non-smokers.

All I get back is the number of items in the other column, “charges”, not the actual charge values.
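The code for this part didn't make it into the post, so this is only a guess: a result like that usually means the function is counting rows (adding 1, or calling len()) instead of adding up the charge values. Here is a minimal plain-Python sketch, assuming the CSV has already been read into two parallel lists. The function and variable names are hypothetical and the numbers are made up:

```python
def average_charges_by_smoker(smoker_statuses, charges):
    """Return (average smoker charge, average non-smoker charge).

    Hypothetical layout: smoker_statuses and charges are parallel lists,
    so index i in both refers to the same person.
    """
    smoker_total, smoker_count = 0.0, 0
    non_smoker_total, non_smoker_count = 0.0, 0
    for status, charge in zip(smoker_statuses, charges):
        if status == "yes":
            smoker_total += float(charge)  # add the charge *value*, not 1
            smoker_count += 1              # count rows separately for the mean
        else:
            non_smoker_total += float(charge)
            non_smoker_count += 1
    return smoker_total / smoker_count, non_smoker_total / non_smoker_count


# Made-up example rows, not real project data
statuses = ["yes", "no", "no", "yes", "no"]
charges = ["30000", "5000", "7000", "40000", "9000"]  # csv values arrive as strings

smoker_avg, non_smoker_avg = average_charges_by_smoker(statuses, charges)
print(f"Smokers pay {smoker_avg - non_smoker_avg:.2f} more on average")
```

If either group could be empty in your data, guard the division to avoid a ZeroDivisionError.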

Thanks for your help!

I was wondering: since you’re using Pandas, why not use its built-in functions to analyze the data rather than writing out your own? I mean, unless you prefer writing out functions yourself(?)

For example (I named the DataFrame “insurance” when I analyzed the data), the number of rows in each region:


southeast    364
southwest    325
northwest    325
northeast    324
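The command behind this output isn't shown, but per-region counts like these are what Series.value_counts() returns. A minimal sketch on a few made-up rows with the same column names as the project data (in the project itself you'd load the file instead, e.g. with pd.read_csv; the file name insurance.csv is an assumption):

```python
import pandas as pd

# A few made-up rows with the same columns as the project's data
insurance = pd.DataFrame({
    "age": [19, 45, 33, 52, 27],
    "sex": ["female", "male", "female", "male", "male"],
    "smoker": ["yes", "no", "no", "yes", "no"],
    "region": ["southeast", "southwest", "southeast", "northwest", "northeast"],
    "charges": [30000.0, 5000.0, 7000.0, 40000.0, 9000.0],
})
# In the project: insurance = pd.read_csv("insurance.csv")  # file name assumed

# Count how many rows fall in each region, most common first
region_counts = insurance["region"].value_counts()
print(region_counts)
```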

And to get basic stats for the whole dataset:
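One way to get those overall stats is DataFrame.describe(), which summarizes every numeric column (count, mean, std, min, quartiles, max). A sketch on the same kind of made-up rows:

```python
import pandas as pd

# Made-up rows, not the real project data
insurance = pd.DataFrame({
    "age": [19, 45, 33, 52, 27],
    "sex": ["female", "male", "female", "male", "male"],
    "smoker": ["yes", "no", "no", "yes", "no"],
    "region": ["southeast", "southwest", "southeast", "northwest", "northeast"],
    "charges": [30000.0, 5000.0, 7000.0, 40000.0, 9000.0],
})

# By default only numeric columns (here: age and charges) are summarized
stats = insurance.describe()
print(stats)
```

Passing include="all" to describe() also summarizes the text columns such as sex and region (count, unique, top, freq).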


Or, to analyze a single column, use insurance['age'].mean()

To compare groups, use groupby():
insurance[['age', 'sex', 'charges']].groupby('sex').mean()

              age       charges
sex
female  39.503021  12569.578844
male    38.917160  13956.751178

To compare smokers vs. non-smokers, you can use a boolean mask with .loc[] to select the rows of each group, which gives you two smaller DataFrames. For example:

smokers = insurance.loc[insurance['smoker'] == 'yes']

Then use .describe() to verify the split and get basic stats. You could also use .value_counts() on a column to see how many male/female smokers there are, etc.
Then repeat the same steps for non-smokers.
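Putting those steps together for the original question (average charges for smokers vs. non-smokers), here is a minimal sketch on made-up rows. The key point is taking .mean() of the "charges" column itself, which averages the values rather than counting them; the groupby() line at the end is a shortcut that gives the same per-group averages:

```python
import pandas as pd

# Made-up rows, not the real project data
insurance = pd.DataFrame({
    "age": [19, 45, 33, 52, 27],
    "sex": ["female", "male", "female", "male", "male"],
    "smoker": ["yes", "no", "no", "yes", "no"],
    "region": ["southeast", "southwest", "southeast", "northwest", "northeast"],
    "charges": [30000.0, 5000.0, 7000.0, 40000.0, 9000.0],
})

# Boolean masks select the rows for each group
smokers = insurance.loc[insurance["smoker"] == "yes"]
non_smokers = insurance.loc[insurance["smoker"] == "no"]

# Basic stats for each group's numeric columns
print(smokers.describe())
print(non_smokers.describe())

# How many male vs. female smokers
print(smokers["sex"].value_counts())

# The actual comparison: mean of the charge values in each group
difference = smokers["charges"].mean() - non_smokers["charges"].mean()
print(f"Smokers pay {difference:.2f} more on average")

# Same per-group averages in one step with groupby
print(insurance.groupby("smoker")["charges"].mean())
```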

You can also plot the data to get a better visual sense of it, rather than only looking at tables of calculated numbers.

One example:
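As one possible plot, here is a sketch of a box plot of charges for non-smokers vs. smokers, assuming matplotlib is installed and using made-up rows:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs anywhere; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows, not the real project data
insurance = pd.DataFrame({
    "age": [19, 45, 33, 52, 27],
    "sex": ["female", "male", "female", "male", "male"],
    "smoker": ["yes", "no", "no", "yes", "no"],
    "region": ["southeast", "southwest", "southeast", "northwest", "northeast"],
    "charges": [30000.0, 5000.0, 7000.0, 40000.0, 9000.0],
})

# Plain arrays of charges for each group
smoker_charges = insurance.loc[insurance["smoker"] == "yes", "charges"].to_numpy()
non_smoker_charges = insurance.loc[insurance["smoker"] == "no", "charges"].to_numpy()

fig, ax = plt.subplots()
# One box per group, drawn at x positions 1 and 2
ax.boxplot([non_smoker_charges, smoker_charges])
ax.set_xticks([1, 2])
ax.set_xticklabels(["non-smoker", "smoker"])
ax.set_ylabel("charges")
ax.set_title("Charges by smoking status (made-up data)")
fig.savefig("charges_by_smoker.png")  # or plt.show() in an interactive session
```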




Thanks, I haven’t actually done the lessons on Pandas yet… that’s why I was writing out all the functions.

I just googled around and worked the first part out to get a table.

Thanks, again!


Ah, ok. I misunderstood. I thought the project was built around Pandas.

Also, when importing the library, the convention is import pandas as pd