I was able to break down and import the data. I was able to do basic analysis with one column, but I’m struggling to figure out how to do analysis between columns.
I may not even be doing this project correctly… but the first three sections all work!
Here is where I start screwing up:
I’m trying to compare the average cost difference between smokers and non-smokers.
Instead of the actual values for charges, I’m just getting the number of items in the “charges” column.
Thanks for your help!
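For anyone landing here with the same problem and no Pandas yet: below is a minimal sketch of the smoker vs. non-smoker comparison in plain Python. The original code isn't shown in the post, so this is not a fix for it, just one way to do it. The sample rows are made up, and the column names "smoker" and "charges" are assumed from the description above.

```python
import csv
import io

# Tiny made-up sample standing in for the real insurance file (assumption).
SAMPLE_CSV = """age,sex,smoker,charges
19,female,yes,16000.0
18,male,no,1700.0
28,male,no,4400.0
33,male,yes,22000.0
"""

def average_charges_by_smoker(csv_file):
    """Return a dict mapping smoker status to the mean charge for that group."""
    totals = {}  # smoker status -> running sum of charges
    counts = {}  # smoker status -> number of rows seen
    for row in csv.DictReader(csv_file):
        status = row["smoker"]
        # Add the charge value itself; counting rows alone only gives
        # the number of items, not the cost.
        totals[status] = totals.get(status, 0.0) + float(row["charges"])
        counts[status] = counts.get(status, 0) + 1
    return {status: totals[status] / counts[status] for status in totals}

averages = average_charges_by_smoker(io.StringIO(SAMPLE_CSV))
difference = averages["yes"] - averages["no"]
print(averages, difference)
```

With a real file you would pass an opened file object instead of the io.StringIO sample, e.g. open("insurance.csv", newline="") (the file name here is a guess).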
I wondered… since you’re using Pandas, why don’t you use its built-in functions to analyze the data rather than writing your own? Unless you prefer writing out the functions yourself, of course.
Ex (I named the df “insurance” when I analyzed the data):
and, to get basic stats overall:
Or, to analyze a column use
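A minimal sketch of these steps, assuming a DataFrame named insurance with the columns age, sex, smoker and charges. The rows are made up so the snippet runs on its own; with the real data you would normally load the file with pd.read_csv (file name assumed in the comment).

```python
import pandas as pd

# Made-up rows standing in for the real data; with the actual file this would
# typically be something like pd.read_csv("insurance.csv") (file name assumed).
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "male"],
    "smoker": ["yes", "no", "no", "yes"],
    "charges": [16000.0, 1700.0, 4400.0, 22000.0],
})

# Basic stats for every numeric column (count, mean, std, min, quartiles, max).
overall_stats = insurance.describe()

# Stats for a single column.
charges_stats = insurance["charges"].describe()
mean_charge = insurance["charges"].mean()

print(overall_stats)
print(charges_stats)
```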
To compare, use groupby:
insurance[["age", "sex", 'charges']].groupby('sex').mean()
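The same groupby pattern should cover the original question (average cost for smokers vs. non-smokers). A sketch with made-up rows, assuming the smoker column holds 'yes'/'no' strings:

```python
import pandas as pd

# Made-up rows standing in for the real insurance data (assumption).
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "male"],
    "smoker": ["yes", "no", "no", "yes"],
    "charges": [16000.0, 1700.0, 4400.0, 22000.0],
})

# Mean charges per group; the result is a Series indexed by smoker status.
mean_by_smoker = insurance.groupby("smoker")["charges"].mean()

# Average cost difference between smokers and non-smokers.
difference = mean_by_smoker["yes"] - mean_by_smoker["no"]

print(mean_by_smoker)
print(difference)
```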
To compare smokers vs. non-smokers, you can use a boolean mask with .loc[] to select the rows of each group, which gives you two smaller dfs. For example:
smokers = insurance.loc[insurance['smoker'] == 'yes']
and then use .describe() to verify and get basic stats. You could also use .value_counts() on columns to see how many male/female smokers there are, etc.
Then do the same steps for non_smokers.
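Put together, the split-and-inspect steps might look like the sketch below. It filters with a boolean mask and .loc, which is one common way to select rows by condition. The data is made up, and non_smokers is just a name chosen for the second df.

```python
import pandas as pd

# Made-up rows standing in for the real insurance data (assumption).
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "male"],
    "smoker": ["yes", "no", "no", "yes"],
    "charges": [16000.0, 1700.0, 4400.0, 22000.0],
})

# Boolean masks select the rows where the condition is True.
smokers = insurance.loc[insurance["smoker"] == "yes"]
non_smokers = insurance.loc[insurance["smoker"] == "no"]

# Basic stats on charges for each smaller df.
smoker_charges_stats = smokers["charges"].describe()
non_smoker_charges_stats = non_smokers["charges"].describe()

# How many male/female rows are in each group.
smoker_sex_counts = smokers["sex"].value_counts()
non_smoker_sex_counts = non_smokers["sex"].value_counts()

print(smoker_charges_stats)
print(smoker_sex_counts)
```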
You can also plot the datasets to get a better idea visually, rather than just looking at tables from the calculations.
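As one possible plot, here is a sketch of a box plot of charges for each group using matplotlib. The choice of a box plot, the made-up rows, and the output file name are all assumptions for illustration; a histogram or bar chart would work just as well.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the real insurance data (assumption).
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "male"],
    "smoker": ["yes", "no", "no", "yes"],
    "charges": [16000.0, 1700.0, 4400.0, 22000.0],
})

# One list of charges per group, in a fixed order.
groups = ["no", "yes"]
data = [insurance.loc[insurance["smoker"] == g, "charges"].tolist() for g in groups]

fig, ax = plt.subplots()
ax.boxplot(data)                      # boxes are drawn at x positions 1 and 2
ax.set_xticks([1, 2])
ax.set_xticklabels(["non-smokers", "smokers"])
ax.set_ylabel("charges")
fig.savefig("charges_by_smoker.png")  # hypothetical output file name
```

In a notebook you would probably drop the Agg line and the savefig call and just let the figure display.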
Thanks, I haven’t actually done the lessons on Pandas yet… that’s why I was writing out all the functions.
I just googled around and worked the first part out to get a table.
Ah, ok. I misunderstood. I thought the project was following Pandas.
Also, when importing the library, it’s
import pandas as pd