I was able to break down and import the data. I was able to do basic analysis with one column, but I’m struggling to figure out how to do analysis between columns.
I may not even be doing this project correctly… but the first three sections all work!
Here is where I start screwing up:
I’m trying to compare the average cost difference between smokers and non-smokers.
I’m just getting the number of items in the other column “charges”, but not the appropriate values for charges.
Thanks for your help!
I wondered… since you’re using Pandas, why not use its built-in functions to analyze the data, rather than writing out your own functions? I mean, unless you prefer writing out functions instead.
Ex (I named the df “insurance” when I analyzed the data):
insurance["region"].value_counts()
southeast 364
southwest 325
northwest 325
northeast 324
and, to get basic stats overall:
insurance.describe()
Or, to analyze a column use insurance['age'].mean()
(39.20702541106129)
To compare, use groupby:
insurance[["age", "sex", 'charges']].groupby('sex').mean()
| sex | age | charges |
|---|---|---|
| female | 39.503021 | 12569.578844 |
| male | 38.917160 | 13956.751178 |
To compare smokers vs. non-smokers, you can use .iloc with a boolean mask (square brackets, since it's an indexer rather than a method call) to locate the rows of each and end up with two smaller dfs. For example:
smokers = insurance.iloc[(insurance['smoker']=='yes').values]
smokers.head()
and then use .describe() to verify and get basic stats OR, you could also use value_counts() on columns to see how many male/female smokers there are, etc.
Then do the same steps for non_smokers.
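If all you're after is the average cost difference, groupby can do both groups in one go. Here's a minimal sketch; the DataFrame below is a tiny made-up stand-in (the values are not from the real dataset), and it assumes the columns are named "smoker" and "charges" as in your data:

```python
import pandas as pd

# Tiny made-up stand-in for the insurance data (illustrative values only);
# with the real data you would load the df from your CSV instead.
insurance = pd.DataFrame({
    "smoker": ["yes", "no", "yes", "no", "no"],
    "charges": [30000.0, 5000.0, 20000.0, 7000.0, 6000.0],
})

# Mean charges per smoker group: a Series indexed by "no" / "yes".
mean_charges = insurance.groupby("smoker")["charges"].mean()

# Difference between the two group averages.
difference = mean_charges["yes"] - mean_charges["no"]

print(mean_charges)
print(difference)
```

Selecting ["charges"] before calling .mean() keeps the result to just the column you care about, which should also work when the df has non-numeric columns like sex or region.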
You can also plot out the datasets to get a better idea, visually rather than just looking at tables from the calculations.
See:
https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html
Thanks, I haven’t actually done the lessons on Pandas yet… that’s why I was writing out all the functions.
I just googled around and worked the first part out to get a table.
Thanks, again!
Ah, ok. I misunderstood. I thought the project was following Pandas.
Also, when importing the library, it’s import pandas as pd
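And if you'd rather stick with plain Python functions for now, here's a rough sketch of the smoker vs. non-smoker comparison without Pandas. I haven't seen your code, so this is only a guess at the setup: the sample rows are made up, the function name is hypothetical, and it assumes the CSV has "smoker" and "charges" columns. The key point is to add up the charge values themselves (converted to float) rather than only counting the rows:

```python
import csv
import io

# Made-up sample rows standing in for the insurance CSV (illustrative values only);
# with the real file you would pass an opened file object to csv.DictReader instead.
sample = io.StringIO(
    "smoker,charges\n"
    "yes,30000\n"
    "no,5000\n"
    "yes,20000\n"
    "no,7000\n"
    "no,6000\n"
)

def average_charges_by_smoker(rows):
    """Return a dict mapping each smoker value to its average charges."""
    totals = {}
    counts = {}
    for row in rows:
        key = row["smoker"]
        # Sum the charge value itself (as a float), and count rows separately.
        totals[key] = totals.get(key, 0.0) + float(row["charges"])
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}

averages = average_charges_by_smoker(csv.DictReader(sample))
difference = averages["yes"] - averages["no"]
print(averages, difference)
```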