US Medical Project: how to have a two-way table have an entry that isn't frequency

Hi all,

Currently working on the portfolio project of US Medical Insurance.

I’ve created a two-way table using the Pandas crosstab function:

average_cost_by_sex_and_region = pd.crosstab(index=pd_df["sex"], 
                           columns=pd_df["region"])

average_cost_by_sex_and_region.index = ["male","female"]

average_cost_by_sex_and_region

At the moment, the values in this table are “count” values - the number of females in the Northwest, for example, and what it want it to be are the mean insurance cost for them. I’ve generated those figures here:

pd_df.groupby(["region", "sex"])["charges"].mean() # Average cost split by biological sex in area

But I can’t work out how to get those figures into the two-way table - does that make sense?

I also can’t get the syntax of “margins” to work such that I get row and column totals, although the documentation suggests this should be fairly easy - any suggestions on that?

Thanks in advance

Do you mean something like this? (Also, maybe look at median rather than mean b/c there are outliers that pull the mean).

sum in each region:

df.groupby(['region', 'sex'])['charges'].sum().round(2)

>>region     sex   
northeast  female    2085465.71
           male      2258202.88
northwest  female    2046698.75
           male      1989013.25
southeast  female    2362442.12
           male      3001247.65
southwest  female    1826454.62
           male      2186300.02


Or, the median charges for both sexes in each region?:

df.groupby(['region', 'sex'])['charges'].median().round(2)

>>region     sex   
northeast  female    10197.77
           male       9957.72
northwest  female     9614.07
           male       8413.46
southeast  female     8582.30
           male       9504.31
southwest  female     8530.84
           male       9391.35
>>
Or, the number of men & women in each region?

df.groupby(['region', 'sex'])['sex'].count()

>>region     sex   
northeast  female    161
           male      163
northwest  female    164
           male      161
southeast  female    175
           male      189
southwest  female    162
           male      163
1 Like

Hi,

Thanks - some really good learning points there that I can definitely use! The median point is a good one, and I’ll def incorporate that.

I still can’t work out how to get it into 2 x 4 crosstab table though, although that’s more from the point of view of a data vis point of view, as I can’t do anything like Chi squared on it if it’s not frequencies anyway (I think!).

1 Like

No, you can’t do chi squared here.

Maybe a viz? something like,

sns.barplot(x='sex', y='charges', hue='region', estimator= np.median, data=df)

1 Like

Really nice, thank you

1 Like