Why groupby seems to group only partially

:slight_smile: i

I have a dataframe “df1” that looks as follows. And I want to know the count of each distinct “Concated-Both_Fine-Alternatives” combination.

df1:

Both_Fine Freq Alternatives Concated
n 1 FALSE aa
n 1 FALSE aa
n 1 FALSE aa
n 1 FALSE aa
n 1 FALSE aa
NaN 1 FALSE aa
n 1 FALSE aa
n 1 FALSE aa
n 1 FALSE aa
n 1 FALSE aa
n 1 FALSE bb
n 1 FALSE bb
n 2 FALSE bb

Therefore I created df2 using “groupby”. Below is my code.

df2=df1.groupby(['Concated','Both_Fine','Alternatives'])
agg({'Freq':'sum'}).reset_index()

Below is the output. I don’t understand why the Freq values of “aa” and “bb” aren’t aggregated (expected them to be 9 and 4, in a single row):

Concated Both_Fine Alternatives Freq
aa n FALSE 2
aa n FALSE 7
bb n FALSE 1
bb n FALSE 3

I have used Python to make sure that the “aa” in the first and second row of the output are the same, and “bb” in the third and fourth row are the same. But aren’t 100 percent sure.

Thanks for reading this.:slight_smile:

Couldn’t you just use .value_counts() on the columns?
Ex:

df["Concated"].value_counts()

>>whatever the total is for aa and bb outputs here

Unless you need to create a new df for some other reason(?)…or I’m not fully understanding the df.

:frowning: i Lisa,

Thanks for replying.:slight_smile:

Yeah, I want to create a new df where I can see the values of “Both_Fine”, “Alternatives”, “Freq” corresponding to each unique value in “Concate.” So simply returning the counts of “Concate” values won’t do the job.

1 Like