FAQ: Aggregates in Pandas - Calculating Aggregate Functions II

This community-built FAQ covers the “Calculating Aggregate Functions II” exercise from the lesson “Aggregates in Pandas”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Data Analysis with Pandas

In this exercise (https://www.codecademy.com/paths/analyze-data-with-python/tracks/ida-4-data-manipulation-pandas/modules/ida-4-2-aggregates-in-pandas/lessons/pandas-aggregates/exercises/groupby-ii), the code for the teas id count example is given as
teas_counts = teas.groupby('category').id.count().reset_index()
Can we also write it as
teas.groupby('category').nunique().reset_index()
If not, why?
Thanks in advance!
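
A small sketch may help with the question above. The `teas` table here is made up for illustration (the lesson's real data is not shown in this thread), so treat the values as hypothetical. It suggests the two expressions are not interchangeable: `.id.count()` gives one count per category, while `.nunique()` with no column selected reports the number of distinct values in every remaining column.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's teas table
teas = pd.DataFrame({
    'id': [0, 1, 2, 3, 4],
    'tea': ['earl grey', 'assam', 'sencha', 'matcha', 'sencha'],
    'category': ['black', 'black', 'green', 'green', 'green'],
})

# One column selected, then counted: the result has two columns (category, id)
teas_counts = teas.groupby('category').id.count().reset_index()
print(teas_counts)

# No column selected: nunique() is applied to every non-grouping column
teas_unique = teas.groupby('category').nunique().reset_index()
print(teas_unique)
```

Selecting the column first, as in `teas.groupby('category').id.nunique().reset_index()`, would give the same numbers as the count only while the ids never repeat and none are missing.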

What's the difference between a DataFrame and a Series?

I knew that a Series consists of only one column and a DataFrame has more than one.

But in this exercise,

import codecademylib
import pandas as pd

orders = pd.read_csv('orders.csv')

pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()

print(pricey_shoes)
print(type(pricey_shoes))

before I used '.reset_index()', 'pricey_shoes' was a Series. But it had more than one column (shoe_type, price). Was it really a Series?

Thank you for answering.

I believe a Series is kind of like a list with a single index, whilst a DataFrame is like a collection of Series?
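
To make the Series question above concrete, here is a minimal sketch. Since orders.csv is not available in this thread, the `orders` table below is invented for illustration. It suggests that what looks like a shoe_type column before `.reset_index()` is really the index (the row labels), so the grouped result has only one column of values.

```python
import pandas as pd

# Hypothetical stand-in for orders.csv
orders = pd.DataFrame({
    'shoe_type': ['boots', 'boots', 'sandals', 'sandals'],
    'price': [120, 95, 40, 55],
})

grouped = orders.groupby('shoe_type').price.max()
print(type(grouped))       # a pandas Series
print(grouped.index.name)  # 'shoe_type' names the index (row labels)
print(grouped.name)        # 'price' names the single column of values

pricey_shoes = grouped.reset_index()
print(type(pricey_shoes))          # a pandas DataFrame
print(list(pricey_shoes.columns))  # shoe_type is now a regular column
```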

Hey, the AGGREGATES IN PANDAS: Calculating Aggregate Functions II exercise has this code example:

teas_counts = teas.groupby('category').id.count().reset_index()

Why do we use id to count the quantity? Every category (black, green) has different ids, so how does it work?

In the example they mention, they use

teas_counts = teas.groupby('category').id.count().reset_index()

to find the number of each category of tea they sell.

I don't understand how this code works if they're using the 'id' column instead of the 'name' column. Is 'id' a real column or just the index? And if it's a real DataFrame column, how does it count the different types of teas when the column contains just integers from 0 to the total number of teas? I mean, these integers never repeat, so how can they count using this column?

Instead of the code they use, wouldn't it be clearer to use:

teas_counts = teas.groupby('category').tea.count().reset_index()

?
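
A short sketch may help here. The `teas` table is made up for illustration, with one tea name deliberately left missing. As far as the pandas documentation describes it, `count()` counts the non-null entries in each group rather than looking for repeated values, so a column of unique ids works fine, and any column without missing values would give the same numbers.

```python
import pandas as pd

# Hypothetical teas table; the last tea name is missing on purpose
teas = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'tea': ['assam', 'earl grey', 'sencha', None],
    'category': ['black', 'black', 'green', 'green'],
})

# Counting ids: every row has an id, so this is the number of rows per category
by_id = teas.groupby('category').id.count().reset_index()
print(by_id)

# Counting tea names: the missing name is skipped, so green comes up one short
by_tea = teas.groupby('category').tea.count().reset_index()
print(by_tea)
```

That may be one reason to count a column like id: it is unlikely to contain missing values.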

In this example id is the name of a column. Pandas has a convenience option to use the . attribute syntax for column names in most circumstances (but not all).

In this case teas.groupby('category').id is equivalent to teas.groupby('category')['id'], or more generally teas.id is equivalent to teas['id']. Sometimes the dot syntax is more readable; in other cases it can be more confusing.

This Q/A on the . dotted attribute syntax in pandas might be useful reading-
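
As a rough illustration of the point above, the sketch below compares the two syntaxes on a made-up `teas` table. The extra 'count' column is a hypothetical example of a name that clashes with an existing DataFrame method, which is one of the circumstances where the dot syntax does not reach the column.

```python
import pandas as pd

# Hypothetical table; 'count' is chosen because DataFrame already has a count() method
teas = pd.DataFrame({
    'id': [0, 1, 2],
    'category': ['black', 'green', 'green'],
    'count': [5, 7, 9],
})

# Dot and bracket access return the same column
same_column = teas.id.equals(teas['id'])

# The same holds on a groupby object
grouped = teas.groupby('category')
dot_counts = grouped.id.count()
bracket_counts = grouped['id'].count()
same_counts = dot_counts.equals(bracket_counts)

# Here the dot syntax finds the method, not the column
dot_is_method = callable(teas.count)
count_column = teas['count']  # brackets always reach the column

print(same_column, same_counts, dot_is_method)
```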

So I see in the example they have this written

teas_counts = teas.groupby('category').id.count().reset_index()

but would this give you a similar answer?

teas_counts = teas.groupby('category').category.count().reset_index()

The count is probably the same, but the column names would be the issue. In your first example you'd get something like:

    category  id
0      black  70
1      green  33
2      white  13

Note that the second column is titled ['id']. With the other route I think you'd wind up with two ['category'] columns, or it just wouldn't work. If you get the chance, test it to see what happens (if this is just example data, then try it on another dataframe you do have).
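
Here is one way to run that test, sketched on an invented `teas` table. On the pandas versions I am aware of, the `.reset_index()` step raises a ValueError because the counted Series and its index are both named 'category'; the try/except reports whichever outcome occurs. The column name 'n_teas' is just a label picked for this example.

```python
import pandas as pd

# Hypothetical teas table
teas = pd.DataFrame({
    'id': [0, 1, 2, 3, 4],
    'category': ['black', 'black', 'green', 'white', 'green'],
})

# A Series named 'category' whose index is also named 'category'
counts = teas.groupby('category').category.count()

try:
    result = counts.reset_index()
    outcome = 'worked'
except ValueError as err:
    outcome = f'ValueError: {err}'
print(outcome)

# Giving the count column its own name avoids the clash
teas_counts = counts.reset_index(name='n_teas')
print(teas_counts)
```

The counts themselves match what `.id.count()` would give on this data, so only the naming differs.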
