FAQ: Aggregates in Pandas - Calculating Aggregate Functions I

This community-built FAQ covers the “Calculating Aggregate Functions I” exercise from the lesson “Aggregates in Pandas”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Data Analysis with Pandas


Hi, what should this loop script look like? The lesson says: “We want to get an average grade for each student across all assignments. We could do some sort of loop”. What is this loop? I have no idea.
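One possible reading of "some sort of loop" is sketched below: filter the rows for one student at a time and average that student's grades. The `grades` DataFrame and its column names are made up for illustration and are not the exercise's actual data. The last line shows the groupby version, which should give the same numbers.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration only.
grades = pd.DataFrame({
    'student': ['Amy', 'Amy', 'Bob', 'Bob'],
    'assignment': [1, 2, 1, 2],
    'grade': [90, 80, 70, 90],
})

# The "loop" approach: pick out one student's rows at a time and average them.
loop_averages = {}
for student in grades['student'].unique():
    student_rows = grades[grades['student'] == student]
    loop_averages[student] = student_rows['grade'].mean()

# The groupby approach does the same split-and-average in one line.
groupby_averages = grades.groupby('student').grade.mean()

print(loop_averages)
print(groupby_averages)
```

The loop works, but it gets tedious as the number of groups grows, which is presumably why the lesson moves on to .groupby().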

Why does df.groupby('column1').column2.measurement() return a series data type and not a data frame?


For the final question, where we determine the type of the object: is the object a Series? I must review my object types.

Additionally, I am having trouble knowing when to use parentheses versus brackets. I understand that brackets are for lists, but in many of the functions the positioning can be confusing.

Thank You
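A rough rule of thumb that may help: parentheses call a function or method, while square brackets select something or build a list. The sketch below uses a small made-up `orders` DataFrame, not the exercise's real data.

```python
import pandas as pd

# Made-up stand-in for the exercise's orders data.
orders = pd.DataFrame({
    'shoe_type': ['sandals', 'boots', 'sandals'],
    'price': [20, 50, 30],
})

# Parentheses call a function or method.
grouped = orders.groupby('shoe_type')

# Square brackets select something, such as a column from a DataFrame...
prices = orders['price']

# ...or build a list literal.
columns = ['shoe_type', 'price']

# Both can appear together: brackets select the column, parentheses call max.
max_price = orders['price'].max()
print(max_price)
```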

I examined data types at several levels:

print(type(orders.groupby('shoe_type')))
# <class 'pandas.core.groupby.DataFrameGroupBy'>

print(type(orders.groupby('shoe_type').price))
# <class 'pandas.core.groupby.SeriesGroupBy'>

print(type(orders.groupby('shoe_type').price.max()))
# <class 'pandas.core.series.Series'>

print(type(orders.groupby('shoe_type').max()))
# <class 'pandas.core.frame.DataFrame'>

It seems that the .groupby() method returns a DataFrameGroupBy object, and df.groupby('column1').column2 returns a SeriesGroupBy object. The .max() method of a SeriesGroupBy object then returns a Series, while the .max() method of a DataFrameGroupBy object returns a DataFrame. I don’t know why, but that appears to be how it is specified.
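One way to think about it: picking a single column leaves one value per group, which is one-dimensional, so the result is a Series. If a DataFrame is wanted straight away, selecting the column with double brackets (a list of column names) should work. This is a sketch on a small made-up `orders` DataFrame, not the exercise's real data.

```python
import pandas as pd

# Made-up stand-in for the exercise's orders data.
orders = pd.DataFrame({
    'shoe_type': ['sandals', 'boots', 'sandals'],
    'price': [20, 50, 30],
})

# Single brackets (or .price) select one column, so the result is a Series.
as_series = orders.groupby('shoe_type')['price'].max()

# Double brackets select a list of columns, so the result is a DataFrame.
as_frame = orders.groupby('shoe_type')[['price']].max()

print(type(as_series))
print(type(as_frame))
```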

If we want to convert the series pricey_shoes to a DataFrame, we can use pd.DataFrame():

print(type(pricey_shoes))
# <class 'pandas.core.series.Series'>

pricey_shoes_dataframe = pd.DataFrame(pricey_shoes)
print(type(pricey_shoes_dataframe))
# <class 'pandas.core.frame.DataFrame'>

In this case, the shoe_type values become the index of the new DataFrame.

Another way is introduced in the next exercise. By using .reset_index(), we can create a DataFrame with shoe_type added as a column.
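The difference between the two conversions can be sketched like this, again on a small made-up `orders` DataFrame. Here .to_frame() is used as a shorthand that should behave like pd.DataFrame(pricey_shoes).

```python
import pandas as pd

# Made-up stand-in for the exercise's orders data.
orders = pd.DataFrame({
    'shoe_type': ['sandals', 'boots', 'sandals'],
    'price': [20, 50, 30],
})

pricey_shoes = orders.groupby('shoe_type').price.max()

# shoe_type stays as the index; price is the only column.
with_index = pricey_shoes.to_frame()

# shoe_type becomes an ordinary column next to price, with a fresh 0..n-1 index.
with_column = pricey_shoes.reset_index()

print(with_index)
print(with_column)
```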


What is the difference between a pandas Series and a normal Python array? When I print the type it says pandas Series, and that got me a bit confused. Also, why is it not a DataFrame?

numpy array: You can think of it as a Python list, but with more useful functionality, such as numerical operations and reshaping.

pandas Series: similar to a 1D numpy array, but with additional functionality that allows values in the Series to be indexed using labels. (I am using the explanation from the Codecademy tutorial.)

DataFrame: similar to a Series, but two-dimensional; it is composed of multiple Series, one per column.
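The three descriptions above can be compared side by side in a short sketch. All values and labels here are made up for illustration.

```python
import numpy as np
import pandas as pd

values = [20, 50, 30]               # plain Python list
labels = ['sandals', 'boots', 'clogs']

# numpy array: supports elementwise numerical operations.
arr = np.array(values)
doubled = arr * 2

# pandas Series: like a 1D array, but each value can be looked up by label.
series = pd.Series(values, index=labels)
boots_price = series['boots']

# DataFrame: several columns sharing one index; each column is a Series.
frame = pd.DataFrame(
    {'price': values, 'in_stock': [True, False, True]},
    index=labels,
)
price_column = frame['price']

print(doubled)
print(boots_price)
print(type(price_column))
```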

More detailed info can be found here:
Introduction to Numpy and Pandas
