This community-built FAQ covers the “Calculating Aggregate Functions I” exercise from the lesson “Aggregates in Pandas”.
Paths and Courses
This exercise can be found in the following Codecademy content:
Data Science
Data Analysis with Pandas
FAQs on the exercise Calculating Aggregate Functions I
There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.
If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.
Hi, what should this loop look like? The lesson says: “We want to get an average grade for each student across all assignments. We could do some sort of loop”. What is this loop? I have no idea.
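One possible loop, sketched under the assumption that the grades table has `student` and `grade` columns (the names and values below are hypothetical stand-ins, not the lesson's actual data): walk every row, keep a running total and count per student, then divide. The last lines do the same job with `.groupby()` for comparison.

```python
import pandas as pd

# Hypothetical data standing in for the lesson's grades table
grades = pd.DataFrame({
    'student': ['Amy', 'Amy', 'Bob', 'Bob', 'Bob'],
    'assignment_name': ['HW 1', 'HW 2', 'HW 1', 'HW 2', 'HW 3'],
    'grade': [90, 80, 70, 80, 90],
})

# Loop approach: accumulate a running total and a count per student
totals = {}
counts = {}
for _, row in grades.iterrows():
    student = row['student']
    totals[student] = totals.get(student, 0.0) + float(row['grade'])
    counts[student] = counts.get(student, 0) + 1

# Average = total / count for each student
loop_means = {student: totals[student] / counts[student] for student in totals}
print(loop_means)  # {'Amy': 85.0, 'Bob': 80.0}

# groupby approach: one line, returns a Series indexed by student
groupby_means = grades.groupby('student').grade.mean()
print(groupby_means)
```

Both give the same averages; the `.groupby()` version just hides the bookkeeping.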
Why does `df.groupby('column1').column2.measurement()` return a Series and not a DataFrame?
In the final question, we determine the type of the object. Is the object a Series? I must review my object types.
Additionally, I am having trouble knowing when to use parentheses versus brackets. I understand that brackets are for lists, but in many of the functions the positioning can be confusing.
Thank you
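A rough rule of thumb that may help: square brackets select (index into) something, while parentheses call a function or method and pass it arguments. The sketch below uses a small hypothetical `orders` table (columns assumed to match the lesson's `shoe_type` and `price`) and also confirms that the result of the final question is a Series.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's orders table
orders = pd.DataFrame({
    'shoe_type': ['boots', 'sandals', 'boots'],
    'price': [120, 45, 150],
})

# Square brackets select: one column name gives a Series ...
prices = orders['price']  # same Series as orders.price
# ... and a list of names inside the brackets gives a DataFrame
subset = orders[['shoe_type', 'price']]

# Parentheses call a method and pass it arguments
grouped = orders.groupby('shoe_type')

# Chains mix both: select a column with [...] and then call .max()
pricey_shoes = grouped['price'].max()
print(type(pricey_shoes))  # <class 'pandas.core.series.Series'>
```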
I examined data types at several levels:
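```python
import pandas as pd

orders = pd.read_csv('orders.csv')  # the lesson's data file

# Module paths vary by pandas version; newer releases print
# e.g. pandas.core.groupby.generic.DataFrameGroupBy
print(type(orders.groupby('shoe_type')))
# <class 'pandas.core.groupby.DataFrameGroupBy'>
print(type(orders.groupby('shoe_type').price))
# <class 'pandas.core.groupby.SeriesGroupBy'>
print(type(orders.groupby('shoe_type').price.max()))
# <class 'pandas.core.series.Series'>
print(type(orders.groupby('shoe_type').max()))
# <class 'pandas.core.frame.DataFrame'>
```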
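It seems that the `.groupby()` method returns a `DataFrameGroupBy` object and `df.groupby('column1').column2` returns a `SeriesGroupBy` object. Calling `.max()` on the `SeriesGroupBy` object returns a Series, while calling `.max()` on the `DataFrameGroupBy` object returns a DataFrame. The difference comes from the shape of the result: a `SeriesGroupBy` holds a single column, so aggregating it yields one value per group, which is a Series indexed by the group keys; a `DataFrameGroupBy` holds several columns, so aggregating it yields one row per group with one column per original column, which is a DataFrame.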
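A small sketch of the three cases, using a hypothetical `orders` table. It also shows that selecting the column with a list (double brackets) keeps the result as a DataFrame even when only one column is aggregated.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's orders table
orders = pd.DataFrame({
    'shoe_type': ['boots', 'sandals', 'boots', 'sandals'],
    'shoe_color': ['black', 'brown', 'brown', 'black'],
    'price': [120, 45, 150, 60],
})

# One column selected -> one value per group -> Series indexed by shoe_type
one_col = orders.groupby('shoe_type').price.max()

# A list of columns (double brackets) -> DataFrame, even with a single column
as_frame = orders.groupby('shoe_type')[['price']].max()

# No column selected -> every remaining column is aggregated -> DataFrame
all_cols = orders.groupby('shoe_type').max()

print(type(one_col).__name__, type(as_frame).__name__, type(all_cols).__name__)
# Series DataFrame DataFrame
```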
If we want to convert the Series `pricey_shoes` to a DataFrame, we can use `pd.DataFrame()`:

```python
print(type(pricey_shoes))
# <class 'pandas.core.series.Series'>
pricey_shoes_dataframe = pd.DataFrame(pricey_shoes)
print(type(pricey_shoes_dataframe))
# <class 'pandas.core.frame.DataFrame'>
```
In this case, the `shoe_type` values become the index of the new DataFrame.
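A quick check of that, sketched with a hypothetical `orders` table: both `pd.DataFrame(series)` and the Series method `.to_frame()` produce a one-column DataFrame whose index is still named `shoe_type` and whose column takes the Series name (`price`).

```python
import pandas as pd

# Hypothetical stand-in for the lesson's orders table
orders = pd.DataFrame({
    'shoe_type': ['boots', 'sandals', 'boots'],
    'price': [120, 45, 150],
})
pricey_shoes = orders.groupby('shoe_type').price.max()

via_constructor = pd.DataFrame(pricey_shoes)
via_method = pricey_shoes.to_frame()  # equivalent Series method

print(via_method.index.name)     # shoe_type
print(list(via_method.columns))  # ['price']
```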
Another way is introduced in the next exercise: by using `.reset_index()`, we can create a DataFrame with `shoe_type` added as a column.
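For comparison, a minimal sketch of the `.reset_index()` route on the same kind of hypothetical data: `shoe_type` moves out of the index into a regular column, and the rows get a fresh 0, 1, 2, … index.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's orders table
orders = pd.DataFrame({
    'shoe_type': ['boots', 'sandals', 'boots'],
    'price': [120, 45, 150],
})

# reset_index() on a Series returns a DataFrame with the old index as a column
pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()
print(pricey_shoes)
```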
What is the difference between a pandas Series and a normal Python array? When I print the type it says pandas Series, and that got me a bit confused. Also, why is it not a DataFrame?
NumPy array: you can think of it as a Python list, but with more useful functionality, such as element-wise numerical operations and reshaping.
pandas Series: similar to a 1D NumPy array, but with additional functionality that allows values in the Series to be indexed by label. (I used the explanation from the Codecademy tutorial.)
DataFrame: a table composed of multiple Series (one per column) that share the same index.
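A short sketch contrasting the three (the shoe names and quantities are hypothetical): multiplying a list repeats it, multiplying a NumPy array works element by element, a Series can be looked up by label, and pulling one column out of a DataFrame gives back a Series.

```python
import numpy as np
import pandas as pd

prices_list = [120, 45, 150]
prices_array = np.array(prices_list)
# A Series pairs each value with a label (hypothetical shoe names here)
prices_series = pd.Series(prices_list, index=['boots', 'sandals', 'heels'])
# A DataFrame is a table of Series that share one index
df = pd.DataFrame({'price': prices_list, 'quantity': [1, 2, 1]},
                  index=['boots', 'sandals', 'heels'])

doubled_list = prices_list * 2        # list repetition: 6 elements
doubled_array = prices_array * 2      # element-wise math: [240, 90, 300]
boots_price = prices_series['boots']  # lookup by label: 120
price_column = df['price']            # each column of a DataFrame is a Series
print(type(price_column))  # <class 'pandas.core.series.Series'>
```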
More detailed info can be found here:
Introduction to Numpy and Pandas
Got curious based on the line in the lesson:
We want to get an average grade for each student across all assignments. We could do some sort of loop, but Pandas gives us a much easier option: the method .groupby
Here is my attempt at pulling the same data out of the given DataFrame with a loop, to show how much easier the `.groupby()` method is. The loop finds the highest price for each shoe type, the same result as `orders.groupby('shoe_type').price.max()`. The output in my case is a dictionary, but it could easily be converted to a Series or a DataFrame.
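```python
# Assumes `orders` has already been loaded from orders.csv in the lesson
pricey_shoes_dict = {}
for index, row in orders.iterrows():
    shoe_type = row["shoe_type"]
    price = row["price"]
    # Keep the highest price seen so far for each shoe type
    if shoe_type not in pricey_shoes_dict or price > pricey_shoes_dict[shoe_type]:
        pricey_shoes_dict[shoe_type] = price
print(pricey_shoes_dict)
```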
You need to copy and paste it into the lesson so that orders.csv is loaded and the code runs properly.
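To convert the dictionary, `pd.Series(d)` turns the keys into the index, and passing `d.items()` to `pd.DataFrame` with column names gives a two-column table. The sketch below runs the loop on a small hypothetical `orders` table so it works outside the lesson, then checks the result against `.groupby()`.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's orders table
orders = pd.DataFrame({
    'shoe_type': ['boots', 'sandals', 'boots', 'sandals'],
    'price': [120, 45, 150, 60],
})

# Same loop as above: highest price per shoe type
pricey_shoes_dict = {}
for _, row in orders.iterrows():
    shoe_type, price = row['shoe_type'], row['price']
    if shoe_type not in pricey_shoes_dict or price > pricey_shoes_dict[shoe_type]:
        pricey_shoes_dict[shoe_type] = price

# dict -> Series (keys become the index)
loop_series = pd.Series(pricey_shoes_dict)
# dict -> DataFrame (keys and values become two columns)
loop_frame = pd.DataFrame(list(pricey_shoes_dict.items()), columns=['shoe_type', 'price'])

groupby_result = orders.groupby('shoe_type').price.max()
print(loop_series.to_dict() == groupby_result.to_dict())  # True
```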