FAQ: Aggregates in Pandas - Pivot Tables

This community-built FAQ covers the “Pivot Tables” exercise from the lesson “Aggregates in Pandas”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Data Analysis with Pandas

FAQs on the exercise Pivot Tables

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply (reply) below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

Why did the pivot table change the id count from an int to a float? Is there any way to preserve the data type or did I do something to change it?

4 Likes

I have the same question… Any answers?

OK. It’s not intuitive.

Code that works:
shoe_counts_pivot = shoe_counts.pivot(columns=‘shoe_color’, index=‘shoe_type’,values=(‘id’)).fillna(0).astype(int).reset_index()

3 Likes

Why we don’t need to use count() for id in pivot function?

3 Likes

Hi,

This is my first response post so please take what I say with a pinch of salt!

I think:

.count() method has already been applied in the earlier groupby() statement so the Dataframe stored as shoe_counts has a column named ‘id’ with values generated by id.count()
Therefore no need to apply .count() again.

Hoep this helps

Tom

5 Likes

That’s what I think but could any of the admins/experts verify this fact…Thanks!

This may seem like a dumb question, but how are you supposed to know which columns go into the ‘column’, ‘index’ and ‘value’ elements of a pivot table. The lesson doesn’t do a very good job of explaining and I don’t understand why “shoe_type” and “shoe_color” are not interchangeable when creating the new Dataframe

1 Like

They are interchangeable, it’s just a matter of how you what your data to be displayed some ways are just more useful or clear than others.

Because our main focus is on shoe types and how well they’re selling depending on their color, it’s better if shoe_type is used as our reference column (or index) and shoe_color is used as the columns you will compare. Since we’re used to read from left to right, it just provides a better comparison and understanding of our data.

Hope it helps, cheers!

2 Likes

Actually, I think it isn’t the same id. You’re selecting id from the latter table, in which count is already applied, and doesn’t have anything to do with the previous table.

2 Likes

Here is an quote:

The values should be id , the column that actually represents the counts of each shoe_type / shoe_color combination.

Ok, I got that id is just number of crossing two tables, but is “id” a python key word or is it id from our table ? Im little bit confusing…

what does the reset_index() do at the end of shoe_counts_pivot ? I noticed that without it shoe_type( before black) wasn’t outputed. Smbody plz explain

I guess you could say reset_index() pushes the leftmost column up one column to the right so the column name is visible. So in this case, the ‘shoe_type’ column name was hidden and when we use reset_index, it pushes it one space to the right and now we can see it. That’s how I understood it

I get that shoe_type is the index part of the pivot(). What would we call the numbers 0-3 at the far left? How do we get rid of them?

Hi there!
The shoe_type column becomes our index, yes. But when we print the dataframe without the reset_index() method, it’s not visibly obvious that the index column is supposed to be a shoe type. By using the reset_index() method, we push the type column to the right so that the column name shows up.
Plus, having a clear numerical index can be very helpful if the pivot table has hundreds of elements!

Try printing the dataframe with and without the reset_index() method to better understand it! I hope I was helpful :slight_smile:

To add to this, the numerical index is actually useful if we use methods like .iloc to locate and select specific rows based on the numerical index. You would also use them for slicing methods like [x:y]. You don’t want to get rid of them as they serve a useful purpose.