Biodiversity project (Data science foundations) help using groupby to create new dataframe

Hi all,

I’d really appreciate some help. I’m working with the observations data for the Biodiversity portfolio project. It has 3 columns: scientific_name, park_name, and observations.
I’m trying to group by park name to get the sum of observations for all species in each park:

park_grouped_obs = observation_data.groupby(['park_name']).sum()

However, the dataframe I end up with only has 1 column: observations. (Index(['observations'], dtype='object'))
It looks right when I print it:

But since I can’t work with the park_name column I’m finding it difficult to make graphs.
Please help!

Never mind, I realised I needed to call .reset_index().
I also found a solution using as_index=False.
What is the difference between the two solutions?

Parameters for groupby() from the Pandas docs:

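The default for the as_index parameter is True, which means the group labels (here park_name) become the index of the returned object rather than a regular column. With as_index=False the labels stay as a column, which gives the same result as calling .reset_index() afterwards.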
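To make the comparison concrete, here is a minimal sketch of both approaches side by side. The small dataframe is an invented stand-in for the project data (only the column names come from the question), so treat the numbers as placeholders.

```python
import pandas as pd

# Toy stand-in for the observations data; these rows are invented for illustration.
observation_data = pd.DataFrame({
    "scientific_name": ["Species a", "Species b", "Species a", "Species b"],
    "park_name": ["Bryce National Park", "Bryce National Park",
                  "Yosemite National Park", "Yosemite National Park"],
    "observations": [10, 20, 30, 40],
})

# Default (as_index=True): park_name becomes the index, not a column.
indexed = observation_data.groupby("park_name")["observations"].sum()
print(indexed.index.name)  # park_name

# Option 1: move the index back into a regular column afterwards.
via_reset = indexed.reset_index()

# Option 2: tell groupby not to use the group labels as the index.
via_as_index = observation_data.groupby("park_name", as_index=False)["observations"].sum()

print(list(via_reset.columns))     # ['park_name', 'observations']
print(list(via_as_index.columns))  # ['park_name', 'observations']
```

Either way you end up with park_name as an ordinary column, so which one to use is mostly a matter of taste.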


What was your code for using reset_index()? (Don’t put it in a codebyte, just format it as code.) Note that reset_index() has an inplace parameter that defaults to False, so the call returns a new df and leaves the original unchanged. When you use inplace=True it modifies the original df and returns None.
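A quick sketch of the inplace behaviour, in case it helps. The dataframe here is a made-up example of grouped totals (the park labels and numbers are placeholders, not project data).

```python
import pandas as pd

# Totals indexed by park_name, the shape groupby returns by default (values are made up).
totals = pd.DataFrame(
    {"observations": [30, 70]},
    index=pd.Index(["Park A", "Park B"], name="park_name"),
)

# inplace=False (the default): a new DataFrame is returned, totals is unchanged.
flat = totals.reset_index()
print(list(flat.columns))    # ['park_name', 'observations']
print(list(totals.columns))  # ['observations']

# inplace=True: totals itself is modified and the call returns None.
result = totals.reset_index(inplace=True)
print(result)                # None
print(list(totals.columns))  # ['park_name', 'observations']
```

So if you write df = df.reset_index(inplace=True), you end up with df being None, which is a common source of confusion.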

It looks like you tried this(?):

numbersObs = observations.groupby('park_name').observations.sum().reset_index()

                             park_name  observations
0                  Bryce National Park        576025
1  Great Smoky Mountains National Park        431820
2            Yellowstone National Park       1443562
3               Yosemite National Park        863332

You can use matplotlib (or Seaborn) to make a viz of this. Just use park_name as the X axis and observations as the Y axis.
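For example, a bar chart along those lines could look roughly like this. It is only a sketch: the dataframe is rebuilt by hand from the output shown above, and the axis labels, figure size, and output file name are my own choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuilt by hand from the printed totals; in the project you would reuse numbersObs directly.
numbersObs = pd.DataFrame({
    "park_name": [
        "Bryce National Park",
        "Great Smoky Mountains National Park",
        "Yellowstone National Park",
        "Yosemite National Park",
    ],
    "observations": [576025, 431820, 1443562, 863332],
})

fig, ax = plt.subplots(figsize=(8, 4))

# park_name on the x axis, total observations on the y axis.
ax.bar(numbersObs["park_name"], numbersObs["observations"])
ax.set_xlabel("Park")
ax.set_ylabel("Total observations")

# Rotate the long park names slightly so they do not overlap.
ax.tick_params(axis="x", labelrotation=20)
fig.tight_layout()

# Hypothetical file name; in a notebook you could call plt.show() instead.
fig.savefig("observations_by_park.png")
```

This only works because park_name is a regular column after reset_index(); with the default groupby output you would have to plot from the index instead.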