Hi all,
I’d really appreciate some help. I’m working with the observations data for the Biodiversity portfolio project. It has 3 columns: scientific_name, park_name, and observations.
I’m trying to group by park name to get the sum of observations for all species in each park:
park_grouped_obs = observation_data.groupby(['park_name']).sum()
However, the dataframe I end up with only has 1 column: observations. (Index(['observations'], dtype='object'))
It looks right when I print it:

But since I can’t work with the park_name column I’m finding it difficult to make graphs.
Please help!
Never mind, I realised I needed to call .reset_index().
I also found a solution using as_index=False.
What is the difference between using either solution?
Parameters for groupby()
From the pandas docs:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
The default for the parameter as_index is True, which means it returns the object with the group labels as the index.
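To make the as_index behaviour concrete, here is a minimal sketch. The species names and counts are made up for illustration (they are not the project data), and the variable names by_index, as_column and via_reset are just placeholders.

```python
import pandas as pd

# Tiny made-up stand-in for the observations data (values are illustrative only)
observation_data = pd.DataFrame({
    "scientific_name": ["Species a", "Species b", "Species a", "Species b"],
    "park_name": ["Bryce National Park", "Bryce National Park",
                  "Yosemite National Park", "Yosemite National Park"],
    "observations": [10, 20, 30, 40],
})

# Default (as_index=True): park_name becomes the index, not a column
by_index = observation_data.groupby("park_name")["observations"].sum()
print(by_index.index.name)

# as_index=False: park_name stays a regular column
as_column = observation_data.groupby("park_name", as_index=False)["observations"].sum()
print(list(as_column.columns))

# Calling reset_index() on the default result moves park_name back into a column
via_reset = by_index.reset_index()
print(list(via_reset.columns))
```

Both as_column and via_reset should end up with park_name and observations as ordinary columns, so for a simple sum like this the two approaches are interchangeable.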
and,
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html
What was your code for using reset_index()? (Don’t put it in a codebyte, just format it.) The default for the parameter inplace is False, which returns a new df rather than changing the original df. When you use inplace=True, it modifies the original df and returns None.
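A small sketch of the inplace difference, assuming a grouped result that already has park_name as its index. The numbers and the names totals, flat and result are placeholders for illustration.

```python
import pandas as pd

# Made-up grouped totals with park_name as the index (illustrative values)
totals = pd.DataFrame(
    {"observations": [30, 70]},
    index=pd.Index(["Bryce National Park", "Yosemite National Park"], name="park_name"),
)

# Default inplace=False: a new DataFrame comes back, totals itself is unchanged
flat = totals.reset_index()
print(list(flat.columns))    # ['park_name', 'observations']
print(list(totals.columns))  # ['observations']

# inplace=True: totals itself is modified and the call returns None
result = totals.reset_index(inplace=True)
print(result)                # None
print(list(totals.columns))  # ['park_name', 'observations']
```

A common slip is writing df = df.reset_index(inplace=True), which replaces df with None; either assign the returned copy or use inplace=True, not both.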
It looks like you tried this(?):
numbersObs = observations.groupby('park_name').observations.sum().reset_index()
print(numbersObs)
>> park_name observations
0 Bryce National Park 576025
1 Great Smoky Mountains National Park 431820
2 Yellowstone National Park 1443562
3 Yosemite National Park 863332
You can use matplotlib (or Seaborn) to make a viz of this. Just use park_name as the X axis and observations as the Y axis.
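Something like the following would do it. This is only a sketch: it rebuilds numbersObs from the totals printed above so that it runs on its own, and the figure size and label rotation are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuilt from the printed output above so the example is self-contained
numbersObs = pd.DataFrame({
    "park_name": [
        "Bryce National Park",
        "Great Smoky Mountains National Park",
        "Yellowstone National Park",
        "Yosemite National Park",
    ],
    "observations": [576025, 431820, 1443562, 863332],
})

fig, ax = plt.subplots(figsize=(8, 4))

# park_name on the X axis, observations on the Y axis
ax.bar(numbersObs["park_name"], numbersObs["observations"])
ax.set_xlabel("park_name")
ax.set_ylabel("observations")

# Long park names overlap unless the tick labels are rotated
ax.tick_params(axis="x", rotation=20)
fig.tight_layout()
plt.show()
```

This only works because park_name is a regular column after reset_index(); with the default groupby result you would have to plot against the index instead.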