I’d really appreciate some help. I’m working with the observations data for the Biodiversity portfolio project. It has 3 columns: scientific_name, park_name, and observations.
I’m trying to group by park name to get the sum of observations for all species in each park:
park_grouped_obs = observation_data.groupby(['park_name']).sum()
However, the dataframe I end up with only has 1 column: observations. (Index(['observations'], dtype='object'))
It looks right when I print it:
But since I can’t work with the park_name column I’m finding it difficult to make graphs.
Never mind, I realised I needed to call .reset_index().
I also found a solution using as_index=False.
What is the difference between using either solution?
From the pandas docs for groupby(): the default for the as_index parameter is True, which means the result uses the group labels (here, park_name) as its index instead of keeping them as a column. Passing as_index=False keeps park_name as a regular column.
What was your code for using reset_index()? (Don’t put it in a codebyte, just format it.) reset_index() moves the index back into a regular column. Its inplace parameter defaults to False, which returns a new df rather than changing the original df. When you use inplace=True, it modifies the original df and returns None.
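To make the comparison concrete, here is a minimal sketch of both approaches side by side. The tiny dataframe and its values are made up for illustration (they are not the project data), and the variable names other than observation_data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the observations data (values are made up).
observation_data = pd.DataFrame({
    "scientific_name": ["Species a", "Species b", "Species a", "Species b"],
    "park_name": ["Bryce National Park", "Bryce National Park",
                  "Yosemite National Park", "Yosemite National Park"],
    "observations": [10, 20, 30, 40],
})

# Default (as_index=True): park_name becomes the index, not a column.
indexed = observation_data.groupby("park_name")[["observations"]].sum()

# Option 1: move the index back into an ordinary column afterwards.
via_reset = indexed.reset_index()

# Option 2: tell groupby not to use the group labels as the index at all.
via_as_index = observation_data.groupby("park_name", as_index=False)["observations"].sum()

# inplace=True changes `indexed_copy` itself and returns None.
indexed_copy = indexed.copy()
returned = indexed_copy.reset_index(inplace=True)

print(via_reset)
print(via_as_index)
print(returned)
```

Both options should end up with park_name and observations as ordinary columns, so for plotting the choice is mostly a matter of taste: as_index=False does it in one step, while reset_index() is handy when you already have an indexed result.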
It looks like you tried this(?):
numbersObs = observations.groupby('park_name').observations.sum().reset_index()
>>
   park_name                            observations
0  Bryce National Park                  576025
1  Great Smoky Mountains National Park  431820
2  Yellowstone National Park            1443562
3  Yosemite National Park               863332
You can use matplotlib (or Seaborn) to make a viz of this. Just use park_name as the X axis and observations as the Y axis.
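As a rough sketch of that plot, assuming matplotlib is installed: the totals below are copied from the output above, while the axis labels, title, and output filename are just illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Totals copied from the grouped output shown above.
numbersObs = pd.DataFrame({
    "park_name": [
        "Bryce National Park",
        "Great Smoky Mountains National Park",
        "Yellowstone National Park",
        "Yosemite National Park",
    ],
    "observations": [576025, 431820, 1443562, 863332],
})

fig, ax = plt.subplots(figsize=(10, 5))

# park_name on the X axis, observations on the Y axis.
ax.bar(numbersObs["park_name"], numbersObs["observations"])
ax.set_xlabel("Park")
ax.set_ylabel("Observations")
ax.set_title("Total observations per park")

# The park names are long, so tilt the tick labels a little.
ax.tick_params(axis="x", labelrotation=20)
fig.tight_layout()

# Hypothetical filename; use plt.show() instead if working interactively.
fig.savefig("observations_by_park.png")
```

This only works because park_name is a regular column after reset_index(); with the indexed version you would pass the index instead.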