Capstone Biodiversity: Filtering out a groupby row

Hi there,
I’m working on the Capstone Project: Biodiversity in National Parks.

My question is, how do you filter out rows when using groupby? My code:

conservation_status_group = species\
.groupby('conservation_status')\
.scientific_name.count()\
.reset_index()\
.sort_values(by='scientific_name')
print(conservation_status_group)

produces this table:

  conservation_status  scientific_name
1         In Recovery                4
4          Threatened               10
0          Endangered               16
3  Species of Concern              161
2     No intervention             5633

I would like to filter out the rows where the scientific_name count is above 200 (i.e. ‘No intervention’). So I’ve been trying to add a filter using .filter(lambda x: len(x) < 200), based on the Pandas docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#filtration

conservation_status_group = species\
.groupby('conservation_status')\
.scientific_name.count()\
.filter(lambda x: len(x) < 200)\
.reset_index()\
.sort_values(by='scientific_name')
print(conservation_status_group)

However, I get the error: TypeError: 'function' object is not iterable

How can I filter out a row from a groupby?

Thanks, Roger

Hi Roger, Welcome to the forums!

I did this project a couple years ago. :slight_smile:

If you back up a couple of steps, I think that rather than using .count() you may want to see how many unique values there are instead.
The directions say, "What are the different values of category in species?" and "What are the different values of conservation_status?"
So that’d be species.category.unique(), and the same function applied to conservation_status.
Also, take a look at .nunique(), which returns the number of distinct values rather than the values themselves.
https://www.geeksforgeeks.org/python-pandas-dataframe-nunique/
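
To make the difference between .unique(), .nunique() and .count() concrete, here is a minimal sketch. The small species DataFrame below is a made-up stand-in, not the project's data, so the numbers are purely illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the project's species DataFrame (not the real data).
species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Bird", "Bird", "Bird"],
    "scientific_name": ["Canis lupus", "Canis lupus", "Falco peregrinus",
                        "Aquila chrysaetos", None],
})

# .unique() returns the distinct values themselves, in order of appearance.
distinct_categories = species.category.unique()

# .nunique() returns how many distinct (non-null) values there are.
n_categories = species.category.nunique()

# Per group: .count() counts non-null rows, .nunique() counts distinct values.
per_category = species.groupby("category").scientific_name.agg(["count", "nunique"])
print(per_category)
```

In this toy data the Mammal group has two rows but only one distinct scientific_name, which is the kind of gap between count and nunique that shows up in the two tables in this thread.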

I think the directions also state that .groupby() doesn’t count NaN or None values, so you have to fill those in first with .fillna(). You can pass in whatever value you want to use in place of the missing values as an argument.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
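
As a rough illustration of that point, the sketch below shows a group key disappearing from a groupby when it is NaN and reappearing once it is filled. The four-row DataFrame is invented for the example; the fill value 'No Intervention' is the label that appears in the tables in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row stand-in for the species table (not the real data).
species = pd.DataFrame({
    "scientific_name": ["A a", "B b", "C c", "D d"],
    "conservation_status": ["Endangered", np.nan, np.nan, "Threatened"],
})

# By default, rows whose group key is NaN are dropped from the result.
before = species.groupby("conservation_status").scientific_name.nunique()

# Fill the missing keys first, then group again.
filled = species.fillna({"conservation_status": "No Intervention"})
after = filled.groupby("conservation_status").scientific_name.nunique()

print(before)
print(after)
```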

I think that you want to include those species/scientific_name values that don’t require intervention, so it’s not necessary to use a lambda function in your .groupby() chain. Comment out the line that starts with “.filter(” and print the result to see what happens.

Hi @lisalisaj and others, indeed I could use nunique, and it does much the same thing in this case. Unfortunately, you weren’t able to address my question, though.

protection_counts = species\
.groupby('conservation_status')\
.scientific_name.nunique()\
.reset_index()\
.sort_values(by='scientific_name')
print(protection_counts) 

returns

  conservation_status  scientific_name
1         In Recovery                4
4          Threatened               10
0          Endangered               15
3  Species of Concern              151
2     No Intervention             5363

But the question remains: how do I filter out values from a groupby table? It’s not important whether or not I want to filter out ‘No Intervention’ specifically; it’s important that I can filter out values on any condition.

For example, say I want to filter out rows so that only ‘In Recovery’, ‘Threatened’ and ‘Endangered’ remain, using, say, a filter with scientific_name < 20. I shouldn’t have to bring it into Excel to filter it. I would like this:

  conservation_status  scientific_name
1         In Recovery                4
4          Threatened               10
0          Endangered               15

Thanks in advance

Ah, ok. I misunderstood. I was just following what was asked/required in the capstone project.
I honestly don’t know how to filter out rows while using .groupby().
I tried a few things on my end and they didn’t work out.
https://stackoverflow.com/questions/27488080/python-pandas-filter-rows-after-groupby
I mean, you could just create a new df that omits the ‘No Intervention’ row. :woman_shrugging:t2:
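
For what it's worth, here is one possible way to do that, sketched on made-up data. After .count() or .nunique() the result is a plain Series, and Series.filter() selects by index label (its first argument is a list of labels), which would explain the "'function' object is not iterable" error when it is handed a lambda. Once the counts are back in a DataFrame, an ordinary boolean mask does the filtering. The tiny species table and the threshold of 3 are assumptions standing in for the real data and the 20 or 200 mentioned above.

```python
import pandas as pd

# Hypothetical miniature of the species table (not the real data).
species = pd.DataFrame({
    "scientific_name": ["A a", "B b", "C c", "D d", "E e", "F f"],
    "conservation_status": ["Endangered", "Endangered", "Threatened",
                            "No Intervention", "No Intervention", "No Intervention"],
})

# Same aggregation as in the thread: one row per conservation_status.
counts = (
    species.groupby("conservation_status")
    .scientific_name.nunique()
    .reset_index()
    .sort_values(by="scientific_name")
)

threshold = 3  # illustrative; stands in for the 20 or 200 used in the thread

# Option 1: filter the aggregated table with a boolean mask.
small_groups = counts[counts.scientific_name < threshold]
print(small_groups)

# Option 2: call .filter() on the groupby object itself, before aggregating.
# The lambda receives each group's sub-DataFrame and returns True to keep it;
# the result is the original rows of the kept groups, not a table of counts.
kept_rows = species.groupby("conservation_status").filter(
    lambda group: group.scientific_name.nunique() < threshold
)
print(kept_rows)
```

If you would rather keep everything in one chain, .loc[lambda df: df.scientific_name < threshold] or .query("scientific_name < 3") after .reset_index() should give the same rows as Option 1.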

Perhaps someone else can chime in & help.

Are you on Discord? Go to the CC server and python channel. I bet someone there knows the answer. :slight_smile: