FAQ: Creating, Loading, and Selecting Data with Pandas - Select Rows with Logic I

This community-built FAQ covers the “Select Rows with Logic I” exercise from the lesson “Creating, Loading, and Selecting Data with Pandas”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Data Analysis with Pandas

FAQs on the exercise Select Rows with Logic I


Someone help me out here. I don’t see the difference between the isin() command and selecting rows using iloc.

I used both on the exercise and the results were just the same.

Same question here…
From the exercise, these two will return exactly the same thing:

import codecademylib
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west'])

january_february_march = df[df.month.isin(['January', 'February', 'March'])]

print(january_february_march)


new = df.iloc[:3]
print(new)

So what’s the difference between iloc and isin?
I also noticed that new comes out the same even if I skip .iloc and just type df[:3].

So iloc is not logical: you have to know the position of the rows you want to extract.

With isin() you do not have to know the position; you just filter on the condition you want. isin() is very useful when you have a lot of data (let’s say 3k rows) and don’t have time to find out where the rows you want are.
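To make the difference concrete, here is a minimal sketch. The data is hypothetical: the same months as in the exercise, but deliberately out of calendar order, so position and value no longer line up.

```python
import pandas as pd

# Hypothetical data: same months as the exercise, but out of calendar order.
df = pd.DataFrame({
    'month': ['April', 'January', 'June', 'March', 'May', 'February'],
    'clinic_east': [80, 100, 112, 81, 51, 51],
})

# Positional selection: the first three rows, whatever they contain.
by_position = df.iloc[:3]

# Value-based selection: rows whose month is in the list, wherever they sit.
by_value = df[df.month.isin(['January', 'February', 'March'])]

print(by_position.month.tolist())  # ['April', 'January', 'June']
print(by_value.month.tolist())     # ['January', 'March', 'February']
```

The two only agree in the exercise because the rows happen to be sorted by month.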

Apparently there are at least three ways of accomplishing the same task.

I have the code:
january=df[df.month == ‘January’]
But when I run it, it brings up a SyntaxError. Any idea why?

What about checking/retyping the apostrophes?

I am doing a side project with a column named 'shape'. Here’s my code:
df[df.shape == 'triangle']
It raises an error because DataFrame already has an attribute named shape. How do I get around it?

While these do indeed return exactly the same rows here, in many cases your data is not this simple and structured. Imagine you have 1000 rows where the values you are looking for are spread throughout the column. It would be very time-consuming to find the index of each item in order to slice out the data you want. In that case isin() can retrieve the values of interest without you needing to inspect the underlying data, and it is often the preferred way to retrieve the data.

Can you post the whole project, or at least the part where you created your data frame? The error could come from anywhere.

Hi, I’m not much of an expert on this, but I will try:

new_shape = df[df['shape'].isin(['triangle'])]
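For the column-named-shape problem, the usual way around the name collision is bracket indexing instead of dot access. A minimal sketch, assuming a small made-up DataFrame (the 'sides' column and the values are only for illustration):

```python
import pandas as pd

# Hypothetical data with a column that happens to be called 'shape'.
df = pd.DataFrame({
    'shape': ['triangle', 'square', 'triangle'],
    'sides': [3, 4, 3],
})

# Attribute access finds the built-in DataFrame.shape, a (rows, columns) tuple.
print(df.shape)  # (3, 2)

# Bracket indexing always refers to the column.
triangles = df[df['shape'] == 'triangle']
print(triangles)
```

Dot access is only a shortcut, so any column whose name matches an existing DataFrame attribute or method needs the bracket form.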

I see that instead of df[df.month.isin(['January', 'February', 'March'])] I can use the standard in operator, like df[df.month in ['January', 'February', 'March']] and I get the same results. So it’s not clear to me why we need the isin() function in Pandas.
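A note on the in operator: as far as I can tell, the plain in version does not actually run on a pandas column. Python tries to turn a whole-column comparison into a single True/False, and pandas refuses with a ValueError. A minimal sketch to check this yourself (the four-row DataFrame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'month': ['January', 'February', 'March', 'April']})
wanted = ['January', 'February', 'March']

# Element-wise membership test: returns one boolean per row.
mask = df.month.isin(wanted)
print(mask.tolist())  # [True, True, True, False]

# The plain `in` operator asks for a single bool from a whole-Series comparison.
try:
    df[df.month in wanted]
    in_operator_worked = True
except ValueError as err:
    in_operator_worked = False
    print('ValueError:', err)
```

So isin() is not just a faster in; it is the element-wise version that boolean row selection needs.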

@ramtob I have heard that the isin method is more streamlined, essentially an improved version of the standard in operator. The difference in processing time probably only shows when the dataset you work with is large.

@juniorchuang Sounds plausible.
I now also see that in the case of pandas isin() we can pass as a parameter not only lists, but also dictionaries, and even dataframes.
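A small sketch of the dictionary case, assuming it refers to DataFrame.isin (called on the whole DataFrame) rather than the Series.isin used in the exercise. The data is a trimmed-down version of the exercise table.

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['January', 'February', 'March'],
    'clinic_east': [100, 51, 81],
})

# Series.isin with a list: one boolean per row.
print(df.month.isin(['January', 'March']).tolist())  # [True, False, True]

# DataFrame.isin with a dict: keys are column names, values are the allowed
# entries for that column. The result is a boolean DataFrame.
matches = df.isin({'month': ['January'], 'clinic_east': [51, 81]})
print(matches)
```

Each column is checked only against its own list, so the result has the same shape as df.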

I copied it into my Codecademy workspace and got the following message:
“The character U+2018 can be confused with the ASCII character U+0060, which is more common in source code.” The problem is indeed in the quotation marks.
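If you want to confirm that the quotation marks alone cause the SyntaxError, here is a quick check that only parses the two versions of the line without running them. The parses helper is something I made up for this illustration.

```python
# The same statement written twice: once with typographic quotes
# (U+2018/U+2019), once with plain ASCII apostrophes.
curly = "january = df[df.month == \u2018January\u2019]"
straight = "january = df[df.month == 'January']"

def parses(source):
    """Return True if Python can compile the source, False on SyntaxError."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:
        return False

print(parses(curly))     # False
print(parses(straight))  # True
```

Curly quotes usually sneak in when code is copied from a web page or word processor, so retyping them in the editor fixes it.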

I tried to compare 2 columns:

test = df[df.clinic_east == df.clinic_west]

print(test)

but I did not get what I wanted. I expected a single value such as False; instead I got rows with all the columns.

Hi, try this

test = all(df.clinic_east == df.clinic_west)

Thanks! It does work!
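For anyone wondering why the two versions behave differently, here is a minimal sketch using the first three rows of the exercise data. The comparison produces one True/False per row; what you get depends on whether you use it as a filter or reduce it to a single value. The sketch uses the pandas .all() method, which should give the same answer as the built-in all() here.

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['January', 'February', 'March'],
    'clinic_east': [100, 51, 81],
    'clinic_west': [100, 45, 96],
})

# Element-wise comparison: one boolean per row.
same = df.clinic_east == df.clinic_west
print(same.tolist())  # [True, False, False]

# Used as a mask, it keeps the rows where the two columns match.
print(df[same])

# Reduced to a single answer: are the columns equal in every row?
print(same.all())  # False
```

So df[...] was doing what it always does, selecting rows, and only January has matching values.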