Question
When using iloc to select ranges of Pandas DataFrame rows, can we skip rows? For instance, can we select only every second or third row?
Answer
You can! Selecting multiple rows using .iloc is very similar to list slicing in Python. There are a few ways to select rows using iloc.
To select just a single row, we pass in a single value: the row's integer position. For example, with Python lists:
numbers[0] # First element of numbers list
And with DataFrames, we do something similar:
orders.iloc[0]
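A minimal sketch of what a single position gives back. The `orders` DataFrame here is a hypothetical stand-in for the lesson's data; the point is that a bare position returns the row as a Series, while wrapping it in a list keeps a one-row DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's `orders` DataFrame
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "item": ["shoes", "hat", "scarf", "socks"],
})

first_row = orders.iloc[0]       # a Series holding the values of row 0
first_row_df = orders.iloc[[0]]  # a list of positions keeps a DataFrame

print(type(first_row).__name__)     # Series
print(type(first_row_df).__name__)  # DataFrame
print(first_row["item"])            # shoes
```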
Selecting a range of elements from a list and a range of rows in Pandas is also very similar. In both cases the stop position is excluded, so 3:7 covers positions 3, 4, 5 and 6.
# Python list
numbers[3:7]
# Pandas
orders.iloc[3:7]
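A quick check that the list slice and the iloc slice pick the same positions. The list values and the 10-row `orders` frame are made up for illustration.

```python
import pandas as pd

numbers = list(range(10, 20))                          # [10, 11, ..., 19]
orders = pd.DataFrame({"order_id": range(100, 110)})  # hypothetical 10-row frame

# The stop position (7) is excluded in both cases: positions 3, 4, 5, 6
print(numbers[3:7])                           # [13, 14, 15, 16]
print(orders.iloc[3:7]["order_id"].tolist())  # [103, 104, 105, 106]
```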
To skip rows, we can include a third value, the step, which says how many positions to move forward each time.
# This selects values at indexes 0, 3, 6, 9.
# Python list
numbers[0:10:3]
# This selects rows at indexes 0, 3, 6, 9
# Pandas
orders.iloc[0:10:3]
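A small sketch of the step value in action, using a hypothetical 12-row `orders` frame with the default 0-based index (so the printed labels equal the positions). Like list slicing, start and stop can be omitted.

```python
import pandas as pd

numbers = list(range(12))                              # [0, 1, ..., 11]
orders = pd.DataFrame({"order_id": range(100, 112)})  # hypothetical 12-row frame

# start=0, stop=10 (excluded), step=3
every_third = orders.iloc[0:10:3]
print(numbers[0:10:3])             # [0, 3, 6, 9]
print(every_third.index.tolist())  # [0, 3, 6, 9]

# Start at position 1 and run to the end, taking every third row
print(orders.iloc[1::3].index.tolist())  # [1, 4, 7, 10]
```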
vsenn (June 16, 2019, 6:10am):
Another way to do so is to pass a list of positions: orders.iloc[[0, 2, 4, 6]]
which returns only the rows at positions 0, 2, 4 and 6 (the first four even-numbered rows).
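A sketch showing that a list of positions picks exactly the rows named, in the order they are listed, so it is not limited to regular steps. The 8-row `orders` frame is hypothetical.

```python
import pandas as pd

orders = pd.DataFrame({"order_id": range(100, 108)})  # hypothetical 8-row frame

evens = orders.iloc[[0, 2, 4, 6]]   # rows at positions 0, 2, 4, 6
reordered = orders.iloc[[6, 0, 2]]  # any positions, returned in the order given

print(evens["order_id"].tolist())      # [100, 102, 104, 106]
print(reordered["order_id"].tolist())  # [106, 100, 102]
```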
Is there a way to return the last two rows and the second row all within one iloc call? I've got the data below from the example:
# import codecademylib  # only available inside the Codecademy environment
import pandas as pd
df = pd.DataFrame([
['January', 100, 100, 23, 100],
['February', 51, 45, 145, 45],
['March', 81, 96, 65, 96],
['April', 80, 80, 54, 180],
['May', 51, 54, 54, 154],
['June', 112, 109, 79, 129]],
columns=['month', 'clinic_east',
'clinic_north', 'clinic_south',
'clinic_west']
)
april_may_june = df.iloc[-2:]
may_june_february = df.iloc[[-2, -1, 1]]  # positions -2, -1 and 1: May, June, February
I tried this, but I'm not getting the answer I expected: I only got the last two rows.
Note that df.iloc[[4:,1]] is a SyntaxError, because a slice like 4: can't appear inside a list literal. Listing the positions works:
may_june_february = df.iloc[[4, 5, 1]]
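A sketch of three ways to get May, June and February in one selection, rebuilding the example DataFrame from the question. If you want to keep a slice together with a single position, NumPy's np.r_ helper (NumPy, not pandas; it is installed alongside pandas) can expand them into one array of positions, as long as the slice has an explicit stop such as 4:6. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['January', 100, 100, 23, 100],
    ['February', 51, 45, 145, 45],
    ['March', 81, 96, 65, 96],
    ['April', 80, 80, 54, 180],
    ['May', 51, 54, 54, 154],
    ['June', 112, 109, 79, 129]],
    columns=['month', 'clinic_east', 'clinic_north', 'clinic_south', 'clinic_west']
)

# Negative positions count from the end: -2 is May, -1 is June; 1 is February
by_list = df.iloc[[-2, -1, 1]]

# np.r_ expands the slice 4:6 and appends 1, giving the array [4, 5, 1]
positions = np.r_[4:6, 1]
by_r = df.iloc[positions]

# Or select the pieces separately and stack them in order
by_concat = pd.concat([df.iloc[-2:], df.iloc[[1]]])

print(by_list["month"].tolist())    # ['May', 'June', 'February']
print(by_r["month"].tolist())       # ['May', 'June', 'February']
print(by_concat["month"].tolist())  # ['May', 'June', 'February']
```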
Or, if you want all the even rows and are too lazy to type them all:
# use list slicing
df.iloc[::2]
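The same slicing trick gives the odd rows by starting at position 1. A minimal sketch using just the month column of the example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"month": ["January", "February", "March", "April", "May", "June"]})

even_rows = df.iloc[::2]   # positions 0, 2, 4
odd_rows = df.iloc[1::2]   # positions 1, 3, 5

print(even_rows["month"].tolist())  # ['January', 'March', 'May']
print(odd_rows["month"].tolist())   # ['February', 'April', 'June']
```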
The pandas docs mention a way to do this using a lambda function: .iloc also accepts a callable, which receives the DataFrame and returns the selection.
This returns only the even rows:
df.iloc[lambda x: x.index % 2 == 0]
And if you want the odd rows:
df.iloc[lambda x: x.index % 2 == 1]
Pandas documentation on .iloc
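One caveat worth checking: x.index holds the row labels, not the positions, so the lambda above matches "every second row" only when the index is the default 0, 1, 2, and so on. A sketch with a made-up index that starts at 1; testing positions with np.arange (our swap-in, not the docs' example) gives every second row regardless of labels.

```python
import numpy as np
import pandas as pd

# Example data with a non-default index: labels start at 1
df = pd.DataFrame(
    {"month": ["January", "February", "March", "April", "May", "June"]},
    index=[1, 2, 3, 4, 5, 6],
)

# x.index % 2 == 0 tests the labels 2, 4, 6, which sit at positions 1, 3, 5
by_label = df.iloc[lambda x: x.index % 2 == 0]

# np.arange(len(x)) gives positions 0..5, so this keeps positions 0, 2, 4
by_position = df.iloc[lambda x: np.arange(len(x)) % 2 == 0]

print(by_label["month"].tolist())     # ['February', 'April', 'June']
print(by_position["month"].tolist())  # ['January', 'March', 'May']
```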
You can try the syntax below:
april_may_june = df.iloc[-3:]
This selects the last 3 rows of the DataFrame.
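A short check that the negative slice picks April, May and June, and that it matches DataFrame.tail(3), which also returns the last n rows. Only the month column of the example data is used here.

```python
import pandas as pd

df = pd.DataFrame({"month": ["January", "February", "March", "April", "May", "June"]})

april_may_june = df.iloc[-3:]  # from the third-to-last position to the end

print(april_may_june["month"].tolist())   # ['April', 'May', 'June']
print(april_may_june.equals(df.tail(3)))  # True
```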
digital1288277484:
df.iloc[[-2, -1, 1]]
Selecting only the last two rows just needs:
df.iloc[[-2, -1]]
What if I want to ignore the 0th row? When I added x.index % 2 == 0 and x.index != 0, it didn't work.
df.iloc[lambda x: (x.index % 2 == 0) & (x.index != 0)]
would work. Change your logical operator from ‘and’ to ‘&’, and keep the parentheses around each comparison, because & binds more tightly than == and !=.
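A sketch contrasting the two versions on the example months (default 0-based index). With &, the two boolean arrays are combined element by element; with and, Python asks the first array for a single True or False, which NumPy refuses with a ValueError.

```python
import pandas as pd

df = pd.DataFrame({"month": ["January", "February", "March", "April", "May", "June"]})

# & combines the two boolean arrays element by element: positions 2 and 4
picked = df.iloc[lambda x: (x.index % 2 == 0) & (x.index != 0)]
print(picked["month"].tolist())  # ['March', 'May']

# `and` needs one truth value for a whole array, which is ambiguous
and_failed = False
try:
    df.iloc[lambda x: (x.index % 2 == 0) and (x.index != 0)]
except ValueError as err:
    and_failed = True
    print("ValueError:", err)
```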
Super interesting. I wonder why that is. I initially thought it was an issue with the parentheses, but somehow & fixes it where and doesn't. I can't seem to find the documentation for it, though.
Update: I found it a couple of lessons later:
https://www.codecademy.com/journeys/data-scientist-ml/paths/dsmlcj-22-data-science-foundations-ii/tracks/dsmlcj-22-pandas-for-data-science/modules/dsf-hands-on-with-pandas-9d5a2e88-c68f-42d2-9e57-dd06b4ebf183/lessons/pandas-i/exercises/select-rows-logic-ii
Apparently in Pandas, | means “or” and & means “and”. You have to use those operators because Pandas objects like Series and DataFrames (and the NumPy arrays they produce) don't have a single boolean value. and and or are Python's boolean operators: they need one True or False for the whole object, which is ambiguous for a collection of values, so you get a ValueError instead. What we need are element-wise logical operators, which is what | and & are.
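A small sketch of | alongside &'s other element-wise partner, ~ (element-wise "not"), which the thread doesn't mention but works the same way. It reuses the month names and clinic_east numbers from the example DataFrame and ordinary df[...] boolean selection, as in the linked lesson; each comparison stays in parentheses for the precedence reason above.

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["January", "February", "March", "April", "May", "June"],
    "clinic_east": [100, 51, 81, 80, 51, 112],
})

# | keeps a row if either condition holds
busy_or_first = df[(df["clinic_east"] > 100) | (df["month"] == "January")]
print(busy_or_first["month"].tolist())  # ['January', 'June']

# ~ flips every value in a boolean Series
not_51 = df[~(df["clinic_east"] == 51)]
print(not_51["month"].tolist())  # ['January', 'March', 'April', 'June']
```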