When using iloc to select ranges of Pandas dataframe rows, can we skip rows?

Question

When using iloc to select ranges of Pandas dataframe rows, can we skip rows? For instance, can we choose to select every second or third row only?

Answer

You can! Selecting multiple rows using .iloc is very similar to list slicing in Python. There are a few ways to select rows using iloc.

To select just a single row, we pass in a single value, the index. For example with Python lists,
numbers[0] # First element of numbers list

And with DataFrames, we would do something similar,
orders.iloc[0]

Selecting a range of elements of a list and a range of rows in Pandas is also very similar.

# Python list
numbers[3:7]

# Pandas
orders.iloc[3:7]

To skip rows at a regular interval, we can add a third value, the step, to the slice.

# This selects values at indexes 0, 3, 6, 9.
# Python list
numbers[0:10:3]

# This selects rows at indexes 0, 3, 6, 9
# Pandas
orders.iloc[0:10:3]
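
Here is a small runnable sketch of the step slicing described above. The `numbers` values and the single `price` column are made up for illustration; they stand in for whatever `numbers` and `orders` hold in the lesson.

```python
import pandas as pd

# Hypothetical data: ten values, and a one-column DataFrame built from them
numbers = list(range(10, 20))
orders = pd.DataFrame({"price": numbers})

# start=0, stop=10, step=3 picks positions 0, 3, 6, 9 in both cases
every_third_list = numbers[0:10:3]
every_third_rows = orders.iloc[0:10:3]

print(every_third_list)
print(every_third_rows)
```

The stop value is exclusive in both cases, just as in ordinary list slicing.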

Another way to do so is to pass a list of positions: orders.iloc[[0, 2, 4, 6]] returns only the rows at positions 0, 2, 4, and 6.


Is there a way to return the last two rows and the second row all within iloc? I’ve got the below data from the example

import codecademylib
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west']
)
april_may_june = df.iloc[-2:]

may_june_february = df.iloc[[-2, -1, 1]]


I tried this, but I did not get the result I expected. I got only the last two rows.

may_june_february = df.iloc[[4:,1]]  # SyntaxError: a slice cannot appear inside a list
OR
may_june_february = df.iloc[[4, 5, 1]]  # rows at positions 4, 5, and 1
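
If you do want to mix a slice with a single position, one option (not from the lesson, just a sketch) is NumPy's `np.r_`, which expands slices into an array of integers that `iloc` accepts. The trimmed-down `df` below only keeps two of the columns from the example above.

```python
import numpy as np
import pandas as pd

# Trimmed-down version of the clinic data from the example
df = pd.DataFrame({
    "month": ["January", "February", "March", "April", "May", "June"],
    "clinic_east": [100, 51, 81, 80, 51, 112],
})

# np.r_ turns the slice 4:6 plus the single position 1 into array([4, 5, 1])
positions = np.r_[4:6, 1]
may_june_february = df.iloc[positions]

# The same rows, written with negative positions in a plain list
same_rows = df.iloc[[-2, -1, 1]]

print(may_june_february)
```

Note that the slice passed to `np.r_` needs an explicit stop (`4:6`), because NumPy cannot know the length of the DataFrame.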


Or, if you want all even rows and are too lazy to type them all:

# use a slice with a step of 2
df.iloc[::2]
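
A quick sketch of the same idea for both even and odd positions, using a one-column stand-in for the clinic data (the column choice is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["January", "February", "March", "April", "May", "June"],
})

even_positions = df.iloc[::2]    # positions 0, 2, 4
odd_positions = df.iloc[1::2]    # positions 1, 3, 5

print(even_positions)
print(odd_positions)
```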

The pandas docs mention a way to do this using a lambda function.

This returns only the rows whose index label is even (with the default integer index, those are the rows at even positions):

df.iloc[lambda x: x.index % 2 == 0]

And if you wanted the odd ones:

df.iloc[lambda x: x.index % 2 == 1]

Pandas Documentation on .iloc()
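
One thing worth knowing about the lambda version: `x.index` holds the index labels, not the row positions, so the two only agree when the DataFrame has the default 0, 1, 2, ... index. The sketch below uses a made-up DataFrame with odd labels to show the difference; `np.arange(len(x))` is one way to test positions instead.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame whose index labels are all odd
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=[1, 3, 5, 7])

# Tests the labels 1, 3, 5, 7: none are even, so nothing is selected
by_label = df.iloc[lambda x: x.index % 2 == 0]

# Tests the positions 0, 1, 2, 3: selects the rows at positions 0 and 2
by_position = df.iloc[lambda x: np.arange(len(x)) % 2 == 0]

print(len(by_label))
print(by_position)
```

For plain "every other row" selection, `df.iloc[::2]` gives the same rows as `by_position` without needing a lambda.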


You can try the syntax below:

april_may_june = df.iloc[-3:]

This selects the last 3 rows of the DataFrame.

To select only the last two rows, use:
df.iloc[[-2, -1]]
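
A short sketch to check these against each other, again with a one-column stand-in for the clinic data. `DataFrame.tail` is included only as a comparison; it is not part of the lesson.

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["January", "February", "March", "April", "May", "June"],
})

last_three = df.iloc[-3:]        # April, May, June
last_two = df.iloc[[-2, -1]]     # May, June (same rows as df.iloc[-2:])

print(last_three)
print(last_two)
```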


What if I want to ignore the 0th row? When I wrote the condition as x.index % 2 == 0 and x.index != 0, it did not work.

Yes, we can. For example, this takes every third element:
numbers[::3]

df.iloc[lambda x: (x.index % 2 == 0) & (x.index != 0)] would work. Change your logical operator from ‘and’ to ‘&’.

Super interesting. I wonder why that is. I had initially thought the issue was with the parentheses, but somehow using & instead of and fixes it. I can’t seem to find the documentation for it though.

Update: Found it a couple of lessons later.

https://www.codecademy.com/journeys/data-scientist-ml/paths/dsmlcj-22-data-science-foundations-ii/tracks/dsmlcj-22-pandas-for-data-science/modules/dsf-hands-on-with-pandas-9d5a2e88-c68f-42d2-9e57-dd06b4ebf183/lessons/pandas-i/exercises/select-rows-logic-ii

Apparently in Pandas, | means “or” and & means “and”, and you have to use those operators because Pandas objects like Series and DataFrames do not have a single boolean value. and and or are boolean operators that expect one True or False on each side, which is too ambiguous to be used here. What we need are the element-wise logical operators, | and &.
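
To see the difference for yourself, here is a minimal sketch with a made-up six-row DataFrame. With `and`, Python asks the whole boolean array for a single True/False and raises a ValueError; with `&`, the two conditions are combined row by row.

```python
import pandas as pd

# Hypothetical DataFrame with the default index 0..5
df = pd.DataFrame({"value": range(6)})

# `and` needs one truth value per side, but each side is a whole array
try:
    df.iloc[lambda x: (x.index % 2 == 0) and (x.index != 0)]
    and_raised = False
except ValueError as err:
    and_raised = True
    print("and failed:", err)

# `&` compares element by element; the parentheses are required because
# & binds more tightly than == and !=
even_not_first = df.iloc[lambda x: (x.index % 2 == 0) & (x.index != 0)]
print(even_not_first)
```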