When do we apply lambda functions to rows as opposed to columns of a dataframe?



Question

In the context of this exercise, in Pandas, when do we apply lambda functions to rows as opposed to columns of a dataframe?

Answer

Generally, we apply a lambda to rows, rather than to a single column, when the operation needs to read more than one column at a time.

Take, for instance, the example function from the exercise:

lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price']

As we can see, this lambda function reads two columns of the dataframe: Price and Is taxed?. Because it depends on more than one column, it has to receive the entire row rather than the values of a single column.
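
As a minimal sketch of how this lambda might be applied row by row, the example below builds a small made-up dataframe. Only the column names Price and Is taxed? come from the exercise; the items, prices, and the new column name Price with Tax are assumptions for illustration. Passing axis=1 to df.apply hands each row to the lambda as a Series, which is what lets it look up several columns by name.

```python
import pandas as pd

# Hypothetical sample data; only the column names 'Price' and
# 'Is taxed?' come from the exercise.
df = pd.DataFrame({
    'Item': ['Apple', 'Shirt', 'Milk'],
    'Price': [1.00, 20.00, 3.00],
    'Is taxed?': ['No', 'Yes', 'No'],
})

# axis=1 passes each row (as a Series) to the lambda,
# so it can read both 'Price' and 'Is taxed?'.
# 'Price with Tax' is an assumed name for the new column.
df['Price with Tax'] = df.apply(
    lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price'],
    axis=1,
)

print(df)
```

Without axis=1, df.apply would pass each column to the lambda instead, and the lookup row['Price'] would fail because a column has no entry labelled Price.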

On the other hand, when we apply a lambda function to a single column, the lambda receives only that column’s values, one at a time. For example, from the previous exercise:

df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])

applies the lambda function only to the values of the column df.Email, and not to any other columns.
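
A minimal sketch of the single-column case, again using made-up data: only the Email and Email Provider column names come from the exercise, and the names and addresses are hypothetical. Here x is a single string from the Email column, so the lambda has no way to see the other columns.

```python
import pandas as pd

# Hypothetical sample data; only the 'Email' column name comes from the exercise.
df = pd.DataFrame({
    'Name': ['Ana', 'Ben'],
    'Email': ['ana@example.com', 'ben@example.org'],
})

# Series.apply passes one value at a time, so x is a single email string.
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])

print(df['Email Provider'].tolist())
```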