In the context of this exercise, in Pandas, when do we apply lambda functions to rows as opposed to columns of a dataframe?
Generally, we apply a lambda to rows, rather than to columns, when the operation needs to access more than one column at a time.
Take, for instance, the example function from the exercise:
lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price']
As we can see, this lambda function accesses two columns of the dataframe: Price and Is taxed?. Because it reads more than one column, it needs to receive the entire row rather than the values of a single column.
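As a minimal sketch of the row-wise case, the lambda above can be passed to df.apply with axis=1, which tells pandas to hand the function one row at a time. The sample data, the add_tax name, and the 'Price with Tax' column name are assumptions made for illustration and are not taken from the exercise.

```python
import pandas as pd

# Hypothetical sample data; only the 'Price' and 'Is taxed?' column
# names come from the exercise's lambda.
df = pd.DataFrame({
    'Item': ['Apple', 'Soap', 'Bread'],
    'Price': [1.00, 2.00, 3.00],
    'Is taxed?': ['No', 'Yes', 'No'],
})

# The lambda reads two columns, so it needs the whole row.
add_tax = lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price']

# axis=1 passes each row to the lambda; the default (axis=0) would pass
# each column instead, and row['Price'] would fail.
# 'Price with Tax' is a hypothetical name for the result column.
df['Price with Tax'] = df.apply(add_tax, axis=1)

print(df)
```

Only the row marked 'Yes' has its price multiplied by 1.075; the other rows keep their original price.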
On the other hand, when we apply a lambda function to a single column, the lambda receives only that column’s values, one at a time. For example, this line from the previous exercise:
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1] )
will apply the lambda function only to the values of the column df.Email, and not to any other columns.
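For comparison, here is a small runnable sketch of the column-wise case. The sample names and email addresses are made up for illustration; only the Email and 'Email Provider' column names come from the exercise.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    'Name': ['Ann', 'Bob'],
    'Email': ['ann@example.com', 'bob@example.org'],
})

# The lambda receives one value from the Email column at a time (a string),
# so no axis argument is needed and no other column is visible to it.
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])

print(df)
```

Here x is a single string such as 'ann@example.com', not a row, which is why the lambda cannot look up other columns.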