When do we apply lambda functions to rows as opposed to columns of a dataframe?

Question

In the context of this exercise, in Pandas, when do we apply lambda functions to rows as opposed to columns of a dataframe?

Answer

Generally, we apply a lambda to rows, as opposed to columns, when we want to perform functionality that needs to access more than one column at a time.

Take for instance, the example function from the exercise:

lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price']

As we can see, this lambda function is accessing multiple columns of the dataframe: Price and Is taxed?. Because it is accessing multiple columns, it would need to be able to access the entire row, instead of just a single column.

On the other hand, when applying a lambda function to a single column, the lambda will only apply to that column’s values. For example, from the previous exercise example:

df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1] )

will apply the lambda function only on the values of the column df.Email, and not to any other columns.
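
To make the contrast concrete, here is a minimal sketch that runs both kinds of apply on a tiny dataframe. The sample data and the column name 'Price with Tax' are made up for illustration and are not from the exercise.

```python
import pandas as pd

# Hypothetical sample data, just for illustration.
df = pd.DataFrame({
    'Email': ['ann@example.com', 'bob@example.org'],
    'Price': [100.0, 200.0],
    'Is taxed?': ['Yes', 'No'],
})

# Column-wise: the lambda receives one value of df.Email at a time.
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])

# Row-wise: the lambda receives a whole row, so it can read two columns.
df['Price with Tax'] = df.apply(
    lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price'],
    axis=1,
)

print(df[['Email Provider', 'Price with Tax']])
```

Only the first row is marked as taxed, so only its price is multiplied by 1.075.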


You say “generally”. Does this mean it is possible to apply a lambda to columns when we use multiple columns, or is it not possible at all?


What is the use of axis=1 in this exercise?


You’ve told pandas to calculate a value for every row in the table. axis=1 means “pass each row to the function, one row at a time”; without it, apply passes each column to the function instead. The resulting values, one per row, can then be added to the dataframe as a new column.
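
One way to see what axis does is to ask the lambda how long its input is. This is only a sketch with made-up data; the column names are borrowed from the exercise discussed later in this thread.

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 columns.
df = pd.DataFrame({'hourly_wage': [20, 25, 30], 'hours_worked': [45, 30, 40]})

# Default (axis=0): the lambda receives each column, which holds 3 values.
per_column = df.apply(lambda col: len(col))

# axis=1: the lambda receives each row, which holds 2 values.
per_row = df.apply(lambda row: len(row), axis=1)

# With axis=1 a lambda can combine several columns of the same row.
pay = df.apply(lambda row: row.hourly_wage * row.hours_worked, axis=1)

print(per_column.tolist(), per_row.tolist(), pay.tolist())
```

The result of the axis=1 call has one entry per row, which is why it fits as a new column.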


What’s the use of "\" in the code below?

total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) \
	if row.hours_worked > 40 \
  else row.hourly_wage * row.hours_worked

Is it like an escape character?


The \ is used to join the following line of code with the preceding line to form a single line. You could also refer to it as a line continuation character. See the docs for more info.
So …

total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) \
	if row.hours_worked > 40 \
  else row.hourly_wage * row.hours_worked

is interpreted as:

total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) if row.hours_worked > 40 else row.hourly_wage * row.hours_worked

Now I get it. Thank you!


Hi,
I get “row is not defined”… I don’t understand why.
What is wrong with my code?

total_earned = lambda row: 40 + (row.hours_worked - 40) *1,5 \
  if row.hours_worked > 40 \
  else row.hours_worked * row.hourly_wage 

Hi! May I ask, for the first part of the exercise, why doesn’t my code work?

total_earned = lambda row :
row['hours_worked'] * row['hourly_wage']
if row['hours_worked'] <= 40
else (40 * row['hourly_wage']) + ((row['hours_worked'] - 40) * 1.5 * row['hourly_wage'])

It would mention ‘invalid syntax (’

I got the same error

OK, so I think I might have figured it out. I originally thought it was due to using row['hourly_wage'] instead of row.hourly_wage, but that is not the case.

When you define a function the way you are doing, you need a line continuation character (\) at the end of each line (fig 1.a); otherwise, you need to write it all on the same “line” (fig 1.b).

But you don’t need one if you are creating a column and writing the lambda function inside the apply call that creates the new column (fig 2).

fig 1.a
total_earned = lambda row: \
(row['hourly_wage'] * 40) + \
((row['hours_worked'] - 40) * 1.5 * row['hourly_wage']) \
if row['hours_worked'] > 40 \
else row['hours_worked'] * row['hourly_wage']

fig 1.b
total_earned = lambda row: (row['hourly_wage'] * 40) + ((row['hours_worked'] - 40) * 1.5 * row['hourly_wage']) if row['hours_worked'] > 40 else row['hours_worked'] * row['hourly_wage']

fig 2
df['total_earned'] = df.apply(lambda row:
(row['hourly_wage'] * 40) +
((row['hours_worked'] - 40) * 1.5 * row['hourly_wage'])
if row['hours_worked'] > 40
else row['hours_worked'] * row['hourly_wage'],
axis=1)

maybe someone smarter than I can explain why…
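
A likely explanation: Python joins lines automatically while it is inside an open pair of parentheses, brackets, or braces, so an expression inside the parentheses of df.apply(...) can span several lines without "\". The same trick works without pandas. This sketch uses plain dictionaries as stand-ins for rows, which is an assumption made to keep the example small.

```python
# Wrapping the lambda body in parentheses lets it span lines with no backslashes.
total_earned = lambda row: (
    (row['hourly_wage'] * 40)
    + ((row['hours_worked'] - 40) * 1.5 * row['hourly_wage'])
    if row['hours_worked'] > 40
    else row['hours_worked'] * row['hourly_wage']
)

# Plain dicts stand in for dataframe rows here (hypothetical values).
overtime = total_earned({'hourly_wage': 20, 'hours_worked': 45})
regular = total_earned({'hourly_wage': 20, 'hours_worked': 30})

print(overtime, regular)
```

Without the outer parentheses, the first line would end right after the colon and Python would report a syntax error.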


Sorry, I get what you mean but unfortunately, it doesn’t answer my question…

Oh, I apologise, I misunderstood your explanation. Yes, for lambda functions, everything needs to be on the same line; otherwise, we need to use the line continuation character. My bad. Thank you for your help on that part!

The strange thing is that when I tried your code in fig 2, I still got the same error. However, when I changed row['hourly_wage'] to row.hourly_wage, and did the same for row['hours_worked'], the code worked. Why is that? Wouldn’t both mean the same thing?
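
For reading values, the two spellings should behave the same, as the sketch below suggests (the data is made up). One possible cause of the error, though this is only a guess, is that code copied from a forum post can carry curly quotes (‘ ’) instead of straight ones ('), and Python does not accept curly quotes as string delimiters.

```python
import pandas as pd

# Hypothetical data using the column names from the exercise.
df = pd.DataFrame({'hourly_wage': [20, 25], 'hours_worked': [45, 30]})

# Bracket access with straight quotes.
by_brackets = df.apply(lambda row: row['hourly_wage'] * row['hours_worked'], axis=1)

# Attribute access; works because the column names are valid Python identifiers.
by_attribute = df.apply(lambda row: row.hourly_wage * row.hours_worked, axis=1)

same = by_brackets.equals(by_attribute)
print(same)
```

Attribute access is not available for a column name such as 'Is taxed?', so bracket access is the more general form.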

I got a similar issue. At first I tried the code

df.total_earned = df.apply(total_earned, axis=1)

but that didn’t work. So I changed the format to

df['total_earned'] = df.apply(total_earned, axis=1)

and it worked. So it seems df.total_earned and df['total_earned'] are not interchangeable when you are creating a new column.
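
This matches how pandas treats attribute assignment: for a column that does not exist yet, df.name = values sets a plain Python attribute on the object (pandas emits a warning about it) and does not add a column. A small sketch with made-up data and a simplified total_earned:

```python
import warnings
import pandas as pd

# Simplified stand-in for the exercise's lambda (no overtime rule).
total_earned = lambda row: row.hourly_wage * row.hours_worked

df_attr = pd.DataFrame({'hourly_wage': [20, 25], 'hours_worked': [45, 30]})
df_item = df_attr.copy()

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the pandas UserWarning for this demo
    df_attr.total_earned = df_attr.apply(total_earned, axis=1)

df_item['total_earned'] = df_item.apply(total_earned, axis=1)

created_by_attribute = 'total_earned' in df_attr.columns
created_by_brackets = 'total_earned' in df_item.columns
print(created_by_attribute, created_by_brackets)
```

Attribute access is fine for reading or overwriting an existing column, but new columns need the bracket form.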

So the lecture reads:

If we use apply without specifying a single column and add the argument axis=1, the input to our lambda function will be an entire row, not a column. To access particular values of the row, we use the syntax row.column_name or row['column_name'].

Shouldn’t that be: “an entire column, not a row”? Since it was describing the case where you’re not using axis=1. I’m a bit confused.

Hi,

I got the same ‘invalid syntax (’ error even when I put everything on the same line. However, when I reversed the logic of the code, changing the condition from “if row['hours_worked'] =< 40” to “if row['hours_worked'] > 40” and swapping the two branches, it worked.

So I am not sure whether the checker only accepts code that matches their pattern, in which case I actually got it right on my first try, or whether I really did something wrong. Could someone help me out here? I am also not sure whether there are any rules about using () in a lambda…

My code is as follows.

First try, which got ‘invalid syntax (’:

total_earned = lambda row: row['hourly_wage'] * row['hours_worked'] if row['hours_worked'] =< 40 else row['hourly_wage'] * 40 + (row['hours_worked'] - 40) * row['hourly_wage'] * 1.5

Second try, which worked:
total_earned = lambda row: row['hourly_wage'] * 40 + (row['hours_worked'] - 40) * row['hourly_wage'] * 1.5 if row['hours_worked'] > 40 else row['hourly_wage'] * row['hours_worked']

I got the same error. I tried the other suggestions, but I still don’t get why we need to separate the lines.
Is there anyone who is still working on this?

I rewrote your first try using <= instead of =< and it worked for me. As far as I can tell, Python only accepts <= for “less than or equal to”, and =< is a syntax error.

total_earned = lambda row: row['hourly_wage'] * row['hours_worked'] if row['hours_worked'] <= 40 else (row['hourly_wage'] * 40) + (row['hours_worked'] - 40) * row['hourly_wage'] * 1.5
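
A quick way to check which spelling Python accepts is to compile both as strings. The helper name compiles and the short lambdas are made up for this sketch.

```python
def compiles(source):
    """Return True if source is a syntactically valid Python expression."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False

valid = compiles("lambda hours: hours if hours <= 40 else 40")
invalid = compiles("lambda hours: hours if hours =< 40 else 40")

print(valid, invalid)
```

So the error comes from the operator itself and not from the order of the two branches.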

for future reference, when posting use the </> option to write your code!!

Hi yuena,

Thanks for your reply. I think it has to be written as <= then. I am glad that at least there is nothing wrong with the logic or the pattern; otherwise it would be difficult to get right every time.

I’m not sure if you already figured out why you were getting that error, but in your code you have 1,5 instead of 1.5.
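
That comma would also explain the “row is not defined” message. A lambda body stops at a comma, so the assignment builds a tuple: the lambda first, then whatever follows the comma, which Python evaluates immediately while no variable called row exists yet. A stripped-down sketch with simplified expressions in place of the exercise code:

```python
# The comma ends the lambda, so this is a tuple: (function, 5).
pair = lambda hours: hours * 1,5
is_tuple = isinstance(pair, tuple)

# Here the part after the comma mentions row, and it runs right away.
error_message = None
try:
    broken = lambda row: row * 1,5 if row > 40 else row
except NameError as exc:
    error_message = str(exc)

# With a decimal point, the whole expression stays inside the lambda.
fixed = lambda hours: hours * 1.5 if hours > 40 else hours

print(is_tuple, error_message, fixed(50))
```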