FAQ: Modifying DataFrames - Applying a Lambda to a Row

I got confused here too, because both the example and the hint use row['row_name'] while the solution presented uses row.row_name. The backslashes would also have been nice to mention at some point.

Would also second that “View Solution” should a) only do one step at a time (in this case it was only two steps, but when you get stuck on step 3 of 12, it’s really, really annoying to have to choose between nothing or everything) and b) allow reverting to one’s own code to see what one actually did wrong. Or better yet, open in a new window and not overwrite the workspace.

I just did it like this and it worked; I'm not sure why the example is so complicated.

totalhours = lambda x: 40 + (x - 40) * 1.50 if x > 40 else x

df["Total Salary"] = df["hourly_wage"] * df["hours_worked"].apply(totalhours)

I thought the purpose of a lambda function was to write a small function on a single line. If we have to write a more complex function like the sample in the lesson:

df['Price with Tax'] = df.apply(lambda row:
     row['Price'] * 1.075
     if row['Is taxed?'] == 'Yes'
     else row['Price'],
     axis=1
)

Why are we still using a lambda across multiple lines instead of a regular function? What is the benefit? I know it's broken up for readability and could be a single line, so I guess it comes down to this: why a lambda instead of just a function?

Hi all,
For Q9 in the second lesson of the pandas section (applying a lambda to a row), I wrote the lambda function exactly as in the solution but without the backslashes. Why are they required?

Code (the same as the solution but without the backslashes; Codecademy marked it as a mistake):
total_earned = lambda row: ((1.5 * row.hourly_wage) * (row.hours_worked - 40) + (row.hourly_wage * 40))
if row.hours_worked > 40
else row.hourly_wage * row.hours_worked

Why did solution 1 work when solution 2 didn't?

df['total_earned'] = df.apply(lambda x: x['hours_worked'] * x['hourly_wage'] if x['hours_worked'] <= 40 else 40*x['hourly_wage'] + (x['hours_worked']-40)*1.5*x['hourly_wage'], axis=1)

total_earned = lambda x: x['hours_worked'] * x['hourly_wage'] if x['hours_worked'] <= 40 else 40*x['hourly_wage'] + (x['hours_worked']-40)*1.5*x['hourly_wage'], axis=1

df['total_earned'] = df.apply(total_earned)
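
One likely explanation, sketched with a small made-up DataFrame (the values are assumptions, not the lesson's employees.csv): axis=1 is a keyword argument of df.apply, not part of the lambda. In solution 2 it trails the lambda in a plain assignment, which Python cannot parse, and the later df.apply(total_earned) call never receives it. Keeping the lambda on its own and passing axis=1 to apply should give the same result as solution 1.

```python
import pandas as pd

# Hypothetical data standing in for the lesson's employees.csv
df = pd.DataFrame({
    "hourly_wage": [10.0, 20.0],
    "hours_worked": [38, 45],
})

# The lambda only describes what to do with one row
total_earned = lambda x: (
    x["hours_worked"] * x["hourly_wage"]
    if x["hours_worked"] <= 40
    else 40 * x["hourly_wage"] + (x["hours_worked"] - 40) * 1.5 * x["hourly_wage"]
)

# axis=1 belongs to apply: it asks pandas to pass rows rather than columns
df["total_earned"] = df.apply(total_earned, axis=1)
print(df)
```

Without axis=1, apply uses its default of axis=0 and hands the function whole columns, so a lookup like x["hours_worked"] would fail.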

It keeps giving me a syntax error in the practice exercise. I cannot figure out why; can anybody help?

import codecademylib
import pandas as pd
df = pd.read_csv('employees.csv')
total_earned = lambda row:row['hourly_wage'] * row['hours_worked'] if row['hours_worked'] <= 40
else
  row['hourly_wage'] * 40 + (row['hours_worked'] - 40) * row['hourly_wage'] * 1.5

There are two line breaks in the ternary expression (just before and just after else). The whole expression has to be on one line rather than three. Or you can use explicit line joining, like:

total_earned = lambda row:row['hourly_wage'] * row['hours_worked'] if row['hours_worked'] <= 40  \
else \
  row['hourly_wage'] * 40 + (row['hours_worked'] - 40) * row['hourly_wage'] * 1.5
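
As an alternative to backslashes, Python also joins lines implicitly inside parentheses, so wrapping the whole conditional expression in parentheses lets it span several lines. A minimal sketch, using a plain dict as a stand-in for a DataFrame row (an assumption for illustration only):

```python
# Implicit line joining: everything inside the parentheses is one expression
total_earned = lambda row: (
    row['hourly_wage'] * row['hours_worked']
    if row['hours_worked'] <= 40
    else row['hourly_wage'] * 40
    + (row['hours_worked'] - 40) * row['hourly_wage'] * 1.5
)

# A dict stands in for a row here; a pandas row supports the same lookups
example_row = {'hourly_wage': 10, 'hours_worked': 42}
print(total_earned(example_row))
```

This form avoids the risk of a stray space after a backslash, which would itself cause a syntax error.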

Considering the size and complexity, along with the fact that you've named the new function, I think it's well worth considering an actual function here, since there's no obvious benefit to using a lambda at that point. If the lesson requires a lambda then so be it, but lambdas are generally reserved for simpler tasks (where they can actually remain anonymous).
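
For comparison, the same calculation as a named function might look like the sketch below. The sample data is hypothetical; a function defined with def can be passed to apply in the same way as a lambda.

```python
import pandas as pd

def total_earned(row):
    """Pay for one row, with hours beyond 40 paid at 1.5 times the wage."""
    if row["hours_worked"] <= 40:
        return row["hourly_wage"] * row["hours_worked"]
    overtime_hours = row["hours_worked"] - 40
    return row["hourly_wage"] * 40 + overtime_hours * row["hourly_wage"] * 1.5

# Hypothetical sample data, not the lesson's file
df = pd.DataFrame({"hourly_wage": [12.0, 15.0], "hours_worked": [40, 50]})
df["total_earned"] = df.apply(total_earned, axis=1)
print(df)
```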

So I have the following questions:

  1. When adding a column to a DataFrame, is it possible to use a normal function inside df.apply(), or to use the normal function with a lambda, like this:
    df["side"] = df.apply(lambda area, shape: side_calculator(area, shape), axis=1)
    side_calculator is already defined, and area and shape are columns of the DataFrame. When I ran this, the error message was:
    <lambda>() missing 1 required positional argument: 'shape'

  2. Can I use else-if in a lambda function? I have more than two sets of conditions.

Yes, passing regular functions is perfectly valid. In Python the only difference is that a lambda is anonymous (no bound name). You can also call other functions within a lambda if you like, so that can be used with .apply too. But consider .apply more of a last resort: the operations will be significantly slower than any vectorised pandas tools. In fact it's likely to take roughly the same time as iteration, since that's what it's doing. Using a loop might even be preferable here to nesting functions inside lambdas and trying to force the arguments into place; I'd choose a clear and readable for loop over a confusing .apply().

I’d suggest a look through the following which go into a lot of detail about when .apply is useful.

So .apply can be useful, just try not to overuse it.
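
To illustrate what a vectorised alternative could look like for the overtime calculation discussed earlier in this thread, here is a sketch using numpy.where on whole columns. The column names follow the earlier posts; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"hourly_wage": [10.0, 20.0, 30.0], "hours_worked": [35, 40, 44]})

wage = df["hourly_wage"]
hours = df["hours_worked"]

# np.where picks from the second argument where the condition holds,
# otherwise from the third, for every row at once
df["total_earned"] = np.where(
    hours <= 40,
    wage * hours,
    wage * 40 + (hours - 40) * wage * 1.5,
)
print(df)
```

No Python-level function is called per row here, which is where the speed difference comes from.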

You can make your example work but you'd need to change a few things. At the moment, what would the arguments for size and shape be? apply passes a single argument (one Series) to your function each time, so a two-parameter lambda never receives its second argument.

It's a little awkward, but if you use a frame instead of a series you can access it by column or by row:

import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    columns=["A", "B"]
)
# axis=1 means each row is passed to the lambda in turn;
# .iloc makes the positional lookup explicit
df["C"] = df[["A", "B"]].apply(lambda row: row.iloc[0] + row.iloc[1], axis=1)
# equally you could write
df["E"] = df.apply(lambda row: row["A"] + row["B"], axis=1)
# because each row is provided as a Series object (like .iloc does)

This function adds element 0 and element 1 from each row together to make a new element in a new column "C". You could equally pass the indexed values from each row to your own function:

df["new"] = df.apply(lambda row: func(row["shape"], row["size"]), axis=1)

For anything more complicated than that I'd consider passing a regular function; there's already quite a lot going on in that line.
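
As a sketch of that approach, here is a regular function with if/elif/else branches passed straight to apply, which also sidesteps the question of chaining conditions inside a lambda. The side_calculator body, the shape names, the side_from_row helper, and the data are all hypothetical, since the original function was not posted.

```python
import math
import pandas as pd

def side_calculator(area, shape):
    # Hypothetical implementation; the original function was not shown
    if shape == "square":
        return math.sqrt(area)
    elif shape == "equilateral triangle":
        return math.sqrt(4 * area / math.sqrt(3))
    else:
        return float("nan")  # unknown shape

def side_from_row(row):
    # apply(axis=1) passes one row; unpack the columns the function needs
    return side_calculator(row["area"], row["shape"])

df = pd.DataFrame({
    "area": [16.0, 25.0, 9.0],
    "shape": ["square", "square", "circle"],
})
df["side"] = df.apply(side_from_row, axis=1)
print(df)
```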

The frame itself would then look like the following, where "C" is just the sum of the elements before it:

   A  B   C
0  1  2   3
1  3  4   7
2  5  6  11

I included this as an example though the vectorisation option is obvious in this case, df["A"] + df["B"]. As a compromise a list comprehension might be an option, e.g. df["D"] = [a + b for a, b in zip(df["A"], df["B"])].

It really works but I’m confused. The function takes hours_worked as x apparently. How does it know that x is hours_worked rather than hourly_wage?

OK, got it.
It is applied here: df["hours_worked"].apply(totalhours)
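
To spell that out: calling .apply on a single column (a Series) passes each value of that column to the function one at a time, so x can only ever be an hours_worked value. The method call is also evaluated before the multiplication. A small sketch with made-up numbers:

```python
import pandas as pd

totalhours = lambda x: 40 + (x - 40) * 1.50 if x > 40 else x

# Hypothetical data for illustration
df = pd.DataFrame({"hourly_wage": [10.0, 20.0], "hours_worked": [30, 44]})

# Series.apply calls totalhours once per value in hours_worked
paid_hours = df["hours_worked"].apply(totalhours)
print(paid_hours.tolist())

# The multiplication then happens element by element on the two columns
df["Total Salary"] = df["hourly_wage"] * paid_hours
print(df)
```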