FAQ: Modifying DataFrames - Applying a Lambda to a Row

This community-built FAQ covers the “Applying a Lambda to a Row” exercise from the lesson “Modifying DataFrames”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Data Analysis with Pandas

FAQs on the exercise Applying a Lambda to a Row


When do I use row[‘hourly_wage’] to access a cell’s data vs row.hourly_wage?


I have the same question. I am also curious as to why we need to include “axis = 1” when that was not necessary in the previous exercise when we created a new column with:

df['last name'] = df.name.apply(lambda x: x.split(' ')[-1])


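I believe axis defaults to 0 (apply the function to each column), hence you only need to specify it when you want rows (axis=1). In the previous exercise, apply was called on a single column (df.name is a Series), and Series.apply calls the function on each value in turn, so there was no axis to choose.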
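To make the difference concrete, here is a small sketch on a made-up DataFrame (only the column names come from the exercise; the values are invented). Calling apply on one column passes each cell to the function, while calling apply on the whole DataFrame needs axis=1 to pass one row at a time. With the default axis=0 the function gets whole columns, so a lookup like row['hourly_wage'] fails.

```python
import pandas as pd

# Made-up sample data; only the column names come from the exercise
df = pd.DataFrame({
    'name': ['Ada Lovelace', 'Alan Turing'],
    'hourly_wage': [20.0, 30.0],
    'hours_worked': [10, 50],
})

# Series.apply (one column): the function receives one cell value at a time
df['last_name'] = df['name'].apply(lambda x: x.split(' ')[-1])

# DataFrame.apply with axis=1: the function receives one whole row at a time
df['base_pay'] = df.apply(lambda row: row['hourly_wage'] * row['hours_worked'], axis=1)

# With the default axis=0 the function receives whole columns instead,
# so looking up 'hourly_wage' inside one of them raises a KeyError
try:
    df.apply(lambda row: row['hourly_wage'] * row['hours_worked'])
    needs_axis_1 = False
except KeyError:
    needs_axis_1 = True

print(df)
print(needs_axis_1)
```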


Is there a way to alias the repeated row['hours_worked'] in the code below? Thanks in advance!

import codecademylib  # Codecademy-specific module, only available in the exercise environment
import pandas as pd

df = pd.read_csv('employees.csv')
# print(df)
# Hours beyond 40 are paid at 1.5 times the hourly wage
total_earned = lambda row: row['hours_worked'] * row['hourly_wage'] if row['hours_worked'] <= 40 else row['hourly_wage'] * 40 + (row['hours_worked'] - 40) * row['hourly_wage'] * 1.5
df['total_earned'] = df.apply(total_earned, axis=1)
print(df)
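One way to get an alias is to use a regular def function instead of a lambda. A def body can contain assignments, so the row's values can be stored in short local variables. This is a sketch: the names hours and wage are just illustrative, and the small DataFrame with made-up values stands in for employees.csv.

```python
import pandas as pd

def total_earned(row):
    # Local aliases for the two cells used repeatedly below
    hours = row['hours_worked']
    wage = row['hourly_wage']
    if hours <= 40:
        return hours * wage
    # Hours beyond 40 are paid at 1.5 times the hourly wage
    return wage * 40 + (hours - 40) * wage * 1.5

# Stand-in for pd.read_csv('employees.csv'); the values are made up
df = pd.DataFrame({'hourly_wage': [20.0, 30.0], 'hours_worked': [10, 50]})
df['total_earned'] = df.apply(total_earned, axis=1)
print(df)
```

df.apply accepts any callable, so the named function is passed exactly the way the lambda was.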

What are the backslashes in the solution?

When you show the solution, you should be able to go back and look at the code you attempted. View Solution deletes your code and you can’t see what you did wrong. Also, View Solution should only do one step at a time.


I get an invalid syntax error every single time I try to enter my code for the total_earned lambda function, even when I just copy the solution. What is going on? It says you don’t need the backslashes, but they seem to be required. This has happened on the last 3 exercises for this lesson.

This is my code so far:
total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40))
if row.hours_worked > 40
else row.hourly_wage * row.hours_worked


I was wondering the same thing about the backslashes. They do not appear in any of the examples in the lessons and they don’t make sense to me.


The backslash is an “escape” character. I think that they wanted to put a newline in the code, probably for readability, but they didn’t want the newline to affect the code itself, so they used the backslash to “escape” the effect of the newline.

I found this in Stack Overflow regarding the backslash:
“A backslash at the end of a line tells Python to extend the current logical line over across to the next physical line.”
Source: https://stackoverflow.com/questions/38125328/what-does-a-backslash-by-itself-mean-in-python
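A quick way to see this for yourself is to compile the same statement with and without the trailing backslashes. This sketch uses Python's built-in compile() and exec() on made-up source strings; nothing here is specific to pandas.

```python
# Source code for one statement written over three physical lines,
# with a backslash at the end of the first two lines
with_backslashes = (
    "hours = 45\n"
    "pay = 40 * 10 + (hours - 40) * 15 \\\n"
    "    if hours > 40 \\\n"
    "    else hours * 10\n"
)
# Identical code with the trailing backslashes removed
without_backslashes = with_backslashes.replace(" \\\n", "\n")

namespace = {}
exec(compile(with_backslashes, "<with>", "exec"), namespace)
print(namespace["pay"])  # 475

try:
    compile(without_backslashes, "<without>", "exec")
    syntax_error = False
except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
    syntax_error = True
    print("SyntaxError:", err.msg)
```

Without the backslashes, Python treats the first line as a complete statement and then rejects the indented "if" line that follows.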


My question is: can someone please expand on the use of the ‘row’ syntax (row.column_name / row[‘column_name’]).

Is ‘row’ a function / keyword? What does it mean? Why can we use it to access specific values, when previously we had to use ‘iloc’ or ‘loc’?

I think it is because of the line breaks before “if” and before “else” in your code: without a backslash at the end of a line (or parentheses around the whole expression), Python ends the statement at the first line break. I tried the code you pasted and it gave me a syntax error. I deleted the newlines and it worked.

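df['new_column'] = df.apply(some_function, axis=1) means that the function is applied to each row as a whole.
“row” is not a keyword or a function, just the name of the lambda's parameter: lambda row: row['price'] + 10 is the same as lambda x: x['price'] + 10, or any other parameter name. pandas calls the function once for each row of the DataFrame, passing that row in as the argument. Because the row arrives as a Series indexed by the column names, you can read a cell with row['price'] (or row.price) without loc or iloc.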
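Here is a small sketch of that, on a made-up DataFrame with a price column like the one in the example above (the item names and prices are invented). It shows that the parameter name doesn't matter and that each argument is a pandas Series.

```python
import pandas as pd

# Made-up sample data with a 'price' column, as in the example above
df = pd.DataFrame({'item': ['tea', 'cake'], 'price': [3, 5]})

# 'row' is only the parameter name; 'x' (or any other name) works the same
df['plus_ten_row'] = df.apply(lambda row: row['price'] + 10, axis=1)
df['plus_ten_x'] = df.apply(lambda x: x['price'] + 10, axis=1)

# Each argument is one row as a pandas Series indexed by the column names,
# which is why row['price'] and row.price both reach the cell without loc/iloc
arg_types = df.apply(lambda row: type(row).__name__, axis=1)
same_value = df.apply(lambda row: row['price'] == row.price, axis=1)

print(df)
print(arg_types.tolist())  # ['Series', 'Series']
```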

But I noticed something interesting: row['column'] and row.column are not always the same. It looked to me as if the second works fine for numeric columns but not for string columns.


Besides, we can apply a function not only to each row as a whole but also to each column as a whole by writing axis=0 (which is the default).
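A minimal sketch of the two directions, on made-up numeric data: with axis=0 the function runs once per column, with axis=1 once per row.

```python
import pandas as pd

# Made-up numeric data; column names follow the exercise
df = pd.DataFrame({'hourly_wage': [20.0, 30.0, 25.0], 'hours_worked': [10, 50, 40]})

# axis=0 (the default): the function receives one whole column at a time,
# so the result has one value per column
column_max = df.apply(lambda col: col.max(), axis=0)
print(column_max)

# axis=1: the function receives one whole row at a time,
# so the result has one value per row
row_sum = df.apply(lambda row: row.sum(), axis=1)
print(row_sum)
```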


I had the same issue

The problem is the line breaks: every line except the last needs a backslash at the end. The indentation before if and else is only there for readability.

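import pandas as pd

df = pd.read_csv('employees.csv')

# The trailing backslashes continue the statement onto the next line
total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) \
    if row.hours_worked > 40 \
    else row.hourly_wage * row.hours_worked

df['total_earned'] = df.apply(total_earned, axis = 1)

print(df)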

I think not everyone works 40 hours, either. So can we really hard-code the 40 in (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40))?

It would be helpful to get an answer to this question.
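A possible answer, sketched in code: in this formula, 40 is the overtime threshold, not the number of hours each person worked. The branch guarded by row.hours_worked > 40 only runs for employees who worked more than 40 hours; everyone else falls through to the else branch, which uses their actual hours. If the threshold differed, one option is to make it a parameter. The overtime_threshold name below is hypothetical, not part of the exercise, and the data is made up.

```python
import pandas as pd

def total_earned(row, overtime_threshold=40):
    # Hypothetical parameter: hours above this threshold are paid at 1.5x
    if row.hours_worked > overtime_threshold:
        regular = row.hourly_wage * overtime_threshold
        overtime = row.hourly_wage * 1.5 * (row.hours_worked - overtime_threshold)
        return regular + overtime
    # At or under the threshold, pay is simply wage times the hours actually worked
    return row.hourly_wage * row.hours_worked

# Made-up sample data
df = pd.DataFrame({'hourly_wage': [20.0, 30.0], 'hours_worked': [10, 50]})
df['total_earned'] = df.apply(total_earned, axis=1)
# Extra keyword arguments given to apply are passed on to the function
df['total_earned_35'] = df.apply(total_earned, axis=1, overtime_threshold=35)
print(df)
```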

It seems I cannot just transform the given solution

total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40))
if row.hours_worked > 40
else row.hourly_wage * row.hours_worked

Into
total_earned = lambda row: (row['hourly_wage'] * 40) + ((row['hourly_wage'] * 1.5) * (row['hours_worked'] - 40))
if row['hours_worked'] > 40
else row['hourly_wage'] * row['hours_worked']

Why? Is it because the axis=1 is missing? If I add it, it doesn’t work either…


Now I see that the reason for this error is not the data type but the column's name, “name”: row.name returns the row Series' own name attribute, which for a row passed in by apply(axis=1) is that row's index label, not the value in the “name” column.
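A short sketch of the clash, on a made-up DataFrame that has a column literally called "name" and custom index labels (both invented for illustration):

```python
import pandas as pd

# Made-up data with a column literally called 'name'
df = pd.DataFrame({'name': ['Ada', 'Alan'], 'hours_worked': [10, 50]},
                  index=['emp_a', 'emp_b'])

# row.name is the Series attribute holding the row's index label...
index_labels = df.apply(lambda row: row.name, axis=1)
# ...while row['name'] looks up the value in the 'name' column
names = df.apply(lambda row: row['name'], axis=1)

print(list(index_labels))  # ['emp_a', 'emp_b']
print(list(names))         # ['Ada', 'Alan']
```

Bracket notation is the safe choice whenever a column name collides with an existing Series attribute or method.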


It seems that a few other people had trouble with the multi-line lambda in this exercise. I’ve found two solutions:

Either put a backslash (\) at the end of every line except the last, or put the whole expression inside parentheses, where Python joins the lines automatically.
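Here is a sketch of both fixes applied to the total_earned lambda from this exercise, on a small made-up DataFrame. The first version uses bracket notation and the second dot notation, to show that the choice of notation is not what causes the SyntaxError; the line breaks are.

```python
import pandas as pd

# Made-up sample data; column names follow the exercise
df = pd.DataFrame({'hourly_wage': [20.0, 30.0], 'hours_worked': [10, 50]})

# Option 1: explicit line joining, a backslash at the end of every line but the last
total_earned_backslash = lambda row: \
    row['hourly_wage'] * 40 + row['hourly_wage'] * 1.5 * (row['hours_worked'] - 40) \
    if row['hours_worked'] > 40 \
    else row['hourly_wage'] * row['hours_worked']

# Option 2: implicit line joining inside parentheses, no backslashes needed
total_earned_parens = lambda row: (
    row.hourly_wage * 40 + row.hourly_wage * 1.5 * (row.hours_worked - 40)
    if row.hours_worked > 40
    else row.hourly_wage * row.hours_worked
)

df['total_backslash'] = df.apply(total_earned_backslash, axis=1)
df['total_parens'] = df.apply(total_earned_parens, axis=1)
print(df)
```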

What is the purpose of the \ ?

I don’t recall seeing it in any lesson relating to lambda or if/else before this. Every lambda or if/else up until now has not required it, but in this lesson it is. Is it because we’re using Pandas? Is it a Pandas or table exclusive requirement?

The purpose of the backslash is joining lines. See the Python Language Reference sections on explicit and implicit line joining (2.1.5 and 2.1.6) if you like.

For example, if we remove all backslashes from the first code in the post of @sionchen, we will get a SyntaxError. To avoid this, we need to do some line joining, or make everything into one line without any line breaks.

It’s because of the way Python works: you can’t break a line of code just anywhere and expect it to work. When a line of code is too long and you want to continue it on a new line for better readability, end it with \.

So as far as I know, it doesn’t matter whether you’re using lambda, pandas, whatever. You can use \ if you want to write one statement with multiple lines.
