In the context of this exercise, in Pandas, when do we apply lambda functions to rows as opposed to columns of a dataframe?
Answer
Generally, we apply a lambda to rows, as opposed to columns, when the operation needs to access more than one column at a time.
Take, for instance, the example function from the exercise:
As we can see, this lambda function is accessing multiple columns of the dataframe: Price and Is taxed?. Because it is accessing multiple columns, it would need to be able to access the entire row, instead of just a single column.
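The exercise's own code is not reproduced in this thread, so here is a minimal sketch of what a row-wise lambda of this kind might look like. The column names Price and Is taxed? come from the explanation above; the sample data, the 'Yes'/'No' values, the 7.5% tax rate, and the new column name are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    'Item': ['Apple', 'Milk', 'Paper Towels'],
    'Price': [1.00, 4.20, 5.00],
    'Is taxed?': ['No', 'No', 'Yes'],
})

# axis=1 passes each row to the lambda, so it can read two columns at once.
# The 7.5% tax rate is an assumed value.
df['Price with Tax'] = df.apply(
    lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price'],
    axis=1,
)
```

Without axis=1 the lambda would be handed whole columns instead, and row['Price'] would not refer to a single value.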
On the other hand, when applying a lambda function to a single column, the lambda will only apply to that column's values. For example, from the previous exercise:
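The previous exercise's code is also missing from this thread. As a rough sketch, applying a lambda to one column might look like the following; the Name column and the lower-casing operation are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'Name': ['JOHN SMITH', 'Jane Doe']})

# Applying to a single column: the lambda receives one value at a time,
# so no axis argument is needed.
df['Lowercase Name'] = df['Name'].apply(lambda name: name.lower())
```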
You've told Python to calculate a value for every row in the table, and you need to ensure those values are added to the dataframe as a new column. axis=1 basically means "apply the function to each row", so you get back one value per row, which is exactly the shape a new column needs.
The \ is used to join the following line of code with the preceding line to form a single line. You could also refer to it as a line continuation character. See the docs for more info.
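As a small illustration of the continuation character, the sketch below splits one lambda across several physical lines. The column names hours_worked and hourly_wage are borrowed from later posts in this thread; the data and the time-and-a-half overtime rule are assumed.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    'hours_worked': [38, 45],
    'hourly_wage': [10.0, 20.0],
})

# Each trailing backslash makes Python read the next line as part of this one.
# Nothing, not even a space or a comment, may follow the backslash.
total_earned = lambda row: row['hours_worked'] * row['hourly_wage'] \
    if row['hours_worked'] <= 40 \
    else (40 * row['hourly_wage']) + (row['hours_worked'] - 40) * (row['hourly_wage'] * 1.50)

df['total_earned'] = df.apply(total_earned, axis=1)
```

Wrapping the whole expression in parentheses would also let it span several lines without any backslashes.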
So …
OK, so I think I might have figured it out. I originally thought it was due to the formatting of using row['hourly_wage'] instead of using row.hourly_wage, but that is not the case.
When you are defining a function like you are doing, you need to use a line break at the end of each line (fig 1.a); otherwise, you need to write it all on the same "line" (fig 1.b);
but you don't need one if you are creating a column and using the lambda function inside the new column (fig 2).
Oh, I apologise, I misunderstood your explanation. Yes, for lambda functions, everything needs to be on the same line; otherwise, we would need to use the line break. My bad for that. Thank you for your help on that part!
The strange thing is that when I tried your code in fig 2, I still got the same error. However, when I changed row['hourly_wage'] to row.hourly_wage, and did the same for row['hours_worked'], the code worked. Why is that? Wouldn't both mean the same thing?
If we use apply without specifying a single column and add the argument axis=1, the input to our lambda function will be an entire row, not a column. To access particular values of the row, we use the syntax row.column_name or row['column_name'].
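A quick way to check that the two access styles are equivalent is to run both on the same data. This is only a sketch with made-up values; the column names are the ones mentioned in this thread.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    'hours_worked': [38, 45],
    'hourly_wage': [10.0, 20.0],
})

# Bracket access works for any column name.
by_key = df.apply(lambda row: row['hours_worked'] * row['hourly_wage'], axis=1)

# Attribute access works only when the name is a valid Python identifier
# and does not clash with an existing Series attribute or method.
by_attr = df.apply(lambda row: row.hours_worked * row.hourly_wage, axis=1)
```

Both produce the same Series. If the bracket form raises a syntax error while the dot form works, one possible cause worth checking is that the quotes are curly quotes pasted from a web page rather than plain ' or " characters, since Python only accepts the plain ones.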
Shouldn't that be: "an entire column, not a row"? Since it was describing the case where you're not using axis=1. I'm a bit confused.
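To make the difference concrete, the sketch below runs the same kind of lambda with and without axis=1 on a small made-up dataframe. With the default axis=0 the lambda receives each column in turn; with axis=1 it receives each row, which is the case the quoted sentence describes.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Default (axis=0): called once per column; result is indexed by column name.
per_column = df.apply(lambda col: col.sum())

# axis=1: called once per row; result is indexed by row label.
per_row = df.apply(lambda row: row.sum(), axis=1)
```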
I got the same issue of "invalid syntax (" even when I put everything on the same line. However, when I switched the logic of the code and the order of the lines from "if row['hours_worked'] =< 40" to "if row['hours_worked'] > 40", it worked.
So I am not sure whether this is about how they decide what is correct, so that you must use the same pattern as theirs (in which case I actually got it right on my first try), or whether there's really something I did wrong. Could someone help me out here? I'm also not sure whether there are any rules on using () in a lambda…
Thanks for your reply. I think it has to be written as <= then. I am glad that at least it's not something wrong with the logic or the pattern, otherwise it would be difficult to do every time.
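For anyone who wants to confirm this, here is a small sketch that asks Python to compile both spellings. The helper name is_valid_expression is made up for this example; compile is the built-in function.

```python
def is_valid_expression(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False

# '=<' is not a Python operator, so the parser rejects it.
wrong = is_valid_expression("hours_worked =< 40")

# '<=' is the correct spelling of "less than or equal to".
right = is_valid_expression("hours_worked <= 40")
```

Here wrong ends up False and right ends up True, so the error came from the operator spelling and not from the order of the branches.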