Using lambda to check 'nan' values (A/B Testing Project)

Link to exercise: https://www.codecademy.com/paths/bi-data-analyst/tracks/dsf-pandas-for-data-science/modules/dsf-aggregates-in-pandas/projects/pandas-shoefly-ab-test

I was trying different ways to use lambda to identify which rows have a value and which do not, since I didn't understand how the hint worked. How can I resolve these errors? (For #1, I read that it's because the expression produces a Series of True/False values, but I thought my code was evaluating one row at a time.)

  1. clicks = lambda row: False if ad_clicks.ad_click_timestamp.isnull() else True
    ad_clicks["is_click"] = ad_clicks.apply(clicks, axis=1)
    ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

  2. clicks = lambda row: ~np.isnan(row.ad_click_timestamp)
    ad_clicks["is_click"] = ad_clicks.apply(clicks, axis=1)
    TypeError: ("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", 'occurred at index 0')
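A minimal sketch of what goes wrong in attempt #1, using a tiny made-up stand-in for `ad_clicks` (the sample values are invented for illustration). The lambda never uses `row`: `ad_clicks.ad_click_timestamp.isnull()` checks the whole column and returns a Series, and Python's `if` cannot reduce a multi-value Series to one True/False, so pandas raises the ValueError. Reading the value through `row` fixes it:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ad_clicks: one user clicked, one did not
ad_clicks = pd.DataFrame({
    "user_id": ["a", "b"],
    "ad_click_timestamp": ["7:18", np.nan],
})

# Attempt #1: the condition looks at the whole column, not the current row
buggy = lambda row: False if ad_clicks.ad_click_timestamp.isnull() else True

try:
    ad_clicks.apply(buggy, axis=1)
    error = None
except ValueError as exc:
    # `if <Series>` is ambiguous, so a ValueError is raised
    error = exc

# Fixed: use `row` to get the single value for the row being processed
fixed = lambda row: False if pd.isnull(row.ad_click_timestamp) else True
ad_clicks["is_click"] = ad_clicks.apply(fixed, axis=1)
print(ad_clicks)
```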

Is this for question 3, where you have to create a new column based on whether a condition is met in each row? If someone clicked on the ad, then the timestamp won't be NULL.

Note: NULL (empty) is not the same as NaN (not a number).

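np.isnan() doesn't work on objects; it only accepts numeric input such as floating-point numbers, so it raises that TypeError as soon as it gets a string timestamp. (Look up the TypeError.)

So it's better to use pd.isnull() here, since it accepts any value and treats both None and NaN as missing.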
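A quick sketch comparing the two functions on single values; the sample values (a time string, NaN, None) are assumed for illustration, not taken from the real dataset:

```python
import numpy as np
import pandas as pd

# Example values: a time string, a NaN, and a None
values = ["7:18", np.nan, None]

# np.isnan only accepts numeric input; a string raises TypeError
try:
    np.isnan("7:18")
    isnan_error = None
except TypeError as exc:
    isnan_error = exc

# pd.isnull accepts any scalar and treats both NaN and None as missing
null_flags = [pd.isnull(v) for v in values]
print(null_flags)  # [False, True, True]
```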

ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()

Yes, it's for question 3! I tried using isnull() in my first attempt but got an error saying it was seeing a Series of values. Am I not telling it to check just the current row? Or is it running through all the rows first?

A couple of things:

  • You're creating a new column in a df, so that needs the correct syntax:
    df['new_col_name'] = ... with your lambda (passed to apply) on the right-hand side

  • The lambda is a bit off. Did you research the ValueError?
    Is ad_clicks["is_click"] = ad_clicks.apply(clicks, axis=1) part of the lambda function? (Sorry, I'm confused.) What is "clicks"? Is it a string or something else? Each row should end up with either True or False, depending on whether the condition is met.
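To answer "what is clicks?": a sketch, assuming a small made-up DataFrame. `clicks` holds a function object created by `lambda`, and `apply(..., axis=1)` calls it once per row. Storing the lambda in a variable works exactly like writing it inline, as long as it reads the value from `row` (this version uses Python's `not` in place of the `False if ... else True` form, which is equivalent):

```python
import numpy as np
import pandas as pd

ad_clicks = pd.DataFrame({"ad_click_timestamp": ["7:18", np.nan]})

# `clicks` is not a string: it is a function, called once per row
clicks = lambda row: not pd.isnull(row.ad_click_timestamp)
print(callable(clicks))  # True

# Passing the named function or the same lambda inline gives identical results
named = ad_clicks.apply(clicks, axis=1)
inline = ad_clicks.apply(lambda row: not pd.isnull(row.ad_click_timestamp), axis=1)

ad_clicks["is_click"] = named
print(ad_clicks["is_click"].tolist())  # [True, False]
```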

(I think) this is why it's better to use the tilde, ~, which negates the result: ~ad_clicks.ad_click_timestamp.isnull() checks whether each value is NULL and flips the answer, returning True or False for every row at once. It's more concise.
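A minimal sketch of the vectorized approach on invented sample data: isnull() already returns one True/False per row, and on a boolean Series the tilde is an element-wise logical NOT, so no lambda is needed. notnull() is shown as an equivalent without the tilde:

```python
import numpy as np
import pandas as pd

ad_clicks = pd.DataFrame({"ad_click_timestamp": ["7:18", np.nan, "9:02", None]})

# isnull() checks every row at once and returns a boolean Series
missing = ad_clicks.ad_click_timestamp.isnull()  # [False, True, False, True]

# On a boolean Series, ~ flips each value: True now means "clicked"
ad_clicks["is_click"] = ~missing                 # [True, False, True, False]

# notnull() gives the same result without the tilde
same = ad_clicks["is_click"].equals(ad_clicks.ad_click_timestamp.notnull())
print(ad_clicks)
```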

I was storing the lambda function in the variable "clicks", which I guess wasn't correct, so I removed it. I did research the ValueError, as mentioned in my original post about the Series of values, so the issue probably stemmed from me separating the lambda function.

The version below finally worked without the tilde (when I put the tilde in front of pd.isnull, the column showed -1 and -2 instead of True and False):
ad_clicks["is_click"] = ad_clicks.apply(lambda row: False if pd.isnull(row.ad_click_timestamp) else True, axis=1)

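That's because inside the lambda, pd.isnull() checks a single value and returns a plain Python True/False, and ~ on a Python bool is the integer bitwise complement (~True is -2, ~False is -1). The tilde only acts as a logical NOT on pandas/NumPy boolean arrays, such as the Series returned by ad_clicks.ad_click_timestamp.isnull(); for a single value, use not instead.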
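A sketch reproducing the -1/-2 result on made-up data: ~ applied to the plain Python bool that pd.isnull() returns for one value gives an integer, while `not` gives the intended True/False. The warnings filter is only there because recent Python versions (3.12+) emit a DeprecationWarning for ~ on a bool:

```python
import warnings

import numpy as np
import pandas as pd

ad_clicks = pd.DataFrame({"ad_click_timestamp": ["7:18", np.nan]})

with warnings.catch_warnings():
    # Silence the DeprecationWarning newer Pythons raise for ~ on a bool
    warnings.simplefilter("ignore", DeprecationWarning)
    # pd.isnull() on one value returns a Python bool, so ~ is integer
    # bitwise complement: ~False == -1, ~True == -2
    tilde_result = ad_clicks.apply(
        lambda row: ~pd.isnull(row.ad_click_timestamp), axis=1
    )
print(tilde_result.tolist())  # [-1, -2]

# For a single value, Python's logical `not` gives the intended booleans
not_result = ad_clicks.apply(lambda row: not pd.isnull(row.ad_click_timestamp), axis=1)
print(not_result.tolist())  # [True, False]
```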

More on bitwise complement operator/unary operator:

https://wiki.python.org/moin/BitwiseOperators

and,
