Create new column from null values with lambda function

One of the projects in the Python Pandas course asks you to create a new column called is_click, which is True if the value in the column ad_click_timestamp is not null and False otherwise. The project can be found at the link below.

One solution is:

ad_clicks['is_click'] = ad_clicks.ad_click_timestamp.notnull()

But I was wondering if there was a way to do this with .apply() and a lambda function. I realize it is not as simple a solution, but to build my coding knowledge I wanted to figure it out.

Here is what I tried:

ad_clicks['is_click'] = ad_clicks.apply(lambda row: True if row.ad_click_timestamp is not null else False, axis=1)

It produces a NameError saying null is not defined.

I also tried this:

ad_clicks['is_click'] = ad_clicks.apply(lambda row: True if row.ad_click_timestamp.notnull() else False, axis=1)

But it produces an AttributeError saying 'str' object has no attribute 'notnull', which makes sense to me because I am accessing a string value in some cases.
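For anyone following along, here is a minimal sketch of what the lambda actually receives when axis=1 is used. The small DataFrame is a hypothetical stand-in for the course data (the user_id column and the timestamp values are made up), so treat it as an illustration rather than the project's real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's ad_clicks data.
ad_clicks = pd.DataFrame({
    "user_id": ["a1", "b2", "c3"],
    "ad_click_timestamp": ["7:18", np.nan, "19:03"],
})

# With axis=1 the lambda gets one row at a time as a Series, so
# row.ad_click_timestamp is a single scalar value, not a whole column.
value_types = ad_clicks.apply(
    lambda row: type(row.ad_click_timestamp).__name__, axis=1
).tolist()

# Rows that hold a timestamp report a plain Python str, and str has no
# .notnull() method, which is what the AttributeError is complaining about.
print(value_types)
print(hasattr("7:18", "notnull"))
```

The column-level call in the first solution works because ad_clicks.ad_click_timestamp is a Series, and Series does have .notnull().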

https://www.codecademy.com/paths/data-science/tracks/data-processing-pandas/modules/dspath-agg-pandas/projects/pandas-shoefly-ab-test

Hi @ruby6408495122

I think you’re getting a NameError because Python itself doesn’t have a null; its equivalent is None. The .notnull() method you used earlier is specific to Pandas, so inside your lambda Python thinks that null is an undefined variable.

I think the AttributeError is a similar issue: inside the lambda, row.ad_click_timestamp is a plain Python value (a string in some rows), and .notnull() is a Pandas method that a str doesn’t have.

You might be able to do…

ad_clicks['is_click'] = ad_clicks.apply(lambda row: True if pd.notnull(row.ad_click_timestamp) else False, axis=1)
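A runnable sketch of that suggestion, under the assumption that the missing timestamps are stored as NaN (as they would be after reading a CSV). The DataFrame here is hypothetical sample data, not the project's file. Since pd.notnull() already returns a boolean for a single value, the `True if ... else False` wrapper can be dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's ad_clicks data.
ad_clicks = pd.DataFrame({
    "user_id": ["a1", "b2", "c3"],
    "ad_click_timestamp": ["7:18", np.nan, "19:03"],
})

# Row-wise version: pd.notnull() works on a single scalar value.
ad_clicks["is_click"] = ad_clicks.apply(
    lambda row: pd.notnull(row.ad_click_timestamp), axis=1
)

# Column-wise version from the original post, for comparison.
vectorized = ad_clicks.ad_click_timestamp.notnull()

print(ad_clicks["is_click"].tolist())
print(vectorized.tolist())
```

Both approaches should agree; the column-wise one is usually preferred because it avoids calling a Python function once per row.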

This is a guess based off a quick look through the Pandas docs… so I could very well be completely wrong but I’m curious to know whether it works. 🙂

Ah, thank you. My mindset was stuck in SQL.

Your solution worked!

After you pointed out my use of null instead of None, I tried replacing null with None, but that didn’t work either. It set every value in the is_click column to True, instead of True or False depending on whether ad_click_timestamp is null.

Could it be because the first row results in True? I would have thought the lambda function would apply individually to each row instead of just applying the result from the first row to every row.

“Worked” as in the code ran without throwing an exception, or “worked” as in delivered the required result? (Frankly, I’ll be amazed at either but more so if it’s the latter…)

I wonder whether, if you’re using ad_clicks.apply(lambda row: True if row.ad_click_timestamp is not None else False, axis=1), it’s assigning every value in the column as True because Python is seeing the Pandas null and treating it as a value rather than as None?

My understanding from the documentation is that .apply() applies the defined function to each row/column individually depending on which axis you chose. I think it’s more likely that Pandas is interpreting the Python None as equivalent to null but Python isn’t doing the inverse and treating the Pandas null as equivalent to None.

(This is speculation, of course - I haven’t actually ever used Pandas!)

Your solution worked as in it delivered the desired result!

I think you’re right about Python interpreting null as a value. The Pandas DataFrame stores null values as NaN, which is not the same thing as None, so your guess is probably right.
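To check that explanation, here is a small sketch comparing the `is not None` test with pd.notnull() on hypothetical sample data (again assuming the missing timestamps are stored as NaN, as described above).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's ad_clicks data.
ad_clicks = pd.DataFrame({
    "user_id": ["a1", "b2", "c3"],
    "ad_click_timestamp": ["7:18", np.nan, "19:03"],
})

# NaN is a float value, not the None object, so the identity check
# "is not None" passes on every row, including the missing one.
with_is_not_none = ad_clicks.apply(
    lambda row: row.ad_click_timestamp is not None, axis=1
).tolist()

# pd.notnull() treats both NaN and None as missing.
with_notnull = ad_clicks.apply(
    lambda row: pd.notnull(row.ad_click_timestamp), axis=1
).tolist()

print(with_is_not_none)
print(with_notnull)
```

So the lambda was being applied to each row individually; it simply returned True for every row because no value was ever the None object.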

Thanks for all the suggestions and offering your help despite not having used Pandas.

Well, I’m rather surprised at that, but I’m glad we sorted it. 😁