What is the difference between using these two ways?

I have finished the “Modifying DataFrames” lesson of “Data Analysis with Pandas”. I learned a lot and had fun, but I am still confused about one thing. Please take a look at these two snippets of code. What are the differences between these two approaches?

import pandas as pd

df = pd.DataFrame({
    "Title": ["Avengers: End Game", "How to Train Your Dragon 3", "Harry Potter 2", "Iron Man 3", "Baywatch"],
    "My Rating": [5.0, 4.8, 4.9, 4.5, 4.23]
    })

personal_favorite_shelf = lambda x: "Yes" if x > 4.50 else "Meh"

df["Is my personal favorite?"] = df['My Rating'].apply(personal_favorite_shelf)
print(df)

as opposed to using:

import pandas as pd

df = pd.DataFrame({
    "Title": ["Avengers: End Game", "How to Train Your Dragon 3", "Harry Potter", "Iron Man 3", "Baywatch"],
    "My Rating": [5.0, 4.8, 4.9, 4.5, 4.23]
    })

personal_favorite_shelf = lambda x: "Yes" if x["My Rating"] > 4.50 else "Meh"

df["Is my personal favorite?"] = df.apply(personal_favorite_shelf, axis=1)
print(df)

Both of these ways get the job done neatly (although I don’t know whether one is more efficient than the other). Can anyone explain the differences and when one would be preferred over the other? Thanks. Here’s a link to the lesson, by the way: https://www.codecademy.com/courses/data-processing-pandas/lessons/pandas-ii/exercises/review-ii?action=resume_content_item

And could you please explain a little about the “axis” argument and how it works?

Have you read the documentation for that method? Because if you haven’t, then I don’t think you should expect to be able to meaningfully leverage it. Same goes for any function you ever use.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

The rest of the code differs between your two versions as if you already knew what the difference is: each lambda is adapted to the shape of the data it acts on. But if that code was supplied by the exercise then, yeah, the documentation says what that parameter does, if the exercise doesn’t already.
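To make the "different shapes" point concrete, here is a small sketch (not from the exercise; the two-row DataFrame is made up for illustration) that checks what each lambda is handed. As far as I can tell, calling apply on a single column passes one cell value at a time, while calling apply on the DataFrame with axis=1 passes a whole row as a Series whose labels are the column names.

```python
import pandas as pd

# Small made-up frame, shaped like the one in the question.
df = pd.DataFrame({
    "Title": ["Iron Man 3", "Baywatch"],
    "My Rating": [4.5, 4.23],
})

# Series.apply: the function is handed one cell value (a float here) per call.
cell_is_float = df["My Rating"].apply(lambda x: isinstance(x, float))

# DataFrame.apply with axis=1: the function is handed one row per call,
# as a pandas Series indexed by the column names.
row_is_series = df.apply(lambda row: isinstance(row, pd.Series), axis=1)
row_labels = df.apply(lambda row: ", ".join(row.index), axis=1)

print(cell_is_float.all())
print(row_is_series.all())
print(row_labels.iloc[0])
```

That is why the first lambda compares `x` directly and the second has to look up `x["My Rating"]` first.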

As for whether the two translate to similar actions under the hood, that comes down to understanding pandas’ data model, which I know nothing about. For small datasets the difference would be drowned out by things like importing pandas at startup. If your function only needs to look at a single cell, then it is perhaps better not to give it whole rows and instead apply it to each cell in that column.
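On the "axis" question, a minimal sketch of how I read the documentation (the three-row DataFrame is a shortened, illustrative version of the one in the question): with the default axis=0 the function is called once per column, and with axis=1 it is called once per row. The last part checks that the two approaches from the question give the same labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Title": ["Avengers: End Game", "Iron Man 3", "Baywatch"],
    "My Rating": [5.0, 4.5, 4.23],
})

# axis=0 (the default): one call per column; each call receives a whole column.
# The result is indexed by column name, and each length is the number of rows.
per_column = df.apply(lambda col: len(col))

# axis=1: one call per row; each call receives a whole row.
# The result is indexed by row label, and each length is the number of columns.
per_row = df.apply(lambda row: len(row), axis=1)

# The two approaches from the question, side by side.
by_cell = df["My Rating"].apply(lambda x: "Yes" if x > 4.50 else "Meh")
by_row = df.apply(lambda row: "Yes" if row["My Rating"] > 4.50 else "Meh", axis=1)

print(per_column.tolist())
print(per_row.tolist())
print(by_cell.tolist() == by_row.tolist())
```

So when only one column matters, applying to that column avoids building a row object for every call; the row-wise form is the one to reach for when the function needs values from more than one column.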

Thank you so much for your reply. I now get what the difference is and when to use each one. As for documentation, I am quite new to Python and programming and haven’t yet built the habit of referring to it regularly, which is absolutely necessary. From now on I will read the documentation for whatever I am working with. Thanks :slight_smile: