FAQ: Modifying DataFrames - Review

Yes, this is possibly the worst-written module in the Data Viz path so far. I feel like these kinds of complaints are treated with skepticism on these forums, with a suspicion that it's the learner's fault. But earlier modules with more complex concepts (nested loops, etc.) were taught simply and clearly. Here, the concepts themselves were pretty simple (renaming columns, changing values with conditionals: all things that were presented in a straightforward manner in the SQL modules), but they were presented in a confusing way. The module rushes through syntax and introduces syntactic necessities without properly breaking them down.

There was an earlier module where I felt similarly (also one where the concepts should have been simple), and it makes me think some people at Codecademy are better than others at structuring lessons.


Hi everyone,
I am new to this community :slight_smile:
Can someone please explain to me when to use plt.hist() and np.histogram()?

They seem to do the same thing to me.

I think it's unrelated to this topic ("FAQ: Modifying DataFrames - Review"), but I'm interested in np.histogram() and did some searching. plt.hist() computes and draws a histogram, while np.histogram() only computes it (returning the bin counts and the bin edges) without drawing anything. So use plt.hist() when you need a plot, and np.histogram() when you just want the numbers.
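Here's a minimal sketch of the difference, assuming numpy and matplotlib are installed (the sample data is made up). Both functions bin the same values the same way; only plt.hist() also draws the bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])  # made-up sample data

# np.histogram only computes: it returns the counts per bin and the bin edges
counts, edges = np.histogram(data, bins=4)
# counts holds [1, 2, 3, 4]; edges holds the 5 boundaries of the 4 bins

# plt.hist computes the same counts *and* draws the bars on the current axes
n, bins, patches = plt.hist(data, bins=4)
print(np.array_equal(n, counts))  # True
plt.close()
```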


This seems to return memory addresses for the column I generate. Why?

import codecademylib
import pandas as pd

orders = pd.read_csv('shoefly.csv')
print(orders.head())

orders['shoe_source'] = lambda x: 'animal' if x.shoe_material == 'leather' else 'vegan'
print(orders)

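It's because the function itself (not the values it returns) is assigned to the shoe_source column. The lambda never actually gets called: pandas just stores the same function object in every row, and printing a function shows something like <function <lambda> at 0x...>, i.e. its memory address.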
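A minimal sketch of the fix, using a small hypothetical DataFrame in place of shoefly.csv (only the shoe_material column is assumed). The lambda has to be handed to apply() so pandas calls it and stores the results.

```python
import pandas as pd

# Hypothetical stand-in for shoefly.csv: only the column this example needs
orders = pd.DataFrame({"shoe_material": ["leather", "canvas", "leather", "fabric"]})

# apply(..., axis=1) calls the lambda once per row and stores the returned strings
orders["shoe_source"] = orders.apply(
    lambda row: "animal" if row.shoe_material == "leather" else "vegan",
    axis=1,
)

# Equivalent here, since only one column is involved: apply to that column alone
alt = orders.shoe_material.apply(lambda m: "animal" if m == "leather" else "vegan")
print(orders)
```

The column-only version is a bit simpler because each call receives a single value rather than a whole row.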


So it seems a lambda typically takes a parameter, e.g.:

test = lambda x: x + 1
test(1)  # returns 2

But with pandas, the parameter seems to be implied?

For example:
df['new_column'] = df.existing_column.apply(lambda x: True if x == True else False)

Where is lambda getting ‘x’?

Another example:
genderLambda = lambda row: f'Dear Mr. {row.last_name}' if row.gender == 'male' else f'Dear Ms. {row.last_name}'

orders['salutation'] = orders.apply(genderLambda, axis=1)

Where is lambda getting ‘row’?
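x and row are just the lambda's parameter names; apply() does the calling for you and fills them in. Series.apply passes each element, and DataFrame.apply(..., axis=1) passes each row as a Series. A minimal sketch with hypothetical sample data; the list comprehensions are a conceptual equivalent for illustration, not how pandas implements apply internally.

```python
import pandas as pd

# Hypothetical sample data for illustration
orders = pd.DataFrame({
    "last_name": ["Smith", "Jones"],
    "gender": ["male", "female"],
})

# Series.apply: pandas calls the function once per *element*,
# passing that element as the argument (your `x`)
upper = lambda x: x.upper()
by_apply = orders.last_name.apply(upper).tolist()
by_loop = [upper(x) for x in orders.last_name]  # conceptual equivalent

# DataFrame.apply(axis=1): pandas calls the function once per *row*,
# passing that row as a Series (your `row`)
salutation = lambda row: (
    f"Dear Mr. {row.last_name}" if row.gender == "male"
    else f"Dear Ms. {row.last_name}"
)
rows_apply = orders.apply(salutation, axis=1).tolist()
rows_loop = [salutation(row) for _, row in orders.iterrows()]  # conceptual equivalent

print(by_apply == by_loop, rows_apply == rows_loop)  # True True
```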

Some of the things in this course were presented in the Python 3 course. I completed that one before starting this course, and it helped me a ton.

Hi guys,

Can anyone explain to me why this code does not work?

orders['salutation'] = orders.gender.apply(lambda x: 'Dear Mr. {}'.format(orders.last_name) if x == 'Male' else 'Dear Ms. {}'.format(orders.last_name))

The solution code defines the lambda as a separate named function, which is presumably then applied to the DataFrame object:

mylambda = lambda row: \
  'Dear Mr. {}'.format(row.last_name) \
  if row.gender == 'male' \
  else 'Dear Ms. {}'.format(row.last_name)

I'm not sure why my solution doesn't work, though.
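A minimal sketch of what goes wrong, using hypothetical sample data with lowercase gender values (as the solution's 'male' comparison suggests). There are two problems: applying to orders.gender means x is only the gender string, so the lambda can't see that row's last name and orders.last_name formats the whole column into every string; and 'Male' never equals 'male', so every row falls through to "Dear Ms.".

```python
import pandas as pd

# Hypothetical sample data; the comparison below assumes lowercase 'male'
orders = pd.DataFrame({
    "last_name": ["Smith", "Jones"],
    "gender": ["male", "female"],
})

# Original attempt: x is just a gender string, and orders.last_name is the
# entire column, so every result contains *all* last names, always "Dear Ms."
broken = orders.gender.apply(
    lambda x: "Dear Mr. {}".format(orders.last_name) if x == "Male"
    else "Dear Ms. {}".format(orders.last_name)
)

# Fix: apply over rows (axis=1) so each call receives one full row
orders["salutation"] = orders.apply(
    lambda row: "Dear Mr. {}".format(row.last_name) if row.gender == "male"
    else "Dear Ms. {}".format(row.last_name),
    axis=1,
)
print(orders.salutation.tolist())  # ['Dear Mr. Smith', 'Dear Ms. Jones']
```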

Hi Guys,

I'm struggling to get my head around the importance of the axis=1 argument in this bit of code:

orders['salutation'] = orders.apply(lambda row: \
                                    'Dear Mr. ' + row['last_name']
                                    if row['gender'] == 'male'
                                    else 'Dear Ms. ' + row['last_name'],
                                    axis=1
                                    )

What is axis=1 actually saying here? And what would happen if axis were left at the default of 0?

The code editor for this lesson seems a little buggy, so I'm not able to experiment with this properly.


It took me a while to get my head around this as well. I think it's mainly because of how a lambda is structured; with a regular def function it makes more sense.

In your example, apply passes each row of the orders DataFrame (as a Series) to the lambda, which is why you can access individual columns inside the lambda.

Passing the function as an argument to the apply method makes the syntax a little more confusing.

Naturally I would expect this:

orders['salutation'] = orders.apply(genderLambda(orders), axis=1)


I have a feeling this type of syntax will become clearer when I delve more into Python. Coming from JavaScript, I was really confused by this as well.
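The "natural" version genderLambda(orders) calls the lambda yourself, once, on the whole DataFrame, and hands its result to apply. A minimal sketch with hypothetical sample data showing why that fails and why apply needs the function itself:

```python
import pandas as pd

# Hypothetical sample data for illustration
orders = pd.DataFrame({
    "last_name": ["Smith", "Jones"],
    "gender": ["male", "female"],
})

genderLambda = lambda row: (
    f"Dear Mr. {row.last_name}" if row.gender == "male"
    else f"Dear Ms. {row.last_name}"
)

# Called on the whole DataFrame, row.gender == "male" is a boolean Series,
# and `if` can't decide whether a whole Series is true, so pandas raises
try:
    genderLambda(orders)
    error = None
except ValueError as exc:  # "The truth value of a Series is ambiguous..."
    error = exc

# Passing the function itself (no parentheses) lets apply call it per row
orders["salutation"] = orders.apply(genderLambda, axis=1)
print(orders.salutation.tolist())  # ['Dear Mr. Smith', 'Dear Ms. Jones']
```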

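axis=0 goes through the DataFrame column by column (each call receives one column), whereas axis=1 goes row by row, so each call receives one row as its input. That's why row['last_name'] only makes sense with axis=1.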
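A quick way to see what each call receives is to return the .name of the Series passed in. A minimal sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data for illustration
orders = pd.DataFrame({
    "last_name": ["Smith", "Jones"],
    "gender": ["male", "female"],
})

# axis=0 (the default): one call per column, so .name is a column label
by_column = orders.apply(lambda s: s.name, axis=0)
print(by_column.tolist())  # ['last_name', 'gender']

# axis=1: one call per row, so .name is the row's index label
by_row = orders.apply(lambda s: s.name, axis=1)
print(by_row.tolist())  # [0, 1]

# With axis=0, row['last_name'] would look for a *row* labelled 'last_name'
# inside each column, which is why the salutation lambda needs axis=1.
```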

Hey experts,

Just curious why I can create a new column with:
mydf['new_column'] = something
but not:
mydf.new_column = something

Thanks for any insight. Maybe the answer is “that’s just the way python syntax works”?

Pandas columns have their standard access via the subscript method __getitem__, which is the object[column_name] syntax, much like dictionaries. In addition, pandas offers dotted lookup as a convenience (in a limited form): you can read columns, and modify existing ones, this way, but you cannot create new columns with it.

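In plain Python, the dot syntax binds/accesses attributes on objects. Pandas overrides the __getattr__ method so dotted access works as an alternative to the subscript, but only sometimes: the column name has to be a valid Python identifier that doesn't clash with an existing DataFrame attribute or method (a column named count, for example, is shadowed by the count() method). Assigning mydf.new_column = something just sets an ordinary Python attribute on the object instead of creating a column.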

The dotted syntax can be preferable for those who like the method-chaining style of writing, e.g. df.colname.dropna().sort_values().head() ...
It is more restricted, though, and the obj[column_name] syntax is the more robust tool; see the docs for more info: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#attribute-access
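A minimal sketch of the difference, using two small made-up DataFrames. Pandas also emits a UserWarning when you try to create a column via a new attribute name; it's silenced here only to keep the output clean.

```python
import warnings
import pandas as pd

df_attr = pd.DataFrame({"a": [1, 2, 3]})
df_item = pd.DataFrame({"a": [1, 2, 3]})

# Dotted assignment to a *new* name sets a plain Python attribute, not a column
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide pandas' UserWarning about this
    df_attr.new_column = [4, 5, 6]
print(list(df_attr.columns))  # ['a']

# Subscript assignment creates a real column
df_item["new_column"] = [4, 5, 6]
print(list(df_item.columns))  # ['a', 'new_column']

# Dotted assignment to an *existing* column does modify that column
df_attr.a = [10, 20, 30]
print(df_attr.a.tolist())  # [10, 20, 30]
```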
