FAQ: Data Cleaning with Pandas - Splitting by Index

This community-built FAQ covers the “Splitting by Index” exercise from the lesson “Data Cleaning with Pandas”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Practical Data Cleaning

FAQs on the exercise Splitting by Index

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Hi there,
I was wondering if there is a smarter (quicker) way to remove the ‘gender_age’ column in this exercise, instead of rewriting all the column headers again.
E.g., is there a .drop_columns()? I tried it, but it didn’t work.

I was also wondering about that, so I did some searching. It looks like the drop method can be used in a couple of ways to do this. For example:

students = students.drop('gender_age', axis=1)
or
students = students.drop(columns='gender_age')
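A minimal sketch of the two drop() calls above, using a small made-up stand-in for the exercise's students DataFrame (the names and values are hypothetical). drop() returns a new DataFrame rather than changing the original, which is why the calls above assign the result back to students; here the results go into new variables so they can be compared.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's students DataFrame
students = pd.DataFrame({
    "full_name": ["Ann Lee", "Bo Chan", "Cy Diaz"],
    "gender_age": ["F14", "M17", "M15"],
})

# Option 1: pass the label and say it refers to a column (axis=1)
dropped_axis = students.drop("gender_age", axis=1)

# Option 2: the columns= keyword, which implies axis=1
dropped_kw = students.drop(columns="gender_age")

print(dropped_axis.columns.tolist())    # ['full_name']
print(dropped_axis.equals(dropped_kw))  # True

# The original is untouched because drop() returns a copy
print("gender_age" in students.columns)  # True
```

Both forms also accept a list of names, e.g. columns=["a", "b"], if more than one column needs to go.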

I also searched and was going to post the same thing.

Any time I try to do question 3, I get an “Index object is not callable” error, even after copying the answer straight from the solution. Does anyone have any idea why this happens? It happens in so many lessons.

Why do I have to add .str before indexing the first character in “gender_age”?

Isn’t “M” or “F” already a string? Isn’t “M14” a string object?

I typed “print(students.columns())” for task 1 in this lesson and got an error message that reads “‘Index’ object is not callable”. Despite the error, the lesson marked the first task as complete and let me progress to the next task. I have had this error in several lessons but don’t understand what it means. Can someone explain the error, please? Thanks.

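.columns doesn’t take parentheses: write ‘students.columns’, not ‘students.columns()’. It is an attribute that holds an Index object, not a method, so adding ‘()’ tries to call the Index itself, which raises the error.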
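A quick sketch reproducing the error with a tiny hypothetical DataFrame: .columns gives back an Index object, and putting () after it asks Python to call that object as if it were a function.

```python
import pandas as pd

students = pd.DataFrame({"full_name": ["Ann Lee"], "gender_age": ["F14"]})

# .columns is an attribute holding a pandas Index object
cols = students.columns
print(type(cols).__name__)  # Index
print(list(cols))           # ['full_name', 'gender_age']

# Adding () tries to *call* that Index object, which is not a function
error_message = None
try:
    students.columns()
except TypeError as err:
    error_message = str(err)
print(error_message)  # 'Index' object is not callable
```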

Not sure, but the hint seems to have an error. I don’t believe (correct me if I’m wrong, please) that a column can be created using ‘students.gender = …’; I think one must use brackets: ‘students['gender'] = …’.
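A sketch that supports this, assuming a reasonably recent pandas version: assigning to a new attribute name sets an ordinary Python attribute on the DataFrame object instead of adding a column (pandas emits a UserWarning about it, which is suppressed here), while bracket assignment does add the column. The sample data is made up.

```python
import warnings

import pandas as pd

students = pd.DataFrame({"gender_age": ["F14", "M17"]})

# Attribute-style assignment does NOT create a new column;
# pandas stores a plain Python attribute and warns about it
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    students.gender = students["gender_age"].str[0]
created_by_attribute = "gender" in students.columns
print(created_by_attribute)  # False

# Bracket-style assignment is what actually creates the column
students["gender"] = students["gender_age"].str[0]
print(students["gender"].tolist())  # ['F', 'M']
```

Reading an existing column with students.gender_age does work; it is only creating a new column through attribute assignment that does not.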

I would like to know this too.

When trying it without .str like:
students['gender'] = students['gender_age'][0:1]

This returns M14 for the first row and then NaNs. It is as if Python interprets the [0:1] as an index into the column. What is it about .str that makes Python look at the contents of each row instead?

The issue is that these items are inside a container of some form (a pd.Series or pd.DataFrame), and actually accessing the values requires the right syntax. In pandas, slicing a Series with [] acts a bit like a less robust .iloc, as described in the docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#slicing-ranges

Using students["gender_age"][0:1] returns a pandas Series that has effectively been sliced by row (not a single string, and not a Series in which every string has been sliced). This is similar to Python’s normal positional indexing and slicing: with a list, the subscript [3] gives you the fourth element, but the subscript [3:4] (a slice) gives you a list containing only the fourth element.
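A small sketch of this behaviour, with a made-up Series standing in for students["gender_age"], next to the plain-list comparison:

```python
import pandas as pd

gender_age = pd.Series(["M14", "F18", "M17"], name="gender_age")

# [0:1] slices ROWS by position: a one-row Series, not the letter "M"
sliced = gender_age[0:1]
print(type(sliced).__name__, len(sliced))  # Series 1
print(sliced.tolist())                     # ['M14']

# Same idea with a plain list: [3] is one element, [3:4] is a list
letters = ["a", "b", "c", "d", "e"]
print(letters[3])    # d
print(letters[3:4])  # ['d']
```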

So your new "gender" column is being populated from a Series with a single element. pandas aligns the assignment on the index, so only the row whose label matches (row 0) gets a value, and every other row is filled in as missing. Hopefully that explains the NaNs.
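The alignment can be seen directly in this sketch (the data is hypothetical): the one-row Series on the right-hand side only carries index label 0, so only row 0 receives a value.

```python
import pandas as pd

students = pd.DataFrame({"gender_age": ["M14", "F18", "M17"]})

# Right-hand side: a one-row Series whose only index label is 0.
# pandas matches it to the DataFrame's index, so rows 1 and 2 find
# no value and are filled with NaN.
students["gender"] = students["gender_age"][0:1]
print(students["gender"].tolist())  # ['M14', nan, nan]
```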

Because it’s common to want to access the underlying data, pandas offers a useful tool for working with the contents of a Series (certain types only) through what it calls accessors: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#accessors

An accessor works a bit like a vectorised getter: you can work with the actual contents of a Series without writing a loop. So it’s not Python itself but a feature pandas adds for a few special cases; the link provides much more detail.
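A minimal sketch of the accessor approach on made-up data, assuming the exercise's goal of splitting a value like "M14" into a gender letter and an age:

```python
import pandas as pd

students = pd.DataFrame({"gender_age": ["M14", "F18", "M17"]})

# .str applies string indexing/slicing to EACH element of the Series
students["gender"] = students["gender_age"].str[0]  # first character
students["age"] = students["gender_age"].str[1:]    # everything after it

print(students["gender"].tolist())  # ['M', 'F', 'M']
print(students["age"].tolist())     # ['14', '18', '17']

# The ages are still strings; convert them if you need numbers
age_numbers = students["age"].astype(int)
print(age_numbers.tolist())  # [14, 18, 17]
```

Compare students["gender_age"].str[0] (the first character of every value) with students["gender_age"][0:1] (the first row only).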

Nice. I didn’t know about drop.

I did it a more complicated way, but used it to refresh my knowledge of list comprehensions:

students = students[[c for c in students.columns if c != 'gender_age']]
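A sketch checking that the list-comprehension selection gives the same result as drop(), on a hypothetical three-column DataFrame:

```python
import pandas as pd

students = pd.DataFrame({
    "full_name": ["Ann Lee", "Bo Chan"],
    "gender_age": ["F14", "M17"],
    "grade": [9, 11],
})

# Keep every column except 'gender_age' by selecting a list of names
kept = students[[c for c in students.columns if c != "gender_age"]]

# Compare with the drop() approach
print(kept.equals(students.drop(columns="gender_age")))  # True
print(kept.columns.tolist())  # ['full_name', 'grade']
```

The comprehension is handy when the rule is more complex than a single name (for example, excluding every column with a certain prefix); for one fixed column, drop() states the intent more directly.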

Hi All

This may sound very shallow or naive, but what is the difference between .str and str()?

Thanks in advance
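One way to see the difference, sketched on a made-up Series: str() is Python's built-in conversion, which turns the entire Series into a single string (its printed form), whereas .str is the pandas accessor discussed earlier in this thread, which operates on each element.

```python
import pandas as pd

gender_age = pd.Series(["M14", "F18"])

# str() is Python's built-in: it turns the WHOLE Series into one string
# (the printed table, including index labels and the dtype line)
whole = str(gender_age)
print(type(whole).__name__)  # str
print(whole[0])              # '0', the first index label, not 'M'

# .str is a pandas accessor: it works on EACH element of the Series
firsts = gender_age.str[0]
print(firsts.tolist())  # ['M', 'F']
```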