FAQ: Data Cleaning with Pandas - Splitting by Character

This community-built FAQ covers the “Splitting by Character” exercise from the lesson “Data Cleaning with Pandas”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Practical Data Cleaning

FAQs on the exercise Splitting by Character

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.


I have a question:
What is the preferred way to handle the split?

df['str_split'] = df.type.str.split('_')

or

str_split = df.type.str.split('_')

I ask because the introduction to splits uses the first method, but the exercise can only be validated with the second.


Yes, and the following lines also have to be:
df['usertype'] = str_split.str.get(0)
df['country'] = str_split.str.get(1)

I think the latter option is better, since it doesn’t create an additional (useless) column.
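
To compare the two options discussed above, here is a minimal sketch. The data is made up for illustration (the real exercise dataset is not shown in this thread), and it assumes the `type` values follow a "usertype_country" pattern.

```python
import pandas as pd

# Hypothetical data, not the exercise's dataset.
df = pd.DataFrame({"type": ["admin_US", "user_UK", "user_Kenya"]})

# Option 1: store the split lists in an intermediate column.
df["str_split"] = df["type"].str.split("_")

# Option 2: keep the split lists in a standalone Series instead.
str_split = df["type"].str.split("_")

# Either way, .str.get(i) picks the i-th element of each list.
df["usertype"] = str_split.str.get(0)
df["country"] = str_split.str.get(1)

# With option 1, the helper column can be dropped once it is no longer needed.
df = df.drop(columns=["str_split"])
```

Both options produce the same `usertype` and `country` columns; the only difference is whether the intermediate lists live in the DataFrame or in a separate variable.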


What if a student has two words in their last name, separated by a space?


I’m finding a lot of this throughout the course. There’s a given example of how to do what they are asking you to do - except if you use their example to complete the lesson it’s invalid.

Aside from reading the error messages I would not guess “DO NOT USE WHAT WE ARE TEACHING YOU” to complete the lesson.
I get that it’s important to use your coding knowledge to figure out alternatives or better ways to code, but when you’re learning something, especially on a platform where you have to enter semi-specific code to progress, why would you not use the methods you are taught?


THIS!!! Came here to complain about the same thing. Two lessons in a row that teach you one thing and then just don’t accept the answer at all when you try it out. So frustrating


I wondered whether we could skip step 1 with the following code:

students['first_name'] = students['full_name'].str_split.str.get(0)

students['last_name'] = students['full_name'].str_split.str.get(1)

That doesn’t seem to work, though, and I am not sure why. Does anyone know why it doesn’t work?

After the data is split, why does the get(0) on the string return the last names and get(1) on the string return the first names?

name_split = students.full_name.str.split(" ")

students['first_name'] = name_split.str.get(0)

students['last_name'] = name_split.str.get(1)

Did you define “str_split”?

I believe this could be done with the following:

students['first_name'] = students.full_name.str.split(' ').str.get(0)
students['last_name'] = students.full_name.str.split(' ').str.get(1)

Question 1 asks us to make a “series object.” That was the missing link for me in understanding this exercise. The lesson taught us how to make an intermediate column to get the columns we need, but the question wants us to use a series object to do the same thing.

While question 1 is somewhat confusing, I think it is fair for them to ask this, as we have learned about series objects in this course.
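
For anyone unsure what the "series object" looks like, here is a small sketch. The names are made up and the `students` DataFrame is a stand-in for the one in the exercise.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's students DataFrame.
students = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"]})

# .str.split returns a new Series; each element is a Python list of parts.
name_split = students["full_name"].str.split(" ")
is_series = isinstance(name_split, pd.Series)
first_item = name_split.iloc[0]

# .str.get(i) then pulls the i-th part out of every list.
students["first_name"] = name_split.str.get(0)
students["last_name"] = name_split.str.get(1)
```

Because `name_split` is a separate variable, the DataFrame only gains the two columns that are actually wanted.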

Also, thanks for posting on this thread, your comments helped me a lot!

You are not telling it where to split the full_name (at the space) anywhere.

It does not work because there is no such method as str_split. If you use it in your code, you should get an AttributeError saying that the Series has no such attribute.
str is actually a string accessor. From the docs: "Series.str can be used to access the values of the series as strings and apply several methods to it. These can be accessed like Series.str.<function/property>."
split is a separate thing: it is the method that actually splits the string.
In short: you have a typo. Change the underscore to a dot and call the method with the separator, for example .str.split(' ').
Hope that helps.
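
A quick way to see this for yourself, sketched with made-up names rather than the exercise data:

```python
import pandas as pd

# Hypothetical stand-in for the exercise's students DataFrame.
students = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"]})

# Accessing .str_split fails: a Series has no attribute with that name.
try:
    students["full_name"].str_split
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Going through the .str accessor and calling split(" ") works.
students["first_name"] = students["full_name"].str.split(" ").str.get(0)
students["last_name"] = students["full_name"].str.split(" ").str.get(1)
```

After running this, `error_message` holds the AttributeError text and the two new columns are filled in.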

Hey guys, is anyone else finding the dataframe display (or Jupyter notebook display) unresponsive at times? For example, when you edit the code and it doesn’t update even though it looks like it refreshed.

The solution suggested by Codecademy has one serious flaw: some students have last names containing spaces.
When splitting on a space and assigning the second element of each name_split list to the last_name column, the output would look like this (I sorted the dataset by the last_name column).
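
To illustrate the issue with made-up names (not the exercise's dataset), here is a minimal sketch, together with one possible workaround: limiting the split to the first space with `n=1`. This assumes every first name is a single word, which may not hold for real data.

```python
import pandas as pd

# Hypothetical names; the second one has a multi-word last name.
students = pd.DataFrame({"full_name": ["Ada Lovelace", "Juan de la Cruz"]})

# Splitting on every space and taking element 1 truncates the last name.
name_split = students["full_name"].str.split(" ")
students["last_name_naive"] = name_split.str.get(1)

# Splitting at most once (n=1) keeps everything after the first space together.
name_split_once = students["full_name"].str.split(" ", n=1)
students["first_name"] = name_split_once.str.get(0)
students["last_name"] = name_split_once.str.get(1)
```

Here `last_name_naive` ends up as "de" for the second student, while `last_name` keeps "de la Cruz".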