FAQ: Data Cleaning in R - Splitting By Index

This community-built FAQ covers the “Splitting By Index” exercise from the lesson “Data Cleaning in R”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Learn R


Can someone please explain why I put a digit (like 1 or 2) and where I get that digit from at the end of a str_sub function? Thank you


Do you mean put a digit like this?

students <- students %>%
  mutate(gender = str_sub(gender_age,1,1),
         age = str_sub(gender_age,2))

In this case, the “1,1” are the start and end positions of the substring: it starts at character 1 and ends at character 1, so you get just the letter M or F. The numbers are simply positions counted from the start of the string.
For the age, you can use 2,3 or just 2. The first character you are interested in is in the second position, and the two-digit age extends to the third. If you leave out the end position, str_sub takes everything through the end of the string.
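
To see the positions in isolation, here is a minimal sketch on a single string, outside of any data frame. The value "M14" is a made-up example in the style of the gender_age column, and the sketch assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical value in the style of the gender_age column
gender_age <- "M14"

# Characters from position 1 to position 1: the gender letter
gender <- str_sub(gender_age, 1, 1)

# Characters from position 2 to position 3: the two-digit age
age <- str_sub(gender_age, 2, 3)

print(gender)  # "M"
print(age)     # "14"
```

Note that the age comes back as the text "14", not the number 14, because str_sub returns character strings.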


Why is this exact code not accepted as the answer when I copy it into my markdown? The table even shows the two new columns needed. I’ve even tried copying the solution from the answer, and it still doesn’t work.


I went back to the page and reset the exercise. Even after copy-paste with the solution it’s not working. Sometimes the interface is flaky – I’ve seen this with numerous other exercises.

students <- students %>%
  mutate(gender = str_sub(gender_age,1,1),
         age = str_sub(gender_age,2))

In this code, why does only the gender variable have 1,1 while age has just 2?
If the two 1s denote start and end, shouldn’t the same principle apply to age?

Exactly my thoughts. I don’t know why age only has (2) when I thought it would be (2,3) to denote the start and end positions…?

@board2294627802 I assume the function, as used here inside mutate(), takes its first argument as the target column, its second argument as the position in the string where the substring starts, and keeps going to the end of the string if the optional third argument isn’t supplied; otherwise it stops at the position given by the third argument.
Hope this helps.

You are correct, @davidgarcia719776693. Take the string M14 as an example: str_sub(gender_age,1,1) takes only the “M”, and str_sub(gender_age,2) takes everything from position 2 onward, that is, the 14.
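
A small sketch of why leaving off the end position can be the safer choice. The ages below are invented to show different lengths; the only library assumed is stringr, where the end argument of str_sub defaults to -1, meaning the last character.

```r
library(stringr)

# Hypothetical values with ages of different lengths
gender_age <- c("F9", "M14", "F102")

# No end argument: str_sub() defaults to end = -1, the last character
age_open <- str_sub(gender_age, 2)

# Explicit end of 3: anything past the third character is cut off
age_fixed <- str_sub(gender_age, 2, 3)

print(age_open)   # "9" "14" "102"
print(age_fixed)  # "9" "14" "10"
```

With two-digit ages both versions give the same result, which is why either 2 or 2,3 works for the exercise data described above.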

Hi harrjt

How did you solve the problem? It’s not working for me either…

It seems a bit erratic, so I filed a bug report. Be sure to finish step 3 completely before going to step 4. Even then, it sometimes marks step 3 wrong, and then step 4 after moving on… gremlins!

# print columns of students
# view head of students
# add gender and age columns
students <- students %>% mutate(
  gender = str_sub(gender_age,1,1),
  age = str_sub(gender_age,2))
# drop gender_age column
students <- students %>%
  select(-gender_age)

I’ll delete the code once this gets fixed.


Quick heads up… step 1 is asking for column NAMES! Use colnames()
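
For anyone unsure what that looks like, a minimal sketch follows. The data frame here is a hypothetical stand-in, not the exercise's actual students data, and the column names are made up for illustration.

```r
# Hypothetical stand-in for the exercise's students data frame
students <- data.frame(
  id = 1:2,
  gender_age = c("M14", "F15")
)

# colnames() returns only the column names as a character vector,
# rather than printing the whole table
cols <- colnames(students)
print(cols)  # "id" "gender_age"
```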

The problem is that most of us assumed they wanted the gender and age split in one step. The editor is marking your response wrong because of this. Instead of performing the entire split, only split the gender.

students <- students %>%
  mutate(gender = str_sub(gender_age, 1, 1))


Hope that helps!
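
Putting the thread together, here is a rough sketch of doing the split one piece at a time instead of in a single mutate(). The data frame is a hypothetical stand-in for the exercise's students table, and the exact breakdown the checker expects at each step is an assumption based on the posts above. It assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the exercise's students data frame
students <- data.frame(
  id = 1:3,
  gender_age = c("M14", "F15", "M13")
)

# First add only the gender column (position 1 to position 1)
students <- students %>%
  mutate(gender = str_sub(gender_age, 1, 1))

# Then add the age column (position 2 through the end of the string)
students <- students %>%
  mutate(age = str_sub(gender_age, 2))

# Finally drop the original combined column
students <- students %>%
  select(-gender_age)

print(students)
```

The end result is the same as the one-step version; only the order in which the columns appear along the way differs, which seems to be what the checker cares about.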