FAQ: Data Cleaning with Pandas - More String Parsing

I found this lesson pretty confusing. I would have appreciated it if they had first explained the syntax of the code after the split method:

split_df = df['exerciseDescription'].str.split('(\d+)', expand=True)

It was also unclear what expand does. I get that they expect you to read other material or resources, but I would have appreciated it if they had at least referred us to that part of the documentation to get a better understanding of the syntax.

Same. I think this might have been a mistake.
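On the expand question: as far as I understand the pandas docs for Series.str.split, expand=False (the default) leaves a list of pieces in each row, while expand=True spreads the pieces across the columns of a DataFrame. Here is a minimal sketch; the sample strings are made up for illustration and are not the lesson's data.

```python
import pandas as pd

# Hypothetical sample data, not the lesson's dataset
descriptions = pd.Series(["lunges 30 reps", "squats 20 reps"])

# expand=False (the default): each row holds a Python list of the pieces
as_lists = descriptions.str.split(r"(\d+)", expand=False)

# expand=True: each piece gets its own column in a DataFrame
as_columns = descriptions.str.split(r"(\d+)", expand=True)

print(as_lists)
print(as_columns)
```

The pattern (\d+) means "one or more digits", and the parentheses make it a capturing group, so the digits show up as their own piece instead of being thrown away.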

Can someone explain these two code snippets and the difference between them? Thank you

Code 1:
split_df = df['exerciseDescription'].str.split('(\d+)', expand=True) <<<----- How is this code separating the string? Is it separating the string based on 1 or more digits?

Code 2:
students.grade = students.grade.str.split('(\d+)', expand=True)[1] <<<<------ Why is there a [1] at the end, and how does it separate the string differently from the first code?
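A sketch that may help with both questions, assuming a small made-up students table (the grade values here are hypothetical). Both snippets split the same way, on one or more digits. The only difference is that the [1] picks column 1 out of the resulting DataFrame, which is where the captured digits land.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's students table
students = pd.DataFrame({"grade": ["9th grade", "10th grade", "12th grade"]})

# Code 1 pattern: keep the whole split result as a DataFrame
split_df = students["grade"].str.split(r"(\d+)", expand=True)
# Columns: 0 = text before the digits, 1 = the digits, 2 = text after

# Code 2 pattern: [1] selects only column 1, the captured digits
digits_only = students["grade"].str.split(r"(\d+)", expand=True)[1]

print(split_df)
print(digits_only.tolist())
```

Note that column 0 is an empty string in this sample because each value starts with the digits, so there is nothing before them.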

Hello! Just wanted to share my solution to eliminate all non-digits: students['grade'] = students['grade'].replace('[\D+]', '', regex=True), since we don't want to use anything else from that data (string).
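One thing to keep in mind with the replace approach: the result is still text, so it needs a conversion step if you want numbers. A minimal sketch, assuming hypothetical grade values and using the simpler pattern \D+ (any run of non-digit characters):

```python
import pandas as pd

# Hypothetical sample values, not the lesson's dataset
students = pd.DataFrame({"grade": ["9th grade", "10th grade"]})

# Remove every run of non-digit characters; what remains is still a string
cleaned = students["grade"].replace(r"\D+", "", regex=True)

# Convert the leftover digit strings to a numeric column
students["grade"] = pd.to_numeric(cleaned)
print(students["grade"].tolist())
```

This works when each value contains a single number. If a value had two separate numbers, stripping the non-digits would glue them together.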

Hi guys, I was doing the exercise from the More String Parsing lesson. The picture below shows my code and the table it returns.

Here are my questions:

  1. Normally, str.split(n) removes n from the string. Is it the regex grouping that makes n retrievable after the split?

  2. I tried to do it differently from the given solution, so I changed my code to expand=False and found that it didn't return a table with the split strings in separate columns. Instead, it returned a list of split strings in each row. I have no idea how my code below works, because I used pandas.Series.str, which vectorizes string functions for a Series, but the column actually holds lists.

students.grade = pd.to_numeric(split_grade.str[1])

Please help me out on this. Thanks!
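Not an official answer, but here is a sketch covering both questions, with hypothetical sample values. For question 1, Python's re.split keeps the matched text when the pattern contains a capturing group, and the same behaviour shows up in the split above. For question 2, indexing through .str is applied to each element, so it works on lists as well as on strings.

```python
import re
import pandas as pd

# 1. A capturing group makes the split keep the matched text
print(re.split(r"\d+", "9th grade"))    # no group: the digits are dropped
print(re.split(r"(\d+)", "9th grade"))  # group: the digits are kept as a piece

# 2. With expand=False each row holds a list; .str[1] indexes into each list
grades = pd.Series(["9th grade", "10th grade"])  # hypothetical sample values
split_grade = grades.str.split(r"(\d+)", expand=False)
numeric_grade = pd.to_numeric(split_grade.str[1])
print(numeric_grade.tolist())
```

So split_grade.str[1] takes element 1 of every list, which is the captured digits, and pd.to_numeric then turns those strings into numbers.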