Cleaning data using pandas

I am trying to find a fast way to make the same changes to several columns in my dataframe us_census.

This dataframe holds the percentage of the population of different races in each state. However, each value has a percent symbol at the end and is not an int type.

I am trying to write a loop that goes through each of these columns, first deleting the percent sign and then converting the column to an int data type.
The code below is what I have now, and it works.

us_census.Hispanic = us_census.Hispanic.replace('%','',regex=True)

us_census.Hispanic = pd.to_numeric(us_census.Hispanic)

This changes one column. Instead of repeating this for each of the other 5 columns, I am wondering if there is a faster way to do it with a loop.

This is part of this exercise

Thanks :grinning:


You’re on the right track. Here’s a slightly different way to begin:

# Item 14
race_list = ['Hispanic', 'White', 'Black', 'Native', 'Asian', 'Pacific']
for race in race_list:
    # Strip percent signs and commas; the values are still strings after this step
    us_census[race] = us_census[race].replace('[%,]', '', regex=True)
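
To round this out, here is a minimal sketch of the whole loop, including the conversion step from the original post. The sample rows are made up for illustration (the real us_census data is not shown in this thread), and the sketch uses .str.replace instead of the Series.replace call above; both should strip the same characters. Note that pd.to_numeric will likely give you float columns rather than int, since the percentages have decimals.

```python
import pandas as pd

# Hypothetical sample rows standing in for us_census; the values are invented.
us_census = pd.DataFrame({
    'State': ['Alabama', 'Alaska'],
    'Hispanic': ['3.75%', '5.91%'],
    'White': ['61.88%', '60.91%'],
    'Black': ['31.25%', '2.85%'],
    'Native': ['0.45%', '16.39%'],
    'Asian': ['1.05%', '5.45%'],
    'Pacific': ['0.03%', '1.06%'],
})

race_list = ['Hispanic', 'White', 'Black', 'Native', 'Asian', 'Pacific']
for race in race_list:
    # Remove percent signs (and any commas) from the string values
    cleaned = us_census[race].str.replace('[%,]', '', regex=True)
    # Convert the cleaned strings to a numeric dtype
    us_census[race] = pd.to_numeric(cleaned)

print(us_census.dtypes)
```

If you really do need ints afterwards, you would have to round first, which loses the decimal part.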

Thanks so much. This makes so much sense.