Cleaning US Census Data: which methods should we use to replace a column in a DataFrame?

My questions are about this project:
https://www.codecademy.com/paths/data-analyst-training-for-your-business-cfb/tracks/dacp-data-wrangling-and-tidying/modules/dscp-data-cleaning-with-pandas/projects/data-cleaning-us-census

When we want to replace a column in a DataFrame, we can do it in two ways. For instance, to remove the “$” from the Income column, we can use either of these two options:

1. us_census.Income = us_census.Income.str.replace("[$]", '', regex=True)

2. Income = []
for i in range(0, len(us_census.Income)):
  string = str(us_census.Income.iat[i])
  replace = string.replace('$', "")
  Income.append(replace)

us_census['new_column'] = Income

What is the difference between these two approaches, and why are there some cases where we can’t use the first approach?
For instance, if we want to split the GenderPop column and create two new columns for men and women, we can’t use the first approach.

I think you mean that you’re not replacing a column; you’re cleaning unwanted characters out of a column.
You’d use the first approach and not the second. Why create more work for yourself? The loop visits every row by hand, while the .str methods apply the same operation to the whole column in one call, so regex is easier in this case, right?
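If you want to convince yourself that the two versions do the same job, here is a small sketch on made-up values (the two-row frame is an assumption, not the project data):

```python
import pandas as pd

# Hypothetical sample standing in for the real us_census frame.
us_census = pd.DataFrame({"Income": ["$43296.36", "$70354.74"]})

# Approach 1: one vectorized call over the whole column.
vectorized = us_census["Income"].str.replace("$", "", regex=False)

# Approach 2: visit each value and build a plain Python list.
looped = [str(value).replace("$", "") for value in us_census["Income"]]

# Both hold the same cleaned strings.
print(vectorized.tolist() == looped)
```

One difference to keep in mind: str() in the loop turns a missing value into the text 'nan', whereas the .str methods leave missing values as missing.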

See:

us_census.Income = us_census['Income'].replace(r'[\$,]', '', regex=True)

us_census.Income = pd.to_numeric(us_census.Income)
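Here is a rough, self-contained version of those two steps you can run on its own. The three sample values are made up for illustration, and I'm assuming the column holds strings like "$43,296.36":

```python
import pandas as pd

# Hypothetical sample; in the project, us_census is built from the CSV files.
us_census = pd.DataFrame({"Income": ["$43,296.36", "$70354.74", "$54207.82"]})

# Strip dollar signs and thousands separators from every row at once.
us_census["Income"] = us_census["Income"].str.replace(r"[$,]", "", regex=True)

# Convert the cleaned strings to numbers so you can do arithmetic on them.
us_census["Income"] = pd.to_numeric(us_census["Income"])

print(us_census["Income"].dtype)
```

After the conversion the column is numeric, so things like us_census["Income"].mean() work as expected.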

You wouldn’t use str.replace() to split a column. You’d use .str.split() and something like:

us_census['Men'] = us_census['GenderPop'].str.split('_').str[0]
us_census['Women'] = us_census['GenderPop'].str.split('_').str[1]
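The split alone leaves the trailing letter on each piece, so you'd probably want to strip it and convert to numbers as well. A minimal sketch, assuming GenderPop values look like "2341093M_2489527F" (the sample rows, including the one with no women's count, are made up):

```python
import pandas as pd

# Hypothetical rows, assuming the format "<men>M_<women>F".
us_census = pd.DataFrame(
    {"GenderPop": ["2341093M_2489527F", "384160M_349215F", "3299088M_F"]}
)

# Split once and spread the two pieces into separate columns.
parts = us_census["GenderPop"].str.split("_", expand=True)

# Drop the trailing letter, then convert; an empty string becomes NaN.
us_census["Men"] = pd.to_numeric(parts[0].str.rstrip("M"), errors="coerce")
us_census["Women"] = pd.to_numeric(parts[1].str.rstrip("F"), errors="coerce")

print(us_census[["Men", "Women"]])
```

Using expand=True means you only split once, and errors="coerce" keeps a row with a missing count from raising an error.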