Project: US Census Data. How do I use a loop to convert columns to numeric?

https://www.codecademy.com/courses/practical-data-cleaning/projects/data-cleaning-us-census?action=resume_content_item

On task #14, I was making a histogram for each race, and I wondered whether I could use a loop for these steps. I tried it for removing the “%” sign and it worked. However, I cannot get it to work for either converting the data to a numeric type or creating the histograms. Is there any way to do so? Here is my code:

for race in us_census:
  race = ['Hispanic', 'White', 'Black', 'Native', 'Asian', 'Pacific']
  us_census[race] = us_census[race].replace('%', '', regex = True)

I tried adding each of these lines to the loop, and both gave an error.

  us_census.race = pd.to_numeric(us_census.race)
  us_census[race] = pd.to_numeric(us_census[race])

I also tried using a loop to create the histograms, and that is not working either:

for race in us_census:
  race = ['Hispanic', 'White', 'Black', 'Native', 'Asian', 'Pacific']
  plt.scatter(us_census[race], us_census['Income'])
  plt.show()

Quite rightly so, for one very simple reason: how would you change any of those text values (e.g. hispanic, or white) into a meaningful numerical value? You can’t, because such an operation is nonsense.

If you read the documentation for the pd.to_numeric() function, you’ll see that there is an optional parameter errors which defaults to 'raise'. This means that when you provide it with data it cannot parse to a number, it raises an exception - this is the behaviour you’re seeing. Changing your code to us_census.race = pd.to_numeric(us_census.race, errors='coerce') will make pandas return NaN for any data which it cannot parse to a number.
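To make the difference between the two errors settings concrete, here is a minimal sketch. The sample values are made up for illustration and are not taken from the census file.

```python
import pandas as pd

# Hypothetical sample values: two parseable strings and one that is not.
raw = pd.Series(["6.34", "12.5", "not a number"])

# errors="coerce" turns entries that cannot be parsed into NaN instead of raising.
coerced = pd.to_numeric(raw, errors="coerce")

# With the default errors="raise", the same input raises a ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

print(coerced)
print("default setting raised:", raised)
```

Note that coercing hides bad values rather than fixing them, so it is worth checking how many NaN entries come back.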

What the value of this would be is dubious, though - because the entire list you have in race would evaluate to NaN, so not sure how that’ll improve what you’re trying to do.

I am not hugely familiar with pandas or matplotlib, but I don’t need to be to see that your code is wrong. You’ve got:

for race in us_census:
  race = ['Hispanic', 'White', 'Black', 'Native', 'Asian', 'Pacific']
  plt.scatter(us_census[race], us_census['Income'])
  plt.show()

You’re declaring race as the placeholder variable for your iterator, then immediately assigning it the full list once you’re in the loop… which, again quite rightly, is throwing an error. As to whether you’re doing the plotting correctly? I don’t know, as I don’t know that much about how to use matplotlib.
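As a rough illustration of the loop-variable point, the sketch below uses a tiny made-up frame (only the column names come from the post). Iterating over a DataFrame yields its column labels, so the usual pattern is to put the list of wanted columns in the loop header and leave the loop variable alone inside the body.

```python
import pandas as pd

# Hypothetical miniature of the census frame; the values are invented.
us_census = pd.DataFrame({
    "State": ["A", "B"],
    "Hispanic": ["3.75%", "5.91%"],
    "White": ["61.88%", "60.91%"],
})

# Iterating over a DataFrame gives one column label per pass.
labels_seen = [col for col in us_census]

# Loop over the list itself so that `race` is a single column name each time.
races = ["Hispanic", "White"]
visited = []
for race in races:
    # us_census[race] is now one Series, not a multi-column DataFrame.
    visited.append((race, type(us_census[race]).__name__))

print(labels_seen)
print(visited)
```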

Lastly, whilst perhaps not so urgent in this case as the issues are reasonably obvious, it generally helps to provide the complete text of any error messages your code is throwing rather than simply stating that you got an error message. The Traceback in Python is pretty helpful at tracking down what’s causing your code to fail.

I think what I am doing is converting numbers stored as strings into a numeric type. The values in these columns (Hispanic, White, Black…) are percentages stored as strings, for example 6.344322%. That’s why I removed the ‘%’ sign first. The next step of the project is to convert the remaining digits to a numeric type so that we can do calculations with them. us_census['Hispanic'] = pd.to_numeric(us_census['Hispanic']) actually works well when it’s placed outside the loop, but then I have to repeat it for each race, which is 7 times in total. That’s why I tried putting it into the loop, to see whether I could write it only once, as I did for removing the ‘%’ sign. However, I got an error from it.
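One way the conversion could be written once is sketched below, on a small made-up frame; the column names follow the post, but the values and the shortened races list are assumptions. pd.to_numeric accepts a single Series (or list-like), not a multi-column DataFrame, so each column is converted on its own pass through the loop.

```python
import pandas as pd

# Hypothetical stand-in for the census data; the real frame has more columns and rows.
us_census = pd.DataFrame({
    "Hispanic": ["3.75%", "5.91%"],
    "White": ["61.88%", "60.91%"],
    "Income": [43296.36, 70354.74],
})

races = ["Hispanic", "White"]  # extend with the remaining race columns

for race in races:
    # Strip the literal percent sign, then convert that one column to floats.
    stripped = us_census[race].str.replace("%", "", regex=False)
    us_census[race] = pd.to_numeric(stripped)

print(us_census.dtypes)
```

If the "%" signs have already been removed, a loop-free alternative would be us_census[races] = us_census[races].apply(pd.to_numeric), which applies the function column by column.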

The matplotlib part is based on the same idea. The original code is below. It creates a scatter plot with the Hispanic data as the x-values and Income as the y-values. Again, I have to repeat it for each race, which is 7 times in total.

plt.scatter(us_census['Hispanic'], us_census['Income'])
plt.show()
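The same plot can be produced for each race with one loop over the column names. This is only a sketch on made-up numeric data (it assumes the race columns have already been converted to numbers); the axis labels and titles are additions for readability, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, already-numeric stand-in for the census data.
us_census = pd.DataFrame({
    "Hispanic": [3.75, 5.91, 29.57],
    "White": [61.88, 60.91, 57.12],
    "Income": [43296.36, 70354.74, 54207.82],
})

races = ["Hispanic", "White"]  # extend with the remaining race columns

figures = []
for race in races:
    # One figure per race: race percentage on x, income on y.
    fig, ax = plt.subplots()
    ax.scatter(us_census[race], us_census["Income"])
    ax.set_xlabel(f"{race} (%)")
    ax.set_ylabel("Income")
    ax.set_title(f"Income vs. {race}")
    figures.append(fig)

# In an interactive session, plt.show() here would display all the figures.
```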