Cleaning US Census Data - Potential Bug? Cannot convert float? (SOLVED)

Hey all,

Not sure what I am doing wrong. In step 8 of this project for Cleaning Data with Python:

I keep getting an error when making the scatter plot. When I try to view the plot, the traceback ends with:

ValueError: could not convert string to float: 'Female'

Here is my code thus far, more or less straightforward:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import codecademylib3_seaborn
    import glob

    #step 1 --> setup of one df
    all_csv_files = glob.glob('states*.csv')
    us_census_list = []
    for filename in all_csv_files:
        data = pd.read_csv(filename)
        us_census_list.append(data)
    us_census = pd.concat(us_census_list)

    #step 2 --> viewing/inspecting for later
    #print(us_census.columns)
    print(us_census.dtypes)

    #step 3 --> more viewing...etcetc
    #print(us_census.head())

    #step 4 --> convert Income column to float
    us_census.Income = us_census['Income'].replace('[\$]','',regex=True)
    us_census.Income = pd.to_numeric(us_census.Income)
    #print(us_census.head())

    #step 5 --> split Genderpop column
    us_census_GP_split = us_census.GenderPop.str.split('_')
    us_census['Male'] = us_census_GP_split.str.get(0)
    us_census['Female'] = us_census_GP_split.str.get(1)
    #print(us_census.head())

    #step 6 --> convert into numerical, remove M/F letters
    us_census.Male = us_census['Male'].replace('[M]','', regex=True)
    us_census.Female = us_census['Female'].replace('[F]','', regex=True)
    us_census['Male'] = pd.to_numeric(us_census.Male)
    us_census['Female'] = pd.to_numeric(us_census.Female)
    #print(us_census.head())

    #Step 6.5 --> why tf the column still tghere lol
    us_census = us_census.drop('GenderPop', 1)
    print(us_census.head())

    #step 7 --> make scatterplot
    plt.scatter('Female','Income')

As far as I can tell, the data type for the “Female” column is correct. Is this a bug? Maybe someone else has run into this. Making a scatterplot is not difficult…

Thank you :slight_smile:

ETA: Not sure why the codebyte is throwing a concatenation error. I don’t see that when running the code in the assignment itself.

SECOND ETA: So it was obvious! I forgot to put us_census in front of the column names (us_census.Female, us_census.Income), so plt.scatter was handed the bare strings 'Female' and 'Income' instead of the column data. If practicing Python has taught me anything about myself, it's that I have a horrific habit of answering my own questions the second I ask them. I will, however, be leaving this up as a warning for anyone else doing the same thing. Haha
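
For anyone who lands here with the same error, here is a rough sketch of the fix. The small DataFrame is a made-up stand-in for the real us_census data (the numbers are placeholders, not from the project files), and the Agg backend line is only there so it runs without a display. Either pass the columns themselves, or keep the string names and tell matplotlib where to look them up with the data keyword:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed inside the Codecademy editor
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the cleaned us_census DataFrame (placeholder values).
us_census = pd.DataFrame({
    "Female": [2489527.0, 349215.0, 3342840.0],
    "Income": [43296.36, 70354.74, 54207.82],
})

# Option 1: pass the column data directly.
points = plt.scatter(us_census.Female, us_census.Income)

# Option 2: keep the column names as strings, but say which DataFrame they belong to.
points_by_name = plt.scatter("Female", "Income", data=us_census)

plt.xlabel("Female population")
plt.ylabel("Income")
```

Both calls should plot the same points; which one you use is mostly a matter of taste.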