Pandas: How To Clean Data With Python / Stuck!

Hi, how are you guys doing? This is my first time in the community forum.

I got stuck on Step 12 of the Cleaning US Census Data project in ‘How to Clean Data with Python’.

I get the error: “ValueError: could not convert string to float: ‘Women’”.

I have already converted the values in the ‘Women’ column into floats but I still get this error.

This is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import codecademylib3_seaborn
import glob

# Read every states*.csv file and stack them into one DataFrame
state_files = glob.glob('states*.csv')
df_list = []
for i in state_files:
    data = pd.read_csv(i)
    df_list.append(data)
us_states = pd.concat(df_list)

# Strip "$" and "," from Income, then convert it to a number
us_states['Income'] = us_states['Income'].replace('[$,]', '', regex=True)
us_states.Income = pd.to_numeric(us_states.Income)

# Split GenderPop (men_women) into separate Men and Women columns
us_states['Men'] = us_states.GenderPop.str.split('_', expand=True)[0]
us_states['Women'] = us_states.GenderPop.str.split('_', expand=True)[1]
us_states['Men'] = us_states.Men.replace('M', '', regex=True)
us_states['Women'] = us_states.Women.replace('F', '', regex=True)
us_states.Men = pd.to_numeric(us_states.Men)
us_states.Women = pd.to_numeric(us_states.Women)
us_states = us_states.drop(['GenderPop'], axis=1)

# Fill missing Women values with TotalPop - Men
us_states = us_states.fillna(value={'Women': us_states.TotalPop - us_states.Men})

#duplicates = us_states.duplicated()
#print(duplicates.value_counts())
#print(us_states.head())
#print(us_states.dtypes)
#print(us_states.columns)

plt.scatter('Women', 'Income')
plt.show()
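A note on where the error most likely comes from: the plt.scatter call receives the strings 'Women' and 'Income' themselves, not the columns, so the plotting code ends up trying to turn the text 'Women' into a number. The sketch below reproduces the message with a plain float() call and shows two ways to pass the column data instead. The demo DataFrame and its numbers are made up for illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for us_states; the values are invented for illustration.
demo = pd.DataFrame({
    "Women": [2489527.0, 349215.0, 3342840.0],
    "Income": [43296.36, 70354.74, 54207.82],
})

# Converting the column *name* to a float reproduces the message from the traceback.
try:
    float("Women")
except ValueError as err:
    message = str(err)
    print(message)

fig, ax = plt.subplots()

# Option 1: pass the columns themselves.
points = ax.scatter(demo["Women"], demo["Income"])

# Option 2: keep the string names, but tell matplotlib which DataFrame to look them up in.
ax.scatter("Women", "Income", data=demo)

ax.set_xlabel("Women")
ax.set_ylabel("Income")
```

In the project code this would correspond to plt.scatter(us_states['Women'], us_states['Income']), assuming both columns are numeric by that point.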

Another question: I did not get any duplicates when I used .duplicated(). The step suggests I should have some…
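One possible explanation, offered only as a guess since the contents of the CSV files are not shown here: .duplicated() compares every column by default, so if the files carry a leftover index column (pandas names such a column "Unnamed: 0" when it has no header), rows that are otherwise identical will not be flagged. The sketch below uses a made-up three-row frame to show the difference between checking all columns and checking a subset.

```python
import pandas as pd

# Hypothetical data: the same state appears twice, but the leftover
# index column differs, as it might when two files overlap.
demo = pd.DataFrame({
    "Unnamed: 0": [0, 1, 0],
    "State": ["Alabama", "Alaska", "Alaska"],
    "TotalPop": [4830620, 733375, 733375],
})

# Default: all columns are compared, so no row counts as a duplicate.
all_cols = demo.duplicated()

# Restricting the comparison to the content columns flags the repeated state.
by_content = demo.duplicated(subset=["State", "TotalPop"])

print(all_cols.sum(), by_content.sum())

# drop_duplicates accepts the same subset argument.
deduped = demo.drop_duplicates(subset=["State", "TotalPop"])
```

Printing us_states.columns and us_states.head() would show whether such an extra column is actually present.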

Thank you very much in advance!

Hi,

Yeah, it is in the DS Path. Here is the link: https://www.codecademy.com/paths/data-science/tracks/dscp-data-wrangling-and-tidying/modules/dscp-data-cleaning-with-pandas/projects/data-cleaning-us-census

I would be very grateful if you could help me!

Martin