Finalised program - Chapter 17 (Data Cleaning with Pandas) - US Census Data

Would someone have the final code for the ‘US Census Data’ project in Module 17 of Data Science? I have pasted what I could do so far. It goes wrong on the lines with ‘duplicated()’…

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import codecademylib3_seaborn
import glob

files = glob.glob("states*.csv")
df_list = []

# Read every matching CSV and collect the DataFrames in a list
for filename in files:
    data = pd.read_csv(filename)
    df_list.append(data)

df = pd.concat(df_list)

# Strip "$" and "," so the column can be converted to numbers
df['Income'] = df['Income'].replace('[$,]', '', regex=True)
df['Income'] = pd.to_numeric(df['Income'])

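# Assumption: the raw column is 'GenderPop' with values shaped like '<men>M_<women>F'
df['str_split'] = df['GenderPop'].str.split('_')
df['women'] = df['str_split'].str.get(1)
df['women'] = df['women'].replace('[MF]', '', regex=True)
df['women'] = pd.to_numeric(df['women'])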



duplicates = df.duplicated()  # boolean Series; kept separate so df is not overwritten
df = df.drop_duplicates()
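
A likely reason the duplicate lines go wrong: `duplicated()` returns a boolean Series rather than a DataFrame, so assigning its result back to `df` would replace the whole table with a column of True/False flags. Here is a minimal sketch on made-up data (the state names and income figures are placeholders, not the project's CSV contents, and `dupes` is just an illustrative name):

```python
import pandas as pd

# Toy stand-in for the concatenated census data (values are made up)
df = pd.DataFrame({
    "State": ["Ohio", "Utah", "Ohio"],
    "Income": ["$49,655.25", "$63,488.92", "$49,655.25"],
})

# duplicated() returns a boolean Series, one flag per row;
# store it under its own name so df stays a DataFrame
dupes = df.duplicated()
print(dupes.value_counts())

# drop_duplicates() returns a new DataFrame without the repeated rows
df = df.drop_duplicates().reset_index(drop=True)

# Strip "$" and "," before converting the column to numbers
df["Income"] = pd.to_numeric(df["Income"].replace("[$,]", "", regex=True))
print(df)
```

Keeping the flags in a separate variable means you can still inspect how many rows were repeated before dropping them.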


hey! got the code here


many thanks…sergialess… : )

What editor did you use to write the program?

Hi, thanks Sergialess. The solution code you found is good, but it doesn’t work in Codecademy…
Codecademy hasn’t even provided the appropriate download files… this just keeps happening.
Amazing. You’d think we are actually trying to learn something, and they just throw crap at us without any explanation or video tutorials. It seems the deeper you go into Data Science, the lazier this site gets. It’s quite shocking that no one is even replying to your posts, or to others’ as well. Think I’ll give up after 6 hours, lol.
Thanks again