Finalised program - Chapter 17 (Data Cleaning with Pandas) - US Census Data

Would someone have the final code for the ‘US Census Data’ project in Module 17 in Data Science? I have pasted what I could do so far… it goes wrong at the lines with ‘duplicate()’…

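import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import codecademylib3_seaborn
import glob

# Collect every states*.csv file and load each one into a DataFrame
files = glob.glob("states*.csv")
df_list = []

for filename in files:
    data = pd.read_csv(filename)
    df_list.append(data)

# Stack the per-file DataFrames into a single table
df = pd.concat(df_list)

# Strip dollar signs and commas, then convert Income to a number
df['Income'] = df['Income'].replace('[$,]', '', regex=True)
df.Income = pd.to_numeric(df.Income)

# Assumes df.str_split already holds the split gender field (that step is not shown here)
df['women'] = df.str_split.str.get(1).replace('[M,]', '', regex=True)
df.women = df.women.replace('[F,]', '', regex=True)
df.women = pd.to_numeric(df.women)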



# duplicated() only flags repeated rows; keep the flags separate so df stays a DataFrame
duplicates = df.duplicated()
df = df.drop_duplicates()
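
A note on the duplicate lines, as a minimal sketch with made-up data (the State and Income values below are hypothetical, not from the project files): duplicated() returns a True/False Series marking repeated rows, while drop_duplicates() returns the DataFrame with those rows removed. Assigning the result of duplicated() back to df would therefore replace the whole table with a column of booleans.

```python
import pandas as pd

# Hypothetical sample data; not taken from the project's states*.csv files
df = pd.DataFrame({
    "State": ["Ohio", "Ohio", "Utah"],
    "Income": [50000, 50000, 60000],
})

# duplicated() returns a boolean Series: True for each row that repeats an earlier one
mask = df.duplicated()

# drop_duplicates() returns a new DataFrame with the repeated rows removed
deduped = df.drop_duplicates()

print(mask.tolist())   # [False, True, False]
print(len(deduped))    # 2
```

So the flags are useful for checking how many repeats there are (for example with mask.sum()), and only drop_duplicates() should be assigned back to df.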


hey! got the code here


many thanks…sergialess… : )

What editor did you use to write the program?

Hi, thanks Sergialess, the solution code you found is good but doesn’t work in Codecademy…
Codecademy hasn’t even given appropriate download files… this just keeps happening.
Amazing, you’d think we are actually trying to learn something and they just throw crap at us without any explanation or video tutorials. Seems the deeper you go into Data Science the lazier this site is getting. It’s quite shocking that no one is even replying to your posts or others’ as well, think I’ll give up after 6 hours lol.
Thanks again

Jupyter Notebook was used. You can use the terminal or command line to install the basic version, which lets you work inside your browser… the other solution is to install Anaconda, and you can load various tools.

hope this helps

Many thanks for this complete version. This exercise missed the valuable tutorial that comes with… and as an analyst, data cleaning is one of the major functions we need to master <3

Thank god Sergialess wrote this code. I had no clue even how to load the proper file.
And the code does mostly work in Codecademy. I don’t know why they did not do a tutorial, but I get only a few errors. My final graphs are histograms and not bar graphs. I wonder why.