Cleaning US Census Data - Data visualisation question

Hi,

I am working on the “Cleaning US Census Data” project on the DataScience career path (17 - Practical Data Cleaning).

Can someone tell me why no histogram is displayed in the browser and I get scatter plots instead?
I don’t get an error message or anything, and there’s no walkthrough video, so I don’t know what to do…

Here is the code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as pyplot
import codecademylib3_seaborn
from matplotlib import pyplot as plt


df1 = pd.read_csv("states0.csv")
df2 = pd.read_csv("states1.csv")
df3 = pd.read_csv("states2.csv")

import glob

files = glob.glob("states*.csv")
df_list = []
for filename in files:
  data = pd.read_csv(filename)
  df_list.append(data)
us_census = pd.concat(df_list)

name_split = us_census.GenderPop.str.split('_')
us_census['Men'] = name_split.str.get(0)
us_census['Women'] = name_split.str.get(1)

us_census.Income = us_census['Income'].replace('[\$]', '', regex=True)
us_census.Men = us_census['Men'].replace('[M]', '', regex=True)
us_census.Women = us_census['Women'].replace('[F]', '', regex=True)

us_census.Men = pd.to_numeric(us_census.Men)
us_census.Women = pd.to_numeric(us_census.Women)

plt.scatter(us_census.Men, us_census.Women)
plt.show()

us_census['Women'] = us_census['Women'].fillna(us_census['TotalPop']-us_census['Men'])

us_census = us_census.drop_duplicates()

plt.scatter(us_census.Men, us_census.Women)

us_census.TotalPop = pd.to_numeric(us_census.TotalPop)

totalpop = us_census['TotalPop']

plt.hist(totalpop, range=(0,1000), bins=40)

plt.show()

Thanks !

I’m afraid I don’t have access to this data, and I’ve not tried matplotlib on Codecademy, so I can’t test it, but I have a couple of questions. I assume there’s no error because the code ran; it just didn’t do what you expected.

Is this intended to create multiple figures, or are you trying to add plots to the original axes each time? Many pyplot plotting functions are designed to work with the current figure/axes and create a new one only if there isn’t one already.
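To illustrate the "current figure" behaviour, here is a minimal sketch. The numbers are made up and only stand in for the Men, Women and TotalPop columns, and the Agg backend is selected only so the snippet runs without a display; neither comes from the original project.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt

# Made-up values standing in for the Men / Women / TotalPop columns
men = [1.0, 2.0, 3.0, 4.0]
women = [1.1, 2.1, 2.9, 4.2]
total = [2.1, 4.1, 5.9, 8.2]

# Two plotting calls in a row, no new figure in between:
# both are drawn on the same (current) axes.
plt.scatter(men, women)
plt.hist(total, bins=4)
shared_ax = plt.gca()
n_scatter_shared = len(shared_ax.collections)  # scatter adds one collection
n_bars_shared = len(shared_ax.patches)         # hist adds one bar per bin
plt.close("all")

# Creating a figure explicitly before each plot keeps them apart.
fig_scatter = plt.figure()
plt.scatter(men, women)
fig_hist = plt.figure()
plt.hist(total, bins=4)
separate_axes = fig_scatter.axes[0] is not fig_hist.axes[0]
n_figures = len(plt.get_fignums())
plt.close("all")
```

In the first half the scatter points and the histogram bars end up on one set of axes; in the second half each plot gets its own figure.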

If it’s supposed to be a single figure, do your x/y axis values actually match up between the histogram and the scatter plot? Combining different plot styles on the same axes often ends with oversized or undersized plots. Double-check that their ranges actually overlap, or you might plot things but never find them.
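One thing that might be worth checking along these lines is the range=(0,1000) argument passed to plt.hist. The sketch below uses made-up population figures (an assumption, since the real CSV files aren't available here) to show that values outside the requested range are simply not counted, so every bar can end up with zero height.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt

# Made-up populations; the real TotalPop column may look different
total_pop = [600_000, 1_300_000, 4_800_000, 6_700_000, 38_000_000]

# Same call shape as in the question: only values in [0, 1000] are binned
counts_narrow, _, _ = plt.hist(total_pop, range=(0, 1000), bins=40)
plt.close()

# Without an explicit range, the bins span the data's own min and max
counts_auto, _, _ = plt.hist(total_pop, bins=40)
plt.close()

n_counted_narrow = int(counts_narrow.sum())
n_counted_auto = int(counts_auto.sum())
```

If TotalPop holds values far above 1000, the first call counts nothing, while dropping the range (or choosing one that covers the data) counts every row.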


@tgrtim is absolutely right, and this can be a little confusing when you’re learning Matplotlib and Seaborn.

The quickest workaround if you don’t want to create a separate figure for each is to just throw a plt.close() in there after each plot.
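A rough sketch of that workaround, again with made-up numbers in place of the census columns and with the Agg backend chosen only so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt

# Made-up values standing in for the Men / Women / TotalPop columns
men = [1.0, 2.0, 3.0, 4.0]
women = [1.1, 2.1, 2.9, 4.2]
total = [2.1, 4.1, 5.9, 8.2]

plt.scatter(men, women)
plt.show()
plt.close()  # discard the scatter figure before drawing anything else
open_after_scatter = len(plt.get_fignums())

plt.hist(total, bins=4)
hist_ax = plt.gca()  # a fresh axes, created because none was open
hist_has_scatter = len(hist_ax.collections) > 0
plt.show()
plt.close()
open_after_hist = len(plt.get_fignums())
```

Because the scatter figure is closed first, plt.hist has no current figure to reuse and starts a new one, so the histogram is drawn on its own.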