Data Cleaning US Census with Pandas


For task 9, I’m having trouble using the .fillna() function.
The instructions say:

We can fill in those nans by using pandas’ .fillna() function. You have the TotalPop per state, and you have the Men per state. As an estimate for the nan values in the Women column, you could use the TotalPop of that state minus the Men for that state.

But how exactly do we do that? How do you put the subtraction into .fillna()?

Here’s what I have so far:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import codecademylib3_seaborn
import glob

us_census = glob.glob("states*.csv")

df_list = []
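for state in us_census:
    # read each matching CSV file and collect the resulting DataFrame
    df_list.append(pd.read_csv(state))
us_census = pd.concat(df_list)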

#separate genpop into females and males
# put all ethnic groups into one column "Ethnicity"

us_census['Income'] = us_census['Income'].replace(r'[\$,]', '', regex=True)
us_census['Income'] = pd.to_numeric(us_census['Income'])

split_gender = us_census['GenderPop'].str.split('_', expand=True)
us_census['Female'] = split_gender[1].str.split(r'(\d+)', expand=True)[1]
us_census['Male'] = split_gender[0].str.split(r'(\d+)', expand=True)[1]

#trying to understand how to use the .fillna but this doesn't work. 
values = us_census['TotalPop'] - us_census['Male']
us_census['Female'] = us_census['Female'].fillna(value=values)

us_census['Female'] = pd.to_numeric(us_census['Female'])

us_census['Male'] = pd.to_numeric(us_census['Male'])

# plt.scatter('Female', 'Income')


I wish there were a walkthrough video, since I’m not sure if anything here is correct.

Thank you in advance!

Project link: US Census Project


Figured it out (or at least I hope so). If anyone is interested, here’s what I did:

difference = us_census['TotalPop'] - us_census['Male']
us_census['Female'] = us_census['Female'].fillna(value=difference)

Actually, now that I’m looking at what I did before, it’s the same, so I don’t know. This is confusing.

That suggests this isn’t what you changed, but something else.
Running your code results in an error message complaining about types. Were those columns of types that support subtraction? One of the steps was to convert to numerical values.
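One way to check this yourself is to look at the column dtypes before subtracting. The sketch below uses a small made-up frame (the values are hypothetical, not the project data) where Male holds strings, as it would right after the .str.split() step:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical two-row frame standing in for the census data.
df = pd.DataFrame({
    "TotalPop": [1000, 2000],
    "Male": ["480", "990"],  # strings, as left behind by .str.split()
})

# A column of strings is not numeric, so TotalPop - Male cannot work yet.
total_is_numeric = is_numeric_dtype(df["TotalPop"])
male_is_numeric = is_numeric_dtype(df["Male"])
print(df.dtypes)
print(total_is_numeric, male_is_numeric)
```

If the second check is False, the column still needs pd.to_numeric() before it can be used in arithmetic.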

The problem was running the .fillna() before converting to numerical values.
If I convert the columns first and then call .fillna(), the code runs.
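For anyone landing here later, this is roughly the order that works, as a minimal sketch. The three-row frame and its values are made up for illustration; the real project reads the states*.csv files instead.

```python
import pandas as pd

# Hypothetical sample shaped like the exercise data; the second row
# has no female count, so it becomes NaN after splitting.
us_census = pd.DataFrame({
    "TotalPop": [1000, 2000, 3000],
    "GenderPop": ["480M_520F", "990M_F", "1400M_1600F"],
})

split_gender = us_census["GenderPop"].str.split("_", expand=True)
us_census["Male"] = split_gender[0].str.split(r"(\d+)", expand=True)[1]
us_census["Female"] = split_gender[1].str.split(r"(\d+)", expand=True)[1]

# Convert to numbers first, so the subtraction below is numeric.
us_census["Male"] = pd.to_numeric(us_census["Male"])
us_census["Female"] = pd.to_numeric(us_census["Female"])

# Only now estimate the missing values and fill them in.
estimate = us_census["TotalPop"] - us_census["Male"]
us_census["Female"] = us_census["Female"].fillna(estimate)

print(us_census[["TotalPop", "Male", "Female"]])
```

Passing a Series to .fillna() fills each missing row with the value at the same index, so only the row that was NaN gets the TotalPop minus Male estimate.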

I was having the same problem … thanks for posting it in the forum