I am getting an error message on Task 10 of this project when I try to check for duplicates. Really not sure what is going on because everything has worked up until this point and this seems like a pretty straightforward task. Code is below. What am I missing here?
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as pyplot
import codecademylib3_seaborn
import glob

files = glob.glob('states*.csv')
df_list = []
for filename in files:
    data = pd.read_csv(filename)
    df_list.append(data)
us_census = pd.concat(df_list)

print(us_census.columns)
print(us_census.dtypes)
print(us_census.head())

us_census.Income = us_census['Income'].replace(r'[\$,]', '', regex=True)
us_census['str_split'] = us_census.GenderPop.str.split('_')
us_census['men'] = us_census.str_split.str.get(0)
us_census['women'] = us_census.str_split.str.get(1)
us_census.men = us_census['men'].replace(r'[M,]', '', regex=True)
us_census.women = us_census['women'].replace(r'[F,]', '', regex=True)
us_census.Income = pd.to_numeric(us_census.Income)
us_census.men = pd.to_numeric(us_census.men)
us_census.women = pd.to_numeric(us_census.women)

pyplot.scatter(us_census.women, us_census.Income)
pyplot.show()

print(us_census.women)
us_census = us_census.fillna(value={'women': (us_census.TotalPop - us_census.men)})
print(us_census.women)

duplicates = us_census.duplicated()
```
This is the error I’m getting:
```
Traceback (most recent call last):
  File "script.py", line 49, in <module>
    duplicates = us_census.duplicated()
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 4954, in duplicated
    labels, shape = map(list, zip(*map(f, vals)))
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 4932, in f
    vals, size_hint=min(len(self), _SIZE_HINT_LIMIT)
  File "/usr/local/lib/python3.6/dist-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py", line 672, in factorize
    values, na_sentinel=na_sentinel, size_hint=size_hint, na_value=na_value
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py", line 508, in _factorize_array
    values, na_sentinel=na_sentinel, na_value=na_value
  File "pandas/_libs/hashtable_class_helper.pxi", line 1798, in pandas._libs.hashtable.PyObjectHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 1718, in pandas._libs.hashtable.PyObjectHashTable._unique
TypeError: unhashable type: 'list'
```
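This is just a guess: somewhere along the line, you are trying to identify duplicate lists. Lists are mutable and therefore not hashable (they can't be used as dictionary keys or set members). duplicated() detects repeated rows by hashing each row's values (that's the PyObjectHashTable in your traceback), so any column that holds lists makes it fail with TypeError: unhashable type: 'list'.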
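To make the guess above concrete, here is a minimal sketch with a small, made-up DataFrame (the column names and values are hypothetical, not the project's data). The "parts" column mimics a str_split-style column holding one Python list per row, which is enough to make `duplicated()` raise. The sketch also shows one way to find which columns contain lists, and that converting the lists to tuples (which are hashable) lets `duplicated()` run.

```python
import pandas as pd

# Hypothetical toy data; 'parts' holds a Python list in every row
df = pd.DataFrame({"GenderPop": ["10M_12F", "10M_12F", "5M_6F"]})
df["parts"] = df.GenderPop.str.split("_")

error_message = None
try:
    df.duplicated()
except TypeError as err:
    error_message = str(err)  # e.g. "unhashable type: 'list'"
print(error_message)

# Find the columns that contain at least one list value
list_columns = [
    col for col in df.columns
    if df[col].map(lambda v: isinstance(v, list)).any()
]
print(list_columns)  # ['parts']

# Tuples are hashable, so converting the lists lets duplicated() work
df["parts"] = df["parts"].map(tuple)
print(df.duplicated().tolist())  # [False, True, False]
```

Converting to tuples is just one workaround; if the helper column isn't needed for anything else, dropping it is simpler.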
Thanks. It looks like the “str_split” column I added when splitting the male and female population values held a Python list in each row. Once I dropped that column from the DataFrame, I stopped getting the error. Really appreciate your help!
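A minimal sketch of the fix described above, using hypothetical toy data in place of the project's CSV files. Option 1 keeps the list-based split but drops the helper column before checking for duplicates. Option 2 is an alternative not mentioned in the thread: `str.split(..., expand=True)` returns separate string columns, so no list column is ever stored. Once no list column remains, `drop_duplicates()` can remove the repeated rows.

```python
import pandas as pd


def make_sample():
    # Hypothetical toy data standing in for the concatenated census CSVs
    return pd.DataFrame({
        "State": ["StateA", "StateA", "StateB"],
        "GenderPop": ["1000M_1200F", "1000M_1200F", "500M_600F"],
    })


# Option 1: split into a temporary list column, then drop it
df1 = make_sample()
df1["str_split"] = df1.GenderPop.str.split("_")
df1["men"] = df1.str_split.str.get(0)
df1["women"] = df1.str_split.str.get(1)
df1 = df1.drop(columns=["str_split"])

# Option 2: expand=True returns a DataFrame of strings (columns 0 and 1)
df2 = make_sample()
split_cols = df2.GenderPop.str.split("_", expand=True)
df2["men"] = split_cols[0]
df2["women"] = split_cols[1]

print(df1.duplicated().tolist())  # [False, True, False]
print(df2.duplicated().tolist())  # [False, True, False]

# With no list columns left, repeated rows can be removed
deduped = df2.drop_duplicates()
print(len(deduped))  # 2
```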