.duplicated() and .drop_duplicates() won't work

Hi all,

I'm working on the “Cleaning US Census Data” assignment and have to find and remove duplicates. However, my code (below) doesn't work and raises a TypeError:

us_census.duplicated()
TypeError: unhashable type: 'list'

What do I have to do to fix this error?

Thanks :slight_smile:

Hi @yinchurijnaard
Which lesson is this? (Please post a link to it.)
Can you post your formatted code so we can check it for errors?

Also, that error sounds similar to this issue here:
https://stackoverflow.com/questions/13675296/python-typeerror-unhashable-type-list

Specifically, this part:
" Hash values are just integers which are used to compare dictionary keys during a dictionary lookup quickly.

Internally, hash() method calls __hash__() method of an object which are set by default for any object."
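To make the quoted point concrete, here is a minimal plain-Python sketch (the variable names are just for illustration). A tuple can be hashed and therefore used as a dictionary key, while a list cannot, because mutable built-ins like `list` set `__hash__` to `None`.

```python
# Tuples are immutable and hashable; lists are mutable and are not.
point_tuple = (1, 2)
point_list = [1, 2]

print(hash(point_tuple) == hash((1, 2)))  # True: equal tuples hash equally
print(list.__hash__ is None)              # True: lists opt out of hashing

error_message = None
try:
    hash(point_list)
except TypeError as err:
    error_message = str(err)
print(error_message)  # unhashable type: 'list'

# Converting the list to a tuple makes it usable as a dictionary key.
lookup = {tuple(point_list): "found"}
print(lookup[(1, 2)])  # found
```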

I’m sorry, forgot about that. The entire code is as follows:

# assignment 10
us_census.duplicated()

# assignment 11
us_census = us_census.drop_duplicates()

Link to the exercise: https://www.codecademy.com/paths/data-science/tracks/practical-data-cleaning/modules/data-cleaning-with-pandas/projects/data-cleaning-us-census

Hi, I meant all of your code.

I'm thinking you might have a list inside a dictionary (?) that you're trying to access. Are you trying to remove duplicates in a particular column only?

I think you’re supposed to put the df name in the parens in your “assignment 11” part.
pd.DataFrame.drop_duplicates(df)

https://datatofish.com/remove-duplicates-pandas-dataframe/

I also think the issue might be that you’re supposed to store the results in a new df like this:
new_df = old_df.drop_duplicates()
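For comparison, both call styles mentioned above do the same thing as the original `us_census = us_census.drop_duplicates()`. Calling the method on the DataFrame and calling it through the class with the DataFrame as its argument run the same method, and neither changes the DataFrame in place unless `inplace=True` is passed. A small sketch with made-up data (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical example data with one fully duplicated row.
df = pd.DataFrame({"state": ["Ohio", "Utah", "Ohio"], "population": [11, 3, 11]})

method_call = df.drop_duplicates()
class_call = pd.DataFrame.drop_duplicates(df)

# Both forms call the same method and return the same new DataFrame.
print(method_call.equals(class_call))  # True
print(len(df), len(method_call))       # 3 2: the original is unchanged

# Reassigning (or storing under a new name) keeps the result.
df = df.drop_duplicates()
```

Because both forms hash the same values, switching between them will not by itself make the TypeError go away.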

Hi there,

I had the same problem. It was solved after I deleted the column I created in exercise 6 (the one where you had to use str.split("_")), because that column contained lists. Perhaps you're facing the same problem.
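A minimal reproduction of what this reply describes, using a small made-up DataFrame. The column names `gender_pop`, `split_pop`, `men` and `women` are hypothetical stand-ins, not necessarily the project's. `str.split` puts a list in every cell, and `duplicated()` then fails because it has to hash every value. Dropping the list column, or asking `str.split` for separate string columns with `expand=True`, avoids the error.

```python
import pandas as pd

# Hypothetical data: the second and third rows are exact duplicates.
df = pd.DataFrame({
    "state": ["Ohio", "Utah", "Utah"],
    "gender_pop": ["10M_11F", "2M_2F", "2M_2F"],
})

# str.split returns a Python list in every cell of the new column.
df["split_pop"] = df["gender_pop"].str.split("_")

try:
    df.duplicated()
    raised = False
except TypeError:
    raised = True  # unhashable type: 'list'
print(raised)  # True

# Fix 1: drop the list-valued column before looking for duplicates.
without_lists = df.drop(columns=["split_pop"])
print(without_lists.duplicated().tolist())  # [False, False, True]

# Fix 2: expand=True puts each piece in its own (hashable) string column.
parts = df["gender_pop"].str.split("_", expand=True)
df["men"], df["women"] = parts[0], parts[1]
print(df.drop(columns=["split_pop"]).drop_duplicates().shape)  # (2, 4)
```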

Good luck :slight_smile:


You are getting the unhashable error because a list is being used where a hashable value is required, for example as a dictionary key, or when converting a nested list into a set. The fix for TypeError: unhashable type: 'list' is to convert (typecast) the list into a tuple.
Just read the tutorial I found for you on `TypeError: unhashable type: 'list'`.
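Applied to a pandas DataFrame, the tuple conversion suggested here might look like the sketch below. The DataFrame and its `split_pop` column are hypothetical. `Series.map(tuple)` turns each list into a tuple, after which `duplicated()` and `drop_duplicates()` can hash every value.

```python
import pandas as pd

# Hypothetical DataFrame whose "split_pop" column holds lists.
df = pd.DataFrame({
    "state": ["Ohio", "Utah", "Utah"],
    "split_pop": [["10M", "11F"], ["2M", "2F"], ["2M", "2F"]],
})

# Convert each list to a tuple, which is hashable.
df["split_pop"] = df["split_pop"].map(tuple)

print(df.duplicated().tolist())  # [False, False, True]
deduped = df.drop_duplicates()
print(len(deduped))  # 2
```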

I couldn't figure out how to make .drop_duplicates() work, but I found that if you add a line to the for loop that builds the concatenated DataFrame, dropping the last row of each CSV (that's where the duplicates come in), it works just fine. It's not a realistic solution for everyday use (especially if only a few files contain duplicates), but it works for this project.

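import glob
import pandas as pd

files = glob.glob("states*.csv")  # assumed file-name pattern for the project's CSVs
df_list = []
for filename in files:
    data = pd.read_csv(filename)
    data = data.drop(data.index[-1])  # drop the last row of each file (the duplicate)
    df_list.append(data)
us_census = pd.concat(df_list)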
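If a list-valued column has to stay in the DataFrame, another option is to have `drop_duplicates()` compare only the hashable columns through its `subset` parameter, so the lists are never hashed. A sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 are duplicates; "split_pop" will hold lists.
df = pd.DataFrame({
    "state": ["Ohio", "Utah", "Utah"],
    "gender_pop": ["10M_11F", "2M_2F", "2M_2F"],
})
df["split_pop"] = df["gender_pop"].str.split("_")

# Only the listed columns are compared, so the list column is never hashed.
deduped = df.drop_duplicates(subset=["state", "gender_pop"])
print(len(deduped))  # 2
```

The trade-off is that rows differing only in a column left out of `subset` are treated as duplicates. Here that is harmless because `split_pop` is derived from `gender_pop`.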