Pandas function on dataframe mysterious error [SOLVED]

I’m working on the ‘This Is Jeopardy’ project in the data science course and have to design a function that converts a column of string dollar amounts into floats. Some of the strings in this column look like ‘$200’ or ‘$2,000’, and some are ‘None’. This is the function I made:

```python
def Convert(data):
    if data != 'None':
        data_new = data.replace('$','')
        data_newnew = data_new.replace(',','')
        return float(data_newnew)
    else:
        return 0

jeopardy['FltValue'] = jeopardy.loc[jeopardy['Value'].apply(Convert)]
print(jeopardy['FltValue'])
```

I get a long, bizarre error that ends with:
ValueError: cannot reindex on an axis with duplicate labels
I’m confused why this function has anything to do with the index at all. I checked the solution code:

```python
jeopardy_data["Float Value"] = jeopardy_data["Value"].apply(lambda x: float(x[1:].replace(',','')) if x != "None" else 0)
print(jeopardy_data["Float Value"])
```

It seems identical to mine, the only difference being that it’s written as a lambda function. I could just change mine to a lambda function, but I’m at a loss as to why mine mysteriously fails while this code works. What is the difference?
Thank you in advance for any answer.
Edit: Minutes after posting I found the error: the function worked perfectly after I removed .loc. However, I’d still appreciate any insight into what exactly caused that error, so any response is welcome!
Edit 2: In case anyone is still curious, I think I now understand what happened after reading up more on the methods I used.
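For anyone landing here with the same problem, here is a minimal sketch of the working version: the same Convert function applied straight to the column with no .loc, then checked against the course’s lambda. The four-row `jeopardy` DataFrame is hypothetical toy data standing in for the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real Jeopardy dataset.
jeopardy = pd.DataFrame({"Value": ["$200", "$2,000", "None", "$200"]})


def Convert(data):
    """Turn strings like '$2,000' into 2000.0; the string 'None' becomes 0."""
    if data != 'None':
        data_new = data.replace('$', '')         # drop the dollar sign
        data_newnew = data_new.replace(',', '')  # drop thousands separators
        return float(data_newnew)
    else:
        return 0


# Fixed line: apply the function to the column directly, with no .loc.
jeopardy['FltValue'] = jeopardy['Value'].apply(Convert)
print(jeopardy['FltValue'])

# The course's lambda yields the same values.
lambda_version = jeopardy['Value'].apply(
    lambda x: float(x[1:].replace(',', '')) if x != "None" else 0
)
print((jeopardy['FltValue'] == lambda_version).all())
```

Since both versions produce the same values, writing it as a lambda was never the fix; dropping .loc was.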
Basically, I think my original code first ran .apply on a column of the DataFrame, which used my Convert function to turn the prices into float values. Next, my code passed those floats into .loc.
.loc expects index labels, so it used the float values as row labels and returned the matching rows of the DataFrame. However, some of these float values will of course be duplicates, since many clues share the same dollar amount, so the rows it returned had duplicate index labels.
I think this is what threw the error: assigning that result to the new column makes pandas line it up (reindex it) against the DataFrame’s own index, and it can’t do that when the labels are duplicated. In reality, .loc wasn’t necessary, and simply removing it fixed the code. This is my current understanding and I could be wrong, in which case please let me know! If it’s correct, hopefully someone else finds this helpful.
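The explanation above can be checked on a small example. This sketch uses hypothetical toy data whose dollar amounts are deliberately tiny, so that each converted value (0.0, 1.0, 2.0) is also an existing row label. In a frame as large as the Jeopardy one, values like 200.0 presumably match row labels the same way (a value with no matching label would raise a KeyError instead). The toy frame has a single column to keep things simple; with the real multi-column frame, the exact error message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical toy data: tiny dollar amounts so that every converted value
# (1.0, 2.0, 0.0) is also an existing row label (the index is 0..3).
df = pd.DataFrame({"Value": ["$1", "$2", "None", "$1"]})

# Same conversion as the course's lambda: strip "$" and ",", map "None" to 0.
floats = df["Value"].apply(lambda x: float(x[1:].replace(",", "")) if x != "None" else 0)

# Step 1: .loc treats the floats as row labels. The lookup itself succeeds,
# but row 1 comes back twice because 1.0 appears twice.
looked_up = df.loc[floats]
print(looked_up.index.tolist())
print(looked_up.index.is_unique)

# Step 2: assigning the result to a column makes pandas reindex it to
# df.index, which fails because its labels contain duplicates.
message = None
try:
    df["FltValue"] = looked_up
except ValueError as err:
    message = str(err)
print(message)
```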

I’m wondering: did you combine two DataFrames? That error generally happens when an index contains duplicate labels. If you’re combining two DataFrames (for example with pd.concat), set the ignore_index=True parameter when combining.
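As a sketch of that tip, assuming the frames are combined with pd.concat: two hypothetical frames that each carry the default 0, 1 index end up with duplicate labels when concatenated, unless ignore_index=True asks pandas to build a fresh index.

```python
import pandas as pd

# Two hypothetical frames, each with the default index 0, 1.
first = pd.DataFrame({"Value": ["$200", "$400"]})
second = pd.DataFrame({"Value": ["$600", "None"]})

# Without ignore_index, both sets of labels are kept: 0, 1, 0, 1.
combined = pd.concat([first, second])
print(combined.index.tolist(), combined.index.is_unique)

# With ignore_index=True, pandas builds a fresh 0..n-1 index instead.
combined_clean = pd.concat([first, second], ignore_index=True)
print(combined_clean.index.tolist(), combined_clean.index.is_unique)
```

A frame like `combined` is the kind of input that triggers the reindexing error later, when it has to be aligned against another index.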


Thanks for the reply, but I don’t believe so. I’m only using one DataFrame, called “jeopardy”. I’ll definitely keep that tip in mind for future problems, even though it isn’t the issue here.
