I’m working on the ‘This Is Jeopardy’ project in the data science course, and I have to design a function that converts a column of string dollar amounts into floats. Some of the strings in this column look like ‘$200’ or ‘$2,000’, and some are ‘None’. This is the function I made:
I get a long, bizarre error that ends with:
ValueError: cannot reindex on an axis with duplicate labels
I’m confused why this function has anything to do with the index at all. I checked the solution code:
It seems identical to mine, the only difference being that it’s written as a lambda function. I could just change mine to a lambda, but I’m at a loss as to why mine mysteriously fails while this code works. What is the difference?
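For what it’s worth, a named function and a lambda are interchangeable as arguments to Series.apply: a lambda is just an anonymous function object, and apply only needs something callable. The sketch below uses a hypothetical converter called convert_value and a made-up Value column; mapping ‘None’ to 0.0 is an assumption, not necessarily what the course expects. It checks that the two forms produce the same Series.

```python
import pandas as pd


def convert_value(value):
    """Hypothetical converter: '$2,000' -> 2000.0, 'None' -> 0.0 (assumed)."""
    if value == "None":
        return 0.0
    # Drop the leading '$' and the thousands separators before converting.
    return float(value.lstrip("$").replace(",", ""))


df = pd.DataFrame({"Value": ["$200", "$2,000", "None", "$200"]})

# Passing the named function to apply...
named = df["Value"].apply(convert_value)

# ...and an equivalent lambda; apply treats both the same way.
anonymous = df["Value"].apply(
    lambda v: 0.0 if v == "None" else float(v.lstrip("$").replace(",", ""))
)

print(named.equals(anonymous))  # True
print(named.tolist())  # [200.0, 2000.0, 0.0, 200.0]
```

So the lambda versus def distinction on its own shouldn’t change the result; the difference has to come from something else in the surrounding code.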
Thank you in advance for any answer.
Edit: Minutes after posting, I found the problem: the function worked perfectly once I removed .loc. However, I’d still appreciate any insight into what exactly caused that error, so any response is welcome!
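Why dropping .loc works can be shown in a minimal sketch: Series.apply returns a new Series that keeps the original row labels, so assigning it as a column lines up one-to-one with the existing rows. The frame, its non-default index, and the float_value column name below are made up for illustration, and the converter treating ‘None’ as 0.0 is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {"Value": ["$400", "None", "$1,200"]},
    index=[10, 11, 12],  # non-default labels make the alignment visible
)

# Hypothetical converter: strip '$' and ',' and treat 'None' as 0.0 (assumed).
converted = df["Value"].apply(
    lambda v: 0.0 if v == "None" else float(v.lstrip("$").replace(",", ""))
)

# apply() keeps the original index, so each value stays with its own row.
print(converted.index.equals(df.index))  # True

# Assignment aligns on the index; since the labels match, nothing moves.
df["float_value"] = converted
print(df)
```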
Edit 2: In case anyone is still curious, after reading up on the methods I used, I think I now understand what happened.
Basically, I think my original code first ran .apply on a column of the dataframe, using my converter function to translate the prices into float values. Next, my code passed the result into .loc.
.loc treats whatever it’s given as row (and optionally column) labels, so it tried to return the rows whose labels matched those float values. Of course, many of the float values are duplicates (lots of questions are worth $200, for example), so it was receiving duplicate labels, and the rows it returned carried those duplicate labels in their index.
I think this is what threw the error. In reality, .loc wasn’t needed at all, and simply removing it fixed the code. This is my current understanding, but I could be wrong; if so, please let me know! If it’s correct, hopefully someone else will find this helpful.
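One refinement, assuming the original code assigned the .loc result to a new column (the code isn’t shown, but the word “reindex” in the message suggests an alignment step): .loc itself accepts repeated labels and simply returns rows whose index repeats them. The error comes afterwards, when that Series is assigned as a DataFrame column. pandas then reindexes it against the frame’s index, and reindexing fails if the Series’ own index contains duplicates. The sketch below reproduces this with small integer labels standing in for the converted dollar amounts. Using integers is an assumption made so the example doesn’t depend on how a given pandas version handles float labels. The exact wording of the message also varies between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"Value": ["$200", "$400", "$200", "$600"]})

# Stand-in for converted amounts being used as row *labels*:
# repeated values mean repeated labels.
labels = [0, 2, 0, 1]

# .loc is fine with repeated labels; the result just repeats them in its index.
selected = df["Value"].loc[labels]
print(selected.index.tolist())  # [0, 2, 0, 1]
print(selected.index.is_unique)  # False

raised = False
try:
    # Assigning a Series as a column aligns it to df.index by reindexing,
    # which pandas refuses to do when the Series' index has duplicates.
    df["float_value"] = selected
except ValueError as err:
    raised = True
    print("ValueError:", err)
```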