Is it possible for there to be duplicate values in the id column of a dataframe?



Question

In the context of this exercise, is it possible for there to be duplicate values in the id column of a dataframe?

Answer

Yes. Unlike SQL, where a primary key constraint enforces uniqueness, pandas does not enforce uniqueness on the values of a dataframe column. As a result, a column such as order_id in the orders table can contain duplicate values.
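Before deciding how to handle duplicates, it can help to check whether a column has any. The sketch below is only an illustration: the orders dataframe and its values are made up, and only the order_id column name comes from the answer above.

```python
import pandas as pd

# Hypothetical orders table; the values are invented for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "price": [10.0, 25.0, 25.0, 40.0],
})

# is_unique is True only when every value in the column is distinct.
has_unique_ids = orders["order_id"].is_unique

# duplicated() flags each repeat of a value after its first occurrence,
# so this selects the rows that repeat an earlier order_id.
repeated_rows = orders[orders["order_id"].duplicated()]

print(has_unique_ids)
print(repeated_rows)
```

Here pandas accepts the repeated order_id without complaint, which is why the check has to be done explicitly.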

However, there are a few ways you might deal with this.

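One way is to drop the rows that repeat a value in a specific column, so that each value in that column appears only once. By default the first occurrence of each value is kept, and a new dataframe is returned rather than the original being modified:
df = df.drop_duplicates(subset=['column_name'])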
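As a small worked example of dropping duplicates on one column, here is a sketch using a made-up orders dataframe (the data and the unique_orders name are assumptions for illustration, not part of the exercise).

```python
import pandas as pd

# Hypothetical data: order_id 2 appears twice with different prices.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "price": [10.0, 25.0, 30.0, 40.0],
})

# Only order_id is compared; the first row for each id is kept (keep="first"
# is the default). The original dataframe is left unchanged.
unique_orders = orders.drop_duplicates(subset=["order_id"])

print(unique_orders)
```

Note that the second row for order_id 2 is discarded even though its price differs, so this approach is only appropriate if the later rows really are unwanted repeats.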

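Another way is to reset the index with reset_index() and use the index as the unique identifier. This renumbers the rows starting from 0 and increasing by 1 for each row, so every index value is guaranteed to be unique. Note that this does not remove duplicate values from the id column itself; it only gives each row its own label. By default reset_index() returns a new dataframe and keeps the old index as a column, which you can discard by passing drop=True.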
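To illustrate the reset_index() approach, here is a minimal sketch. The dataframe, the filter, and the variable names are hypothetical; the filter is only there to produce an index with gaps.

```python
import pandas as pd

# Hypothetical data for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "price": [10.0, 25.0, 30.0, 40.0],
})

# Filtering keeps the original row labels, so the index no longer starts at 0.
expensive = orders[orders["price"] > 10.0]

# drop=True discards the old index instead of adding it as a new column.
renumbered = expensive.reset_index(drop=True)

print(renumbered)
print(renumbered.index.is_unique)
```

The index of the result is unique, but order_id 2 still appears twice, so the index can act as a row identifier without the id column itself being cleaned up.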