Question
In the context of this exercise, is it possible for there to be duplicate values in the id column of a dataframe?
Answer
Yes. Unlike SQL tables, which can enforce uniqueness through a primary key, pandas has no equivalent constraint for a dataframe column. As a result, a column such as order_id in the orders table can potentially have duplicate values.
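To see whether a column actually contains duplicates, you can check it directly. Here is a minimal sketch, assuming a small made-up orders dataframe (the data and the item column are hypothetical, not part of the exercise):

```python
import pandas as pd

# Hypothetical sample data: order_id 2 appears twice.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "item": ["pen", "ink", "ink refill", "pad"],
})

# True only if every value in the column is distinct.
has_unique_ids = orders["order_id"].is_unique

# keep=False marks every copy of a repeated value, not just the later ones.
repeated_rows = orders[orders["order_id"].duplicated(keep=False)]

print(has_unique_ids)
print(repeated_rows)
```

Checking is_unique first is a cheap way to decide whether any cleanup is needed at all.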
However, there are a few ways you might deal with this.
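One way is to drop the rows whose value in a specific column duplicates an earlier row, so that the column is left with only unique values. By default the first occurrence is kept, and the method returns a new dataframe rather than modifying df in place. Like so: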
df.drop_duplicates(subset=['column_name'])
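As a rough illustration of how that call behaves, here is a sketch on a made-up orders dataframe (the data and the item column are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: order_id 2 appears twice.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "item": ["pen", "ink", "ink refill", "pad"],
})

# Keep the first row seen for each order_id (keep="first" is the default).
deduped = orders.drop_duplicates(subset=["order_id"])

# Alternatively, keep the last row seen for each order_id.
last_kept = orders.drop_duplicates(subset=["order_id"], keep="last")

print(deduped)
print(last_kept)
```

Note that the original orders dataframe is unchanged, and that the surviving rows keep their original index labels, so the index of deduped has a gap where the duplicate row was removed.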
Another way is to reset the index and possibly use it as a unique identifier, using reset_index(). This method returns a dataframe whose index starts at 0 and increments by 1 for each row, so every index value is guaranteed to be unique. By default the old index is kept as a new column; pass drop=True to discard it. Note that this leaves any duplicate values in the original column untouched.
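A short sketch of the index approach, again on made-up data (the row_id column name is a hypothetical choice, not something pandas creates for you):

```python
import pandas as pd

# Hypothetical sample data: order_id 2 appears twice.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "item": ["pen", "ink", "ink refill", "pad"],
})

# Dropping duplicates leaves a gap in the index labels (0, 1, 3).
deduped = orders.drop_duplicates(subset=["order_id"])

# drop=True discards the old index instead of adding it as a column,
# so the new index runs 0, 1, 2.
renumbered = deduped.reset_index(drop=True)

# Optionally copy the index into a regular column to use as an identifier.
renumbered["row_id"] = renumbered.index

print(renumbered)
```

This gives each row a unique label, but the label only describes row position; it carries no meaning about the order itself.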