In the context of this exercise, is it possible for there to be duplicate values in the id column of a dataframe?
Yes. Unlike SQL, where a primary key enforces uniqueness, pandas has no equivalent constraint for a column in a DataFrame. As a result, a column such as order_id in the orders table can contain duplicate values.
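To see this for yourself, here is a minimal sketch. The orders table and its values are made up for illustration and are not the exercise's actual data.

```python
import pandas as pd

# Hypothetical orders table; the values are invented for illustration.
orders = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "item": ["pen", "ink", "ink", "pad"],
})

# pandas accepts the repeated 102 without raising an error.
has_unique_ids = orders["order_id"].is_unique

# Rows whose order_id already appeared earlier in the column.
repeated_rows = orders[orders["order_id"].duplicated()]
print(has_unique_ids)
print(repeated_rows)
```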
However, there are a few ways you might deal with this.
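One way is to drop the rows that repeat a value in a specific column, so that only unique values remain in that column. The drop_duplicates() method does this when you pass the column name to its subset parameter.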
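A rough sketch of dropping duplicates on one column might look like this. The table and values are again hypothetical, and which row to keep (the first occurrence here, which is the default) is an assumption that depends on your data.

```python
import pandas as pd

# Hypothetical orders table with a repeated order_id (102).
orders = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "item": ["pen", "ink", "refill", "pad"],
})

# Compare only the order_id column and keep the first row for each value.
# The original DataFrame is left unchanged; a new one is returned.
unique_orders = orders.drop_duplicates(subset=["order_id"])
print(unique_orders)
```

Note that the rows dropped this way are gone from the result, so this only makes sense if the repeated rows really are redundant.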
Another way is to reset the index and use it as a unique identifier for each row, using reset_index(). This method renumbers the index so that it starts at 0 and increases by 1 for each row, which guarantees that every index value is unique.
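As a small sketch of this approach, assuming a hypothetical table whose existing index is not unique:

```python
import pandas as pd

# Hypothetical orders table with a messy, non-unique index.
orders = pd.DataFrame(
    {"order_id": [101, 102, 102, 103]},
    index=[7, 3, 3, 9],
)

# drop=True discards the old index instead of keeping it as a new column.
renumbered = orders.reset_index(drop=True)
print(renumbered)
```

Keep in mind that this only makes the index unique. The order_id column itself still contains the repeated value, so the two approaches solve slightly different problems.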