If two dataframes share more than one column name, how are they merged?


Question

In the context of this exercise, if two dataframes share more than one column name, how are they merged? Are they just merged on the first matching column, or every matching column?

Answer

When two dataframes share more than one column name, the merge uses every shared column as a join key, not just the first one.

By default, pd.merge() performs an inner join on all columns the two dataframes have in common. A row from one dataframe is paired with a row from the other only when their values agree in every one of those shared columns; rows without such a match are dropped.
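One way to check this behavior is to compare the default merge with a merge that names the shared columns explicitly. The sketch below uses made-up dataframes (left, right) and a hypothetical helper variable (shared) that are not part of the exercise; it only illustrates that the two calls should give the same result.

```python
import pandas as pd

# Hypothetical data: 'id' and 'name' exist in both dataframes.
left = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [30, 25]})
right = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bea'], 'city': ['Oslo', 'Rome']})

# Column names present in both dataframes, in the order they appear in left.
shared = [col for col in left.columns if col in right.columns]
print(shared)  # ['id', 'name']

# The default merge and the fully spelled-out merge produce the same rows.
default_merge = pd.merge(left, right)
explicit_merge = pd.merge(left, right, on=shared, how='inner')
print(default_merge.equals(explicit_merge))  # True
```

Only the row with id 1 and name 'Alice' survives here, because id 2 is paired with different names in the two dataframes.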

In the following example, both dataframes have the columns id and name. The ids 1 and 2 appear in both dataframes but with different names, so only the row whose id and name both match (3, 'Carl') is returned.

Example

import pandas as pd

df1 = pd.DataFrame({
  'id': [1, 2, 3],
  'name': ['Alice', 'Bob', 'Carl']
})

df2 = pd.DataFrame({
  'id': [1, 2, 3],
  'name': ['David', 'Elsa', 'Carl']
})

# No `on` argument is given, so both shared columns (id and name) are used as keys.
merged = pd.merge(df1, df2)
print(merged)
#    id  name
# 0   3  Carl
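
If the goal is to match on only one of the shared columns, the on parameter of pd.merge() can name it. The sketch below reuses the same two dataframes and merges on id alone; the variable name by_id is just for illustration. The name column, which is no longer a key, is kept from both sides with the default suffixes _x and _y.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Carl']})
df2 = pd.DataFrame({'id': [1, 2, 3], 'name': ['David', 'Elsa', 'Carl']})

# Join on 'id' only; 'name' from each dataframe becomes name_x / name_y.
by_id = pd.merge(df1, df2, on='id')
print(by_id)
#    id name_x name_y
# 0   1  Alice  David
# 1   2    Bob   Elsa
# 2   3   Carl   Carl
```

All three rows are returned this time, since every id in df1 has a matching id in df2 regardless of the name values.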