Can we merge on more than one specific column?

Question

In the context of this exercise, can we merge on more than one specific column?

Answer

Yes, you can perform a merge on one column, or on multiple specified columns, by passing in a list of the column names for each dataframe.

When listing multiple column names, it will only return rows for which all the column values match. Furthermore, the number of columns listed must match, and the order they are listed will matter.

Example

# This will match the values for 
# column "a" with "c" 
# and column "b" with "d".

pd.merge(
  df1,
  df2,
  left_on=["a", "b"],
  right_on=["c", "d"]
)
6 Likes

I think this two sentences mean different things, in the first I assume that β€˜a’, β€˜b’, β€˜c’ and β€˜d’ will only appear in the merged table if all four columns have the same value in a given row.

Whereas in the second I assume that they only need to have the same value in the pairs (β€˜a’ and β€˜c’) and (β€˜b’ and β€˜d’) but β€˜a’ and β€˜b’ can be different and that row will still be in the merged table.

Witch one of the two assumptions is true?

My guess is the second one. The β€œall” in the first quote likely is just referencing the two pairs, not all four of them together.