When should we use each type of merge?



In the context of this exercise, when should we use each type of merge?


Inner merge

An inner merge will only return rows for which both dataframes have a match. It is stricter than the left/right and outer merges because mismatches are never included.

We use this when we only want rows for which both dataframes match. For example, if we have a customers table and an orders table, we might only want rows where the customer and order information is included, but not if there is any missing data.

Left / Right merge

A left merge will return every row from the left dataframe no matter what, whether there is a match or not. Missing values in the right dataframe will be replaced with None or NaN values. A right merge is just the opposite of a left merge.

An example of using this is if we had a user table and a subscription table, and we want to include all users, and their subscriptions if there are any.

Outer merge

This is used if we want to include all the data of both dataframes. Any missing value from either dataframe is filled with None or NaN.

For example, a company might have an employee table and an address table. They want to make sure any missing values are fixed, so they perform an outer merge between them. The missing employee information or addresses can then be found and fixed.