If two dataframes share more than one column name, how are they merged?

Question

In the context of this exercise, if two dataframes share more than one column name, how are they merged? Are they just merged on the first matching column, or every matching column?

Answer

The merge will check all columns that match between the two dataframes if they share more than one column name.

By default, if we run the pd.merge() method, it performs an inner join. With an inner join, all values of every matching column must match in order for the rows to be returned.

In the following example, only the rows for which all values of every matching column are the same will be returned.

Example

df1 = pd.DataFrame({
  'id': [1, 2, 3],
  'name': ['Alice', 'Bob', 'Carl']
})

df2 = pd.DataFrame({
  'id': [1, 2, 3],
  'name': ['David', 'Elsa', 'Carl']
})

merged = pd.merge(df1, df2)
print(merged)
#       id    name
#  0     3    Carl
4 Likes

I was carrying out the requested exercise and I tried to produce some code which saved the rows where revenue was more than target as a separate variable. I’m not sure why my code didn’t work, if anyone could explain that would be much appreciated.

import codecademylib
import pandas as pd

sales = pd.read_csv('sales.csv')
print(sales)
targets = pd.read_csv('targets.csv')
print(targets)

sales_vs_targets = pd . merge (sales , targets)

print (sales_vs_targets)

crushing_it = sales_vs_targets.revenue > sales_vs_targets.target

instead of getting an extract of the rows which had revenue exceed the target I got a list with true and false.

This line evaluates to true or false. It checks whether revenue is greater than the target and returns true if revenue is greater than target else false.

Use
crushing_it = sales_vs_targets[sales_vs_targets.revenue > sales_vs_targets.target]
Or
crushing_it = sales_vs_targets.loc[(sales_vs_targets.revenue) > (sales_vs_targets.target)]

6 Likes

crushing_it = sales_vs_targets .revenue > sales_vs_targets.target this code has only selected the columns of a table that has such columns. we need to specify the table we are fetching this info from
so your code needs to state the data-frame and then the columns like this
crushing_it = sales_vs_targets [sales_vs_targets.revenue > sales_vs_targets.target]

3 Likes

The link to the exercise does not work

Here is the link