FAQ: Working with Multiple DataFrames - Merge on Specific Columns

This community-built FAQ covers the “Merge on Specific Columns” exercise from the lesson “Working with Multiple DataFrames”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Data Analysis with Pandas

FAQs on the exercise Merge on Specific Columns

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

A veterinarian’s office stores all of their data on pets and their owners in two dataframes: pets and owners . The owners dataframe has the columns ‘id’, ‘first_name’, ‘last_name’ and ‘address’. The ‘pets’ dataframe has the columns id , name , owner_id , and type . If the office wanted to combine the two dataframes into one dataframe called pets_owners , what following code could work?

pets_owners = pd.merge(
pets,
owners.rename(columns = {‘id’:‘owner_id’})
)
this ↑ is woking

pets_owners = pd.merge(
pets.rename(columns = {‘owner_id’:‘id’}),
owners
)

But this is not working
what is wrong?

Hi! My first guess is that there is already a column named “id” in the pets dataframe, so either you can’t have two columns with the same name or maybe the merge function can’t combine the two dataframes if there are two columns with the same name in one of them.

You don’t have the same issue with this code:

pets_owners = pd.merge(
pets,
owners.rename(columns = {‘id’:‘owner_id’})

This is because initially there’s no column named “id” in the owners dataframe and after renaming it the merge function is able to match it with the “id” column of the pets dataframe.

Hope it helps, cheers! :beer:

What order does the merge method return the data in?

orders_products = pd.merge(orders, products.rename(columns = {'id':'product_id'}))

Here the table seems to group by product description (not in alphabetical order though)…

id	product_id	customer_id	quantity	timestamp	description	price
0	1	3	2	1	2017-01-01	doo-hickey	7
1	5	3	3	3	2017-02-01	doo-hickey	7
2	2	2	2	3	2017-01-01	whatcha-ma-call-it	10
3	4	2	3	2	2016-02-01	whatcha-ma-call-it	10
4	3	1	3	1	2017-01-01	thing-a-ma-jig	5
5	7	1	1	1	2017-02-02	thing-a-ma-jig	5
6	6	4	1	2	2017-03-01	gizmo	3
7	8	4	1	1	2017-02-02	gizmo	3

I am also confused by this. Why is the id column so out of order?

This is my code

import codecademylib3
import pandas as pd

orders = pd.read_csv('orders.csv')
print(orders)
products = pd.read_csv('products.csv')
print(products)

orders_products = pd.merge(orders, products.rename(columns = {'id': 'product_id'}))
print(orders_products)

Im not

index id	product_id	customer_id	quantity	timestamp	description	price
0	1	3	2	1	2017-01-01	doo-hickey	7
1	5	3	3	3	2017-02-01	doo-hickey	7
2	2	2	2	3	2017-01-01	whatcha-ma-call-it	10
3	4	2	3	2	2016-02-01	whatcha-ma-call-it	10
4	3	1	3	1	2017-01-01	thing-a-ma-jig	5
5	7	1	1	1	2017-02-02	thing-a-ma-jig	5
6	6	4	1	2	2017-03-01	gizmo	3
7	8	4	1	1	2017-02-02	gizmo	3

Why is it that when I add inplace=True to the .rename( ) method, the merged DataFrame won’t get printed out, whereas if I leave out inplace=True, it works just fine?

import codecademylib3 import pandas as pd orders = pd.read_csv('orders.csv') print(orders) products = pd.read_csv('products.csv') print(products) orders_products = pd.merge( orders, products.rename(columns={'id': 'product_id'}, inplace=True) ) print(orders_products)

The lesson example on the left shows the customer df, and the products df but the code solution below it uses the orders df instead.

pd.merge(
    orders,
    customers.rename(columns={'id': 'customer_id'}))

Makes the code hard to visualise

1 Like

100%!

Also in exercise I think I did’t get it right, because there is four columns with the ID’s without any order :sob:

Screen Shot 2023-02-09 at 21.34.59

Why the ID order changes?