Merging data frames

Hello,

I am doing the project “page visits funnel” in “Multiple tables in pandas”. I noticed there is something weird about the files provided. When I merge the dfs like this:

all_data = visits.merge(cart, how="left").merge(checkout, how="left").merge(purchase, how="left")
print(len(all_data))
print(len(visits))

I get a longer df for all_data than for visits, even though a left merge should make the result at most as large as visits.
How is this possible?

Thanks,
Jan

Hi and welcome to the forums!
I haven’t done this specific lesson, so I don’t know the exact details of the datasets, but I work in data analysis. A left join keeps every row of the left data frame and attaches the rows of the right data frame whose key values match. When you don’t pass `on`, `merge` uses the columns the two data frames have in common as the keys. So you keep all rows from the left, plus whatever matches in the right.
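To make that concrete, here is a minimal sketch of a left merge where every key appears at most once on each side. The tiny tables and the column names (`user_id`, `visit_time`, `cart_time`) are made up for illustration; the actual lesson files may look different.

```python
import pandas as pd

# Hypothetical miniature tables, not the real lesson data.
visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": ["10:00", "10:05", "10:10"],
})
cart = pd.DataFrame({
    "user_id": ["a", "c"],
    "cart_time": ["10:02", "10:15"],
})

# No `on` argument: merge joins on the columns both frames share (user_id here).
merged = visits.merge(cart, how="left")

print(merged)                     # user "b" has no cart row, so cart_time is NaN
print(len(visits), len(merged))   # same length, because each key matches at most once
```

In this one-to-one case the result has exactly as many rows as `visits`, which is the behaviour the question expected.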

However, say there is 1 record for an ID in the left and 3 records for the same ID in the right. The join then attaches all 3 right records to that 1 left record, giving 3 rows in which the left df’s data is repeated and the right df’s data differs from row to row. So a left join always returns at least as many rows as the left df (here, visits), but it can return more if there are, say, multiple checkout records for one cart visit. It all depends on what is being joined on.
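The row growth described above can be reproduced with a small sketch, along with two ways to check for it. Again, the tables and column names are invented for illustration and are not the lesson's real data; `validate` is an existing parameter of `merge`, but whether a one-to-one relationship is what the lesson intends is an assumption.

```python
import pandas as pd

# Hypothetical data: user "a" has three checkout rows, user "b" has none.
visits = pd.DataFrame({"user_id": ["a", "b"], "visit_time": ["10:00", "10:05"]})
checkout = pd.DataFrame({
    "user_id": ["a", "a", "a"],
    "checkout_time": ["10:20", "10:25", "10:30"],
})

# The single visit row for "a" is repeated once per matching checkout row.
merged = visits.merge(checkout, how="left")
print(len(visits), len(merged))

# Check 1: list the right-hand rows whose join key appears more than once.
dupes = checkout[checkout.duplicated(subset="user_id", keep=False)]
print(dupes)

# Check 2: ask merge itself to fail if the keys are not unique on both sides.
is_one_to_one = True
try:
    visits.merge(checkout, how="left", validate="one_to_one")
except pd.errors.MergeError as err:
    is_one_to_one = False
    print("Merge is not one-to-one:", err)
```

Running the same duplicate check on each of `cart`, `checkout`, and `purchase` should show which table is responsible for the extra rows in `all_data`.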

Hello Jan,

Welcome to the forums!

If you post the link to the lesson and your code, we will be able to help answer your question better.
(I vaguely remember the lesson :) )