(pandas) increased dataframe size when using left merge

I’m working on the funneling project where I have to determine at what steps of the purchasing process customers “drop out”.

There are four tables: visits, cart, checkouts, and purchases.

Printing visits.count() tells me that the visits table contains 2000 entries. However, when I left join the rest of the tables, using

all_data = visits.merge(cart, how='left').merge(checkout, how='left').merge(purchase, how='left')

all_data.count() gives me 2557 entries. If a left merge only adds the rows from the right table that can be matched to rows from the left table, how do I end up with more entries in the final table than I have in my leftmost table?

I tried adding .drop_duplicates() to the end of my definition but that didn’t change anything.
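
One likely explanation, sketched below with made-up data: a left merge keeps every row of the left table, but it repeats a left row once for each matching row in the right table. If a key appears more than once in cart, checkout, or purchase (for example, a user who added to the cart twice), the result grows. The column names user_id, visit_time, and cart_time are assumptions for illustration, not the project's actual columns.

```python
import pandas as pd

# Hypothetical data: user "b" appears twice in the cart table.
visits = pd.DataFrame({"user_id": ["a", "b", "c"],
                       "visit_time": ["10:00", "10:05", "10:10"]})
cart = pd.DataFrame({"user_id": ["b", "b"],
                     "cart_time": ["10:07", "10:30"]})

# The left row for "b" is repeated once per matching right row.
merged = visits.merge(cart, how="left", on="user_id")
print(len(visits), len(merged))

# drop_duplicates() compares whole rows; the two "b" rows differ in
# cart_time, so neither is dropped.
deduped = merged.drop_duplicates()
print(len(deduped))

# Count the keys that occur more than once in the right table.
dup_keys = int(cart["user_id"].duplicated().sum())
print(dup_keys)

# One possible fix: keep a single cart row per user before merging.
cart_first = cart.drop_duplicates(subset="user_id")
merged_first = visits.merge(cart_first, how="left", on="user_id")
print(len(merged_first))
```

Whether keeping only the first row per user is appropriate depends on what the funnel is meant to count. merge also accepts a validate argument (for example validate="many_to_one"), which raises an error when the right-hand keys are not unique, so it can be used to catch this case early.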


I am running into the same issue. Were you able to figure it out?