I’m working on the funneling project where I have to determine at what steps of the purchasing process customers “drop out”.
There are four tables: visits, cart, checkout, and purchase.
Printing visits.count() tells me that the visits table contains 2000 entries. However, when I left join the rest of the tables, using
all_data = visits.merge(cart, how='left').merge(checkout, how='left').merge(purchase, how='left')
all_data.count() gives me 2557 entries. If a left merge only adds the rows from the right table that can be matched to rows from the left table, how do I end up with more entries in the final table than I have in my leftmost table?
I tried adding .drop_duplicates() to the end of my definition but that didn’t change anything.
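To make the question concrete, here is a minimal sketch with made-up data that seems to reproduce the same effect. The column names (user_id, visit_time, cart_time) and the values are my own assumptions for illustration, not the project's real tables. My guess is that a repeated key in the right table might be involved, but I have not confirmed that this is what happens in my actual data.

```python
import pandas as pd

# Hypothetical toy tables; column names and values are invented for illustration.
visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": ["10:00", "10:05", "10:10"],
})

# User "a" appears twice here, so the merge key is not unique on the right side.
cart = pd.DataFrame({
    "user_id": ["a", "a", "b"],
    "cart_time": ["10:01", "10:02", "10:06"],
})

# With no `on` argument, merge joins on the shared column (user_id).
all_data = visits.merge(cart, how="left")

print(len(visits))    # 3 rows in the left table
print(len(all_data))  # 4 rows after the left merge

# drop_duplicates() compares entire rows; the two "a" rows differ in cart_time,
# so nothing is removed.
print(len(all_data.drop_duplicates()))  # still 4

# Count how often each key occurs in the right table.
print(cart["user_id"].value_counts())
```

If this is the cause, running the same value_counts() check on the key column of cart, checkout, and purchase should show which table has repeated keys.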