Page Visits Funnel Project

https://www.codecademy.com/paths/data-science/tracks/data-processing-pandas/modules/dspath-multiple-tables-pandas/projects/multi-tables-proj

This is for question 8: What percentage of users proceeded to checkout but did not purchase a t-shirt?
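One way to write the target quantity explicitly, assuming the question is meant per user rather than per merged row. Here $C$ and $P$ are symbols introduced for illustration: $C$ is the set of user_ids that appear in checkout and $P$ is the set of user_ids that appear in purchase.

$$\text{share} = \frac{|C \setminus P|}{|C|}, \qquad \text{percentage} = 100 \cdot \text{share}$$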

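import pandas as pd

# Left merge keeps every checkout row; purchase_time is NaN where no purchase matched
left_checkout_purchase = pd.merge(
  checkout,
  purchase,
  how='left'
).reset_index()
#print(left_checkout_purchase)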

total_checkout = len(left_checkout_purchase)
print(total_checkout)
null_purchase = len(left_checkout_purchase[left_checkout_purchase.purchase_time.isnull()])
print(null_purchase)
percent_no_purchase = float(null_purchase) / total_checkout
print(percent_no_purchase)

all_data = visits.merge(cart, how='left').merge(checkout, how='left').merge(purchase, how='left')

#print(all_data.head(20))

num_checkout_not_null = len(all_data[all_data.checkout_time.notnull()])
# use notnull (not null) to find the opposite of isnull
print(num_checkout_not_null)

num_purchase_null = len(all_data[(all_data.purchase_time.isnull()) & (all_data.checkout_time.notnull())])
# rows where the user checked out (checkout_time not null) but did not purchase (purchase_time is null)
print(num_purchase_null)

percent_checkout_not_purchase = float(num_purchase_null) / num_checkout_not_null
print(percent_checkout_not_purchase)
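A minimal sketch, using small made-up tables rather than the project's data, showing that both approaches give the same answer when every table has exactly one row per user_id. The two-table version uses merge's indicator=True option, which adds a "_merge" column set to "left_only" when no purchase row matched; that is an alternative to checking purchase_time for nulls, not what the video used.

```python
import pandas as pd

# Hypothetical toy funnel with exactly one row per user_id in every table.
visits = pd.DataFrame({"user_id": ["a", "b", "c", "d"]})
cart = pd.DataFrame({"user_id": ["a", "b", "c"], "cart_time": ["t1"] * 3})
checkout = pd.DataFrame({"user_id": ["a", "b"], "checkout_time": ["t2"] * 2})
purchase = pd.DataFrame({"user_id": ["a"], "purchase_time": ["t3"]})

# Approach 1: merge only checkout and purchase. indicator=True adds a
# "_merge" column that is "left_only" when no purchase row matched.
left_checkout_purchase = checkout.merge(purchase, how="left", on="user_id",
                                        indicator=True)
share_two_tables = (left_checkout_purchase["_merge"] == "left_only").mean()

# Approach 2: chain left merges over all four tables, then filter.
all_data = (visits.merge(cart, how="left", on="user_id")
                  .merge(checkout, how="left", on="user_id")
                  .merge(purchase, how="left", on="user_id"))
checked_out = all_data.checkout_time.notnull()
no_purchase = all_data.purchase_time.isnull()
# Summing a boolean Series counts its True values.
share_all_data = (checked_out & no_purchase).sum() / checked_out.sum()

print(share_two_tables, share_all_data)  # 0.5 0.5
```

With unique user_ids the two results agree, so a mismatch on the real data points at something about the data itself rather than the filtering logic.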

I have two DataFrames: left_checkout_purchase, which merges only checkout and purchase, and all_data, which merges all four tables.

I am trying to find the percentage of people who reached checkout but did not purchase.

With the code above I got two different answers. The video solved it by merging only checkout and purchase.

I tried to solve it using all_data instead.

Does this make sense?


num_checkout_not_null is the length of all_data filtered to rows where checkout_time is not null.

num_purchase_null is the length of all_data filtered on two conditions: (1) checkout_time is not null, and (2) purchase_time is null.

I thought doing it this way would give me the same answer as the video, but I got a different one.

Did I do something wrong?


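Perhaps this is because there are duplicate user ids in cart, checkout, and purchase. When the merge key is duplicated, every matching pair of rows becomes a row in the result, so a user with 2 checkout rows and 2 purchase rows produces 4 merged rows. Merging all four tables compounds this duplication differently from merging only checkout and purchase, so the row counts, and therefore the percentages, differ.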
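A minimal sketch of that effect on made-up tables (not the project's data). Counting distinct user_ids at the end is one possible way to make the result independent of how many tables were merged; it assumes the question is about users rather than rows and is a suggestion, not something stated in this thread.

```python
import pandas as pd

# Hypothetical data: user "a" has two checkout rows and two purchase rows.
checkout = pd.DataFrame({"user_id": ["a", "a", "b"],
                         "checkout_time": ["t1", "t2", "t3"]})
purchase = pd.DataFrame({"user_id": ["a", "a"],
                         "purchase_time": ["t4", "t5"]})

merged = checkout.merge(purchase, how="left", on="user_id")
# "a" matches 2 x 2 = 4 rows; "b" has no purchase and keeps 1 row (NaN).
print(len(merged))  # 5

# Row-based share: 1 NaN row out of 5 rows.
row_share = merged.purchase_time.isnull().sum() / len(merged)

# User-based share: 1 of the 2 users who checked out never purchased.
checkout_users = set(checkout.user_id)
purchase_users = set(purchase.user_id)
user_share = len(checkout_users - purchase_users) / len(checkout_users)

print(row_share, user_share)  # 0.2 0.5
```

The row-based figure depends on how many duplicate rows each merge multiplies together, which is why merging all tables and merging two at a time can disagree; the user-based figure does not.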

I found two topics discussing this issue:


