Page Visits Funnel Project Question

Hello everybody,
I merged all four steps of the funnel, in order, using a series of left merges, and saved the result to the variable all_data, as the project asked.
If you inspect the tables visits and all_data using .info(), you get the following information:

[table visits]
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 2 columns):
user_id 2000 non-null object
visit_time 2000 non-null datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 31.3+ KB

[table all_data]
Int64Index: 2372 entries, 0 to 2371
Data columns (total 5 columns):
user_id 2372 non-null object
visit_time 2372 non-null datetime64[ns]
cart_time 720 non-null datetime64[ns]
checkout_time 598 non-null datetime64[ns]
purchase_time 497 non-null datetime64[ns]
dtypes: datetime64[ns](4), object(1)
memory usage: 111.2+ KB

Why does the table visits have 2000 rows while all_data has 2372 rows?

Could you please format your code and add a link to the relevant lesson/project? If you cannot edit your first post, then include the details in a reply, following the guidance given at the link below:

Funnel for Cool T-Shirts Inc.

That is a good question. This is the count that I came up with …

2000 number of visits to site
348 number of users who added to cart
360 number of users who went to check-out
252 number of users who completed purchase

and using…
print("unique visits " + str(visits["user_id"].nunique()))
print("unique cart items " + str(cart["user_id"].nunique()))
print("unique checkout items " + str(checkout["user_id"].nunique()))
print("unique purchase items " + str(purchase["user_id"].nunique()))

that gives …
unique visits 2000
unique cart items 348
unique checkout items 226
unique purchase items 144

print("unique visits " + str(visits["visit_time"].nunique()))
print("unique cart items " + str(cart["cart_time"].nunique()))
print("unique checkout items " + str(checkout["checkout_time"].nunique()))
print("unique purchase items " + str(purchase["purchase_time"].nunique()))
gives …

unique visits 1996
unique cart items 348
unique checkout items 353
unique purchase items 246

So I also do not understand why all_data has a total of 2372 rows.

all_data has more rows than visits because a user can appear more than once in the later tables (for example, by making more than one purchase), and a left merge then repeats that user's row once for each match. Notice how some users in all_data appear more than once: for instance, the user identified by the id 21dec5fa-999a-45c5-b59b-18a1ee161379 made 20 purchases. You can see this with the groupby method.

all_data.groupby("user_id").count()

This returns another dataframe with one row per user (so it will have 2000 rows). Each column holds the number of non-null values for that user: visit_time shows how many rows the user occupies in all_data, and the other columns show how many times they added to the cart, checked out, or made a purchase.
Also note that the number of unique user_id values in all_data is still 2000, which is consistent with visits.
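
To make the row duplication concrete, here is a minimal sketch using made-up data. The three users and their timestamps are hypothetical, not the project's real tables; only the column names follow the thread. User "a" has two cart entries, so the left merge repeats that user's visit row.

```python
import pandas as pd

# Hypothetical toy tables (NOT the project's data): three visitors,
# and user "a" appears twice in the cart table.
visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
})
cart = pd.DataFrame({
    "user_id": ["a", "a", "b"],
    "cart_time": pd.to_datetime(
        ["2017-01-01 10:05", "2017-01-01 10:20", "2017-01-01 11:10"]
    ),
})

# A left merge keeps every visit row and emits one output row per match,
# so "a" yields two rows; "c" has no match and gets a missing cart_time.
visits_cart = visits.merge(cart, on="user_id", how="left")

# One row per user; each column counts that user's non-null values.
row_counts = visits_cart.groupby("user_id").count()

# The number of distinct users is unchanged by the merge.
n_users = visits_cart["user_id"].nunique()

print(len(visits), len(visits_cart), n_users)
print(row_counts)
```

The merged table here has more rows than visits while the number of distinct user_id values stays the same, which is the same pattern as 2000 versus 2372 in all_data.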
