In the project walkthrough video, the merged dataframe between visits and cart has 2052 rows, but when I print the merged dataframe it has 2000 rows. What’s happening?
Seems like this one has come up before.
But I used the same syntax as the guy in the video. And if 2052 and 2000 both come up, which is the correct number to use?
Can you provide your code? If you used the same code as the video it should come out to 2052 rows. It still does for me, so I don’t think the data has changed since the video was made.
import codecademylib
import pandas as pd
# -- inspecting the dataframes --
visits = pd.read_csv('visits.csv',
                     parse_dates=[1])
#print(visits)
cart = pd.read_csv('cart.csv',
                   parse_dates=[1])
#print(cart)
checkout = pd.read_csv('checkout.csv',
                       parse_dates=[1])
#print(checkout)
purchase = pd.read_csv('purchase.csv',
                       parse_dates=[1])
#print(purchase)
# ----------------------------------------------------
def mergeCal(df, df2):
    merged = pd.merge(df, df2, how='left')  # left merge on df and df2
    info_col = merged.columns[-1]  # the _time column
    null_rows = merged[merged[info_col].isnull()]  # null rows
    percentage = float(len(null_rows)) / len(merged) * 100  # null rows as a percentage of all rows in the merged df
    return merged, percentage, len(null_rows)
visitsCart, visitsCart_per, visitsCart_null = mergeCal(visits, cart)
Well, that is certainly not the same code as the video, but your DataFrame called visitsCart still comes out to 2052 rows for me. Perhaps you are looking at a printout for the visits df?
Ah sorry, I meant the same methods. They differ slightly, of course, because I had to use them inside a function.
No, I just printed it using
print(visitsCart.shape)
print(len(visitsCart))
Both came out to 2000 rows.
Well this is what I’m seeing:
Either you’re doing something different or they changed the data for newer users (or maybe with the revamp of the DS path) and let users who previously completed it keep the old data.
The only way to know for sure would be to download the csv files and compare, but I’m leaning toward it being something on your end.
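If it helps, here is a rough sketch of what that comparison could look like once both copies are downloaded. The two tiny inline tables are made-up stand-ins for the two versions of cart.csv, the column name `user_id` is my assumption about the file, and `summarize` is just a hypothetical helper, not something from the project.

```python
import io
import pandas as pd

def summarize(df, key="user_id"):
    """Row count, distinct key count, and number of rows whose key repeats an earlier row."""
    return {
        "rows": len(df),
        "unique_keys": int(df[key].nunique()),
        "duplicate_key_rows": int(df[key].duplicated().sum()),
    }

# Made-up stand-ins for two downloaded copies of cart.csv.
# With real files you would call pd.read_csv on each path instead.
copy_a = pd.read_csv(
    io.StringIO("user_id,cart_time\na,2017-01-01\nb,2017-01-02\nb,2017-01-02\n"),
    parse_dates=[1],
)
copy_b = pd.read_csv(
    io.StringIO("user_id,cart_time\na,2017-01-01\nb,2017-01-02\n"),
    parse_dates=[1],
)

summary_a = summarize(copy_a)
summary_b = summarize(copy_b)
print(summary_a)
print(summary_b)
```

If the two summaries differ, the files are not the same data, and the duplicate count hints at whether repeated keys are involved.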
Can we try this? I didn’t change anything, and even after resetting the project and re-running my code I still got 2000 rows.
Click the folder in the top left of the code section:
Then open up visits.csv and cart.csv like this:
Then click the share button:
Share the link here and I’ll take a look when I have a minute.
visits: https://gist.github.com/a698653c546afaa2de3304372cccfdc1
cart: https://gist.github.com/920d6011f108c0d23eb8ad1020d65f7f
Well it looks like we’ve solved the mystery. Your cart.csv only has 349 rows, whereas mine has 401.
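For what it's worth, the numbers line up neatly: 401 - 349 = 52 and 2052 - 2000 = 52. One possible explanation, which I have not verified against the actual files, is that the 401-row version contains repeated user ids, since a left merge emits one output row per matching row on the right. A toy sketch with made-up data, assuming the shared column is `user_id`:

```python
import pandas as pd

# Made-up miniature versions of the two tables.
visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
})
cart_with_dupes = pd.DataFrame({
    "user_id": ["a", "a", "b"],  # user "a" appears twice
    "cart_time": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-02"]),
})
# Keep only the first row for each user_id.
cart_deduped = cart_with_dupes.drop_duplicates(subset="user_id")

# Left merge on the shared column (user_id), as in the code posted above.
merged_dupes = pd.merge(visits, cart_with_dupes, how="left")
merged_deduped = pd.merge(visits, cart_deduped, how="left")

print(len(visits))          # 3
print(len(merged_dupes))    # 4: the extra cart row for "a" adds one row
print(len(merged_deduped))  # 3: same length as visits
```

So a left merge can come out longer than the left table whenever the right table repeats a key, and it matches the left table's length when every key on the right is unique.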
Looks like at some point they updated the data…whether this was intentional or not is anyone’s guess.
Here’s a REPL if you wanna check out the difference:
https://repl.it/@elCocodrilo/PageVisitsFunnelTest#main.py
Thank you so much! I’m not sure which version I should use, so I’ll just stick with the new version for now.