MuscleHub

Hello there, everyone! On the MuscleHub Capstone project, instead of using SQL, I tried using pandas: I queried each table individually and then joined them. I'm getting way more than the expected 5004 rows (5967, to be exact). Here's my code; each DataFrame is named after the table it holds :smiley:

visits['is_meaningful'] = visits['visit_date'].apply(lambda x: (int(x[0]) >= 7) | (int(x.split('-')[1]) > 1))
keys = ['first_name', 'last_name', 'email', 'gender']
df1 = (visits.merge(fitness_tests, on=keys, how='left')
             .merge(applications, on=keys, how='left')
             .merge(purchases, on=keys, how='left')
             .reset_index())
df2 = df1[keys + ['visit_date', 'fitness_test_date', 'application_date', 'purchase_date', 'is_meaningful']]
df = df2[df2['is_meaningful']].reset_index()
df
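
One possible source of the extra rows is the `is_meaningful` test itself: it keeps a visit when the first character of the date is at least 7 *or* the day of the month is greater than 1, so a date such as `5-2-17` passes. The sketch below compares that test with a parsed-date comparison on a toy table. It assumes the dates are `M-D-YY` strings and that the intended cutoff is July 1, 2017; the sample rows are made up for illustration and are not project data.

```python
import pandas as pd

# Toy stand-in for the visits table (hypothetical rows, not the project data).
visits = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cat", "Dan"],
    "visit_date": ["5-2-17", "6-30-17", "7-1-17", "8-15-17"],
})

# The test from the post: (first character >= 7) OR (day of month > 1).
original_mask = visits["visit_date"].apply(
    lambda x: (int(x[0]) >= 7) | (int(x.split("-")[1]) > 1)
)

# Alternative: parse the strings as dates, then compare against the cutoff.
# Assumption: M-D-YY format and a July 1, 2017 cutoff.
parsed = pd.to_datetime(visits["visit_date"], format="%m-%d-%y")
date_mask = parsed >= pd.Timestamp("2017-07-01")

print(visits[original_mask]["visit_date"].tolist())
# -> ['5-2-17', '6-30-17', '7-1-17', '8-15-17']
print(visits[date_mask]["visit_date"].tolist())
# -> ['7-1-17', '8-15-17']
```

If the count is still off after changing the filter, duplicated keys in the right-hand tables would also multiply rows in a left merge; passing `validate='many_to_one'` to `merge` raises an error when that happens.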

Here's my finished project, in case anyone is wondering! I'll check out the others' projects too :slight_smile: https://docs.google.com/presentation/d/1V-_nb1rQz9h2yNfrkvhu7KsvpjPHxRmMvSOtXIam4bU/edit?usp=sharing