A/B Testing for ShoeFly.com issue


While working on “A/B Testing for ShoeFly.com”, I can’t find a concise solution for task 10: for each group (a_clicks and b_clicks), calculate the percent of users who clicked on the ad by day.


So far I have two solutions:


I think I missed some concept I should use here, but I can’t figure it out.
Could someone give me a hint?

So, what question is this? 3-5?

So, you have a column called is_click which results in either True or False, correct?

Are you trying to groupby the utm source and the is_click column and count the number of users? And then you would probably have to pivot the data so you could get a full count of user ids by source, right?
I don’t think you necessarily need a function here.

It’s question 10.
I can get the number of users who clicked and the number who didn’t click the ad for every weekday in each group, but my problem is getting the percentage clicked in a concise way.

Hi, you should try to come up with a different definition of your function. I have defined it with a lambda and it works fine.

I solved it by grouping and pivoting the data.

After counting the clicks per day for A & B, couldn’t you pivot the data:

a_clicks_day_pivot = a_clicks_day.pivot(columns='is_click', index='day', values='user_id').reset_index()

And then create a column in the df called % clicked (or whatever) and divide the number of True by the total of True + False, which gives you the % for each day…
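For anyone following along, here is a rough sketch of that whole sequence (count, pivot, divide). The rows in a_clicks below are made up for illustration; only the column names user_id, day and is_click come from the pivot call above, and is_click is assumed to hold real booleans.

```python
import pandas as pd

# Hypothetical stand-in for a_clicks: one row per user who saw ad A.
a_clicks = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "day": ["1 - Monday", "1 - Monday", "1 - Monday",
            "2 - Tuesday", "2 - Tuesday", "2 - Tuesday"],
    "is_click": [True, False, False, True, True, False],
})

# Count users for every (day, is_click) combination.
a_clicks_day = a_clicks.groupby(["day", "is_click"]).user_id.count().reset_index()

# One row per day, one column per is_click value.
a_clicks_day_pivot = a_clicks_day.pivot(
    columns="is_click", index="day", values="user_id"
).reset_index()

# The new columns are labelled with the booleans True and False, not strings.
a_clicks_day_pivot["percent_clicked"] = a_clicks_day_pivot[True] / (
    a_clicks_day_pivot[True] + a_clicks_day_pivot[False]
)
print(a_clicks_day_pivot)
```

The same steps would apply to b_clicks.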

The way I did it was to add another column named “percent_clicked” to the a_pivot and b_pivot dataframes, and then use the apply() method with a lambda expression to calculate:

(people who clicked) / ((people who clicked) +(people who didn't click))

Make sure to specify whether you’re working along the row or column axis via the ‘axis=’ parameter of apply() (it is an argument to apply(), not to the lambda itself).
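A minimal sketch of the apply() version, in case it helps. The counts in a_pivot here are invented, and the table is assumed to have boolean column labels False and True after pivoting.

```python
import pandas as pd

# Hypothetical pivoted counts; the column labels are the booleans False and True.
a_pivot = pd.DataFrame({
    "day": ["1 - Monday", "2 - Tuesday"],
    False: [60, 50],
    True: [40, 50],
})

# axis=1 is passed to apply(), so the lambda receives one row at a time.
a_pivot["percent_clicked"] = a_pivot.apply(
    lambda row: row[True] / (row[True] + row[False]),
    axis=1,
)
print(a_pivot)
```

With axis=0 (the default) the lambda would receive whole columns instead of rows, and row[True] would fail because True is a column label, not a row label.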

@code6977593596 @christophlenz
That was the first thing I did. After pivoting the table I get the following:

is_click         False   True
1 - Monday          70     43
2 - Tuesday         76     43
3 - Wednesday       86     38
4 - Thursday        69     47
5 - Friday          77     51
6 - Saturday        73     45
7 - Sunday          66     43

And I’m having trouble accessing the values in the columns ‘True’ and ‘False’. How can I do that?

I’m finding myself quoting Godfather II, “Monday, Tuesday, Thursday, Wednesday…” :joy: Anyway…

Correct about the pivot table.
So, you have your a_clicks_day_pivot table (and then you do the same steps for b_clicks_day_pivot).
Then you create a new column in that a_clicks pivot table called percent_clicked, evaluated row by row as a_clicks_day_pivot[True] divided by (a_clicks_day_pivot[True] + a_clicks_day_pivot[False]).

Or, generically, df['new col name'] = df[True] / (df[True] + df[False])
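As a worked example, here is that one-liner applied to the counts from the table posted earlier in the thread. The variable name pivot is just a placeholder, and the frame is typed in by hand rather than produced by a real pivot call.

```python
import pandas as pd

# Counts copied from the pivot table posted earlier in the thread.
pivot = pd.DataFrame({
    "day": ["1 - Monday", "2 - Tuesday", "3 - Wednesday", "4 - Thursday",
            "5 - Friday", "6 - Saturday", "7 - Sunday"],
    False: [70, 76, 86, 69, 77, 73, 66],
    True: [43, 43, 38, 47, 51, 45, 43],
})

# Whole-column arithmetic: pandas divides row by row, no function needed.
pivot["percent_clicked"] = pivot[True] / (pivot[True] + pivot[False])
print(pivot[["day", "percent_clicked"]].round(3))
```

For Monday that works out to 43 / (43 + 70) = 43 / 113, roughly 0.38.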

Which then gives you a % clicked for each day of the week.

Oh thank you, now I’ve got it.
I was trying df.True and df['True'] :man_facepalming:
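For anyone who hits the same wall, a small sketch of why those two attempts fail. The frame below is a hand-built stand-in with invented variable names; the assumption is that the pivot columns are labelled with the booleans True and False, as they are when is_click holds booleans.

```python
import pandas as pd

# Hand-built stand-in for the pivoted table (boolean column labels).
pivot = pd.DataFrame({
    "day": ["1 - Monday", "2 - Tuesday"],
    False: [70, 76],
    True: [43, 43],
})

print(True in pivot.columns)    # True: the boolean label exists
print("True" in pivot.columns)  # False: there is no string label "True"

# pivot.True cannot even be parsed, because True is a Python keyword.
# pivot["True"] parses, but looks up a string label that does not exist:
try:
    pivot["True"]
except KeyError:
    print("KeyError: no column named 'True'")

# Optional: rename the boolean labels to strings to make access less surprising.
renamed = pivot.rename(columns={True: "clicked", False: "not_clicked"})
renamed["percent_clicked"] = renamed["clicked"] / (
    renamed["clicked"] + renamed["not_clicked"]
)
print(renamed)
```

The rename step is only a convenience; pivot[True] and pivot[False] work fine without it.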