A/B Testing for ShoeFly.com (Task #06)

Hi, I have a question about this exercise. It is in the pandas section of the Data Science track.

My question is about Task 6: Create a new column in clicks_pivot called percent_clicked which is equal to the percent of users who clicked on the ad from each utm_source.

This is my original attempt, and the percent_clicked column showed an incorrect result:
(1st method)

percent = lambda row: row[True]/(row[False] + row[True])
clicks_pivot["percent_clicked"] = clicks_pivot.apply(percent, axis = 1)

An earlier answer states that the lambda function returns an integer, so one should multiply the expression by 100, but that doesn’t work.

The hint in the question is as below:
(2nd method)

clicks_pivot['percent_clicked'] = \
   clicks_pivot[True] / \
   (clicks_pivot[True] + clicks_pivot[False])

Although the second method is intuitive, I am still curious why the first method doesn’t show the correct result.


Seems like some kookiness involving Python2: dividing two pandas.Series behaves differently from dividing two integers.

In Python2 dividing two integers with the / operator produces another integer. You can try this quickly by using the calculation for the first row if you like-

print(80 / (175 + 80))
# or get the type-
print(type(80 / (175 + 80)))

This would explain why you get 0 as the output. If you want it to work, make sure to use a float somewhere. Since you mention percentages (even if you don’t use them here), multiplying by the float 100.0 would work, but make sure you do that before the two integers are divided (either of the two operands can be a float to get a float output).
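As a rough sketch of the point about ordering, here is the same first-row calculation in plain Python. The variable names are made up for illustration, and `//` is an assumption on my part: it stands in for Python2's integer `/` so the snippet gives the same numbers on Python2 and Python3.

```python
# Counts taken from the first-row example above (80 clicked, 175 did not).
clicked = 80
not_clicked = 175

# In Python2, `/` on two ints behaves like `//` (floor division).
# `//` is used here so the result is the same on Python2 and Python3.
truncated = clicked // (not_clicked + clicked)

# Multiplying after the division is too late: the fraction is already 0.
too_late = (clicked // (not_clicked + clicked)) * 100

# Making one operand a float before dividing keeps the fraction.
fraction = float(clicked) / (not_clicked + clicked)

# Multiplying by 100.0 first also works, since the numerator is then a float.
percent = 100.0 * clicked / (not_clicked + clicked)

print(truncated, too_late, fraction, percent)
```

In the lambda from the first method, the equivalent change would presumably be wrapping one side in float(), e.g. float(row[True]), before dividing.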

The quirkiness pops up in the division of two pandas Series, which seems to produce a float64 result instead. Try for example-

print(clicks_pivot[True] / clicks_pivot[True])

would show an output like-

dtype: float64

I don’t think that behaviour exists when dividing two numpy arrays (I didn’t use pandas back when I had to use Python2) so it’s quite unexpected. There may be some history in versioning here which covers it in better detail.

Pandas compared to numpy (python2 supported versions)

import numpy as np
import pandas as pd

x = pd.Series([1, 1])
y = pd.Series([1, 1])
print(x / y)
Out: 0    1.0
1    1.0
dtype: float64  # floats

x = np.array([1, 1])
y = np.array([1, 1])
print(x / y)
Out: [1 1]  # These are integers

print((x / y).dtype)
Out: int64