app_pivot['Percent with Application'] = app_pivot.Application/app_pivot.Total
print(app_pivot.head())
is_application ab_test_group Application No Application Total \
0 A 250 2254 2504
1 B 325 2175 2500
is_application Percent with Application
0 0.09984
1 0.13000
The question asked from the capstone project MuscleHub A/B is:
“It looks like more people from Group B turned in an application. Why might that be? We need to know if this difference is statistically significant. Choose a hypothesis tests.”
My question is:
The correct test to choose is a chi-square test. However, the claim being tested based on the wording in the question above is ‘Are more people from Group B likely to turn in their application than Group A?’ . This would seems to mean I include 3 categories (Group B, Group A, is_application). So why would we include another group ‘No Application’ as part of our chi-square test? Because that is what the correct answer includes: Group A, Group B, is_application, No Application for the chi-square test.
I am trying to understand the implied logic here and how this can be translated into a standard when doing similar analysis moving forward.