FAQ: Hypothesis Testing - Dangers of Multiple T-Tests

@corepro76625,

The general idea you are referring to is known as Family-wise Error Rate (FWER) control. As @object2161442840 mentioned, two common methods for this are the Bonferroni correction and Tukey’s range test. The important thing to remember about these (and any other statistical test) is to go into them already knowing your desired alpha level (a.k.a. significance level, or the “p-value threshold” I referred to above).

So if you go into testing with an alpha of .05, don’t change it to .01 just because you are running multiple t-tests. Instead, keep the same alpha and use one of the established FWER control methods.
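To see why FWER control matters, here is a rough sketch of how the chance of at least one false positive grows when several uncorrected tests are each run at alpha = .05. It assumes the tests are independent, which pairwise t-tests on shared groups are not, so treat the numbers as an approximation. The helper name familywise_error_rate is just for illustration.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one Type I error across num_tests
    uncorrected tests, ASSUMING the tests are independent."""
    return 1 - (1 - alpha) ** num_tests


for m in (1, 3, 10):
    print(f"{m:>2} tests at alpha=.05 -> FWER ~ {familywise_error_rate(0.05, m):.3f}")
```

With 3 uncorrected tests the rate is already about .143, nearly three times the intended .05.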

In the case of the Bonferroni correction, you reject the null hypothesis for any test whose p-value is less than or equal to alpha divided by the number of tests. For example, with 3 tests and an alpha of .05, you can reject the null hypothesis for any test returning a p-value of .05/3 (roughly .0167) or less. If you are interested, there is a good video explaining the Bonferroni correction here.
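A minimal sketch of the Bonferroni rule, assuming you already have a list of p-values from your tests. The helper name bonferroni_reject and the example p-values are hypothetical.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return True for each test whose null hypothesis is rejected
    under the Bonferroni correction (p <= alpha / number_of_tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values from three pairwise t-tests
print(bonferroni_reject([0.010, 0.030, 0.200]))  # [True, False, False]
```

Note that .030 would pass an uncorrected .05 threshold but fails the corrected one. If you use statsmodels, its multipletests function with method="bonferroni" does the same job.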

The downside of FWER correction is that lowering the Type I error rate increases the Type II error rate. So the more pairs you compare, the stricter each individual test becomes, and the more likely you are to get a false negative (missing a difference that really exists).
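A quick sketch of how fast the Bonferroni threshold tightens as you add groups, assuming you compare every pair of groups (k groups give k(k-1)/2 pairs). The function name per_test_threshold is illustrative only.

```python
from math import comb


def per_test_threshold(num_groups: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (number of pairwise tests, Bonferroni per-test threshold)
    when every pair of groups is compared."""
    num_pairs = comb(num_groups, 2)
    return num_pairs, alpha / num_pairs


for groups in (3, 5, 10):
    pairs, threshold = per_test_threshold(groups)
    print(f"{groups} groups -> {pairs} pairs -> reject if p <= {threshold:.5f}")
```

Going from 3 groups to 10 groups takes you from 3 comparisons to 45, so each test must clear a threshold 15 times smaller, which is where the extra false negatives come from.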

I would typically suggest running an ANOVA test first to determine whether any of the group means differ significantly. Once you know there is at least one significant difference, you can use a post-hoc test like Tukey’s range test to determine which pair(s) differ.
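A sketch of the ANOVA-then-Tukey workflow using SciPy, assuming SciPy 1.8 or newer (where scipy.stats.tukey_hsd was added). The sample data is made up so that group c clearly differs from a and b.

```python
from scipy.stats import f_oneway, tukey_hsd

# Made-up example data: c is shifted upward relative to a and b.
a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
b = [10.0, 10.4, 9.7, 10.1, 10.2, 9.9]
c = [12.1, 11.8, 12.3, 12.0, 11.9, 12.2]

ALPHA = 0.05

# Step 1: one-way ANOVA asks whether at least one group mean differs.
anova = f_oneway(a, b, c)
print(f"ANOVA p-value: {anova.pvalue:.3g}")

if anova.pvalue < ALPHA:
    # Step 2: Tukey's HSD compares every pair while controlling the FWER.
    tukey = tukey_hsd(a, b, c)
    names = ["a", "b", "c"]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            p = tukey.pvalue[i, j]
            verdict = "different" if p < ALPHA else "not different"
            print(f"{names[i]} vs {names[j]}: p = {p:.3g} ({verdict})")
```

Tukey's p-values are already adjusted for the number of pairs, so you compare them directly to your original alpha rather than dividing it yourself.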


What is the “.pvalue”? I requested the solution for this question, and part of it was this:

a_b_pval = ttest_ind(a, b).pvalue
a_c_pval = ttest_ind(a, c).pvalue
b_c_pval = ttest_ind(b, c).pvalue

What does the .pvalue do? Thanks!


The function ttest_ind() returns two values: the t-statistic and the p-value. To be precise, ttest_ind() returns a single result object that has the attributes statistic and pvalue. So if you don’t add .pvalue, that whole object, not the p-value, is assigned to a_b_pval, etc. Adding .pvalue gives you just the p-value.
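A small sketch showing the difference between the result object and its pvalue attribute. The sample data is hypothetical, and the exact class name of the result varies between SciPy versions, so don't rely on it.

```python
from scipy.stats import ttest_ind

# Hypothetical sample data for two groups
a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
b = [10.6, 10.9, 10.4, 10.8, 10.7, 10.5]

result = ttest_ind(a, b)      # a result object, not a bare number
print(type(result).__name__)  # class name differs across SciPy versions
print(result.statistic)       # the t-statistic
print(result.pvalue)          # the p-value; this is what .pvalue extracts

a_b_pval = ttest_ind(a, b).pvalue  # a plain number you can compare to alpha
print(a_b_pval < 0.05)
```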

The lesson description doesn’t explain that the return value of ttest_ind() has the attribute pvalue, so it’s odd that the sample solution uses it.

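Following the approach from the previous lesson, you can also get the p-value by unpacking the result, since it also behaves like a tuple of (statistic, pvalue). The underscore _ is a conventional name for a value you intend to ignore: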

_, a_b_pval = ttest_ind(a, b)
_, a_c_pval = ttest_ind(a, c)
_, b_c_pval = ttest_ind(b, c)
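Putting the thread together, here is a sketch that computes the three pairwise p-values with the unpacking style above and then applies the Bonferroni correction discussed earlier. The data for a, b and c is made up for illustration; in the exercise these come from the provided dataset.

```python
from scipy.stats import ttest_ind

# Made-up sample data for three groups
a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
b = [10.0, 10.4, 9.7, 10.1, 10.2, 9.9]
c = [12.1, 11.8, 12.3, 12.0, 11.9, 12.2]

ALPHA = 0.05

_, a_b_pval = ttest_ind(a, b)
_, a_c_pval = ttest_ind(a, c)
_, b_c_pval = ttest_ind(b, c)

pvals = {"a vs b": a_b_pval, "a vs c": a_c_pval, "b vs c": b_c_pval}
threshold = ALPHA / len(pvals)  # Bonferroni: .05 / 3

for pair, p in pvals.items():
    decision = "reject H0" if p <= threshold else "fail to reject H0"
    print(f"{pair}: p = {p:.3g} -> {decision}")
```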