I just finished the Hypothesis Testing section in Analyzing Data with Python and can't figure out an easy way to choose which of the three tests (ANOVA, chi-square, or Tukey) to use when analyzing three or more data sets. Any help is appreciated!
If your data are continuous, i.e., numeric measurements for which you can calculate and compare the mean of each data set, use a t-test (for two groups) or ANOVA (for three or more groups). Examples might be heights of people living in different areas or eating different diets, or rainfall in different locations.
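A minimal sketch of the continuous case, assuming SciPy is installed. The heights are made-up numbers for illustration only; `ttest_ind` compares two groups and `f_oneway` (one-way ANOVA) compares three or more.

```python
from scipy.stats import f_oneway, ttest_ind

# Hypothetical heights (cm) for three groups; made-up data for illustration.
group_a = [160, 162, 165, 161, 163]
group_b = [170, 172, 168, 171, 169]
group_c = [161, 164, 162, 160, 163]

# Two groups: independent two-sample t-test compares the two means.
t_result = ttest_ind(group_a, group_b)
print(f"t-test (A vs B) p-value: {t_result.pvalue:.4g}")

# Three or more groups: one-way ANOVA tests whether all group means are equal.
anova_result = f_oneway(group_a, group_b, group_c)
print(f"ANOVA (A, B, C) p-value: {anova_result.pvalue:.4g}")
```

A p-value below your chosen threshold (commonly 0.05) suggests at least one mean differs.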
If your data are categorical, i.e., discrete named items, use a chi-square test, which checks whether the distribution of categories differs between groups. Examples might be favorite pets, sports, or movies among two or more groups, such as men vs. women or different countries. (Note that continuous data can be made categorical by dividing it into buckets, e.g., 0-10, 10-20, 20-30, etc.)
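A minimal sketch of the categorical case, assuming SciPy is installed. The survey counts below are hypothetical; `chi2_contingency` takes a contingency table of observed counts (rows are groups, columns are categories).

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of favorite pet by group; made-up data for illustration.
observed = [
    [30, 50, 20],  # men:   cat, dog, fish
    [45, 35, 20],  # women: cat, dog, fish
]

# Returns the test statistic, p-value, degrees of freedom,
# and the counts expected if pet preference were independent of group.
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4g}, dof = {dof}")
print(expected)
```

Here dof is (2 - 1) * (3 - 1) = 2, and the expected counts come from the row and column totals.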
The Tukey test is a so-called post-hoc test, meaning that you run it after another test. A significant ANOVA result tells you that at least one group mean differs from the others, but not which one(s). Finding out which pairs of groups differ is the purpose of Tukey's test.
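A sketch of the ANOVA-then-Tukey workflow, assuming SciPy 1.8 or newer (which added `scipy.stats.tukey_hsd`); statsmodels' `pairwise_tukeyhsd` is an alternative. The group data are hypothetical.

```python
from scipy.stats import f_oneway, tukey_hsd

# Hypothetical heights (cm) for three groups; made-up data for illustration.
group_a = [160, 162, 165, 161, 163]
group_b = [170, 172, 168, 171, 169]
group_c = [161, 164, 162, 160, 163]

# Step 1: ANOVA tells us whether at least one group mean differs.
anova = f_oneway(group_a, group_b, group_c)
print(f"ANOVA p-value: {anova.pvalue:.4g}")

# Step 2: only if ANOVA is significant, run Tukey's HSD to find which pairs differ.
if anova.pvalue < 0.05:
    tukey = tukey_hsd(group_a, group_b, group_c)
    labels = ["A", "B", "C"]
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            p = tukey.pvalue[i, j]  # pairwise p-value matrix, adjusted for multiple comparisons
            verdict = "different" if p < 0.05 else "not different"
            print(f"{labels[i]} vs {labels[j]}: p = {p:.4g} ({verdict})")
```

With these numbers, group B stands apart while A and C look alike, which is exactly the detail ANOVA alone can't give you.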
Perfect! That clears it up. Thank you!