How can we think about Type I and Type II errors when we don't know the truth of an experiment?


In our examples of Type I and Type II errors, we are only able to determine false positives and false negatives because we know the true positives and true negatives. How should we think about these errors when the true values aren’t known?


Although we cannot judge the efficacy of our statistical tests by simply performing them "in the wild", i.e. on data whose true statistics we don't already know, we can judge their performance by first vetting them on data sets we control. This is where training and test phases come into play. By rigorously testing our algorithms on data we have at our disposal, we can estimate how well they will perform on new data. The more data we have to train and test on, the more confident we can be about our algorithm's results in the wild.
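The train/test idea above can be sketched in plain Python. This is a minimal, hypothetical example (the data, the threshold-based classifier, and the 80/20 split are all assumptions, not anything from the source): we "train" on one portion of labeled data, then count Type I and Type II errors on a held-out portion where the true labels are known.

```python
import random

random.seed(0)

# Hypothetical labeled data: (feature, true_label) pairs, where label 1
# tends to come with larger feature values.
labels = [random.randint(0, 1) for _ in range(1000)]
data = [(random.gauss(1.0 if y else 0.0, 1.0), y) for y in labels]

# Hold out a test set so performance is judged on data the
# algorithm never saw during training.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# "Train": choose a decision threshold using the training set only.
threshold = sum(x for x, _ in train) / len(train)

def predict(x):
    return int(x > threshold)

# Evaluate on the held-out test set. Because the true labels are known
# here, false positives (Type I) and false negatives (Type II) are countable.
fp = sum(1 for x, y in test if predict(x) == 1 and y == 0)
fn = sum(1 for x, y in test if predict(x) == 0 and y == 1)
print(f"Type I errors (false positives) on test set: {fp}")
print(f"Type II errors (false negatives) on test set: {fn}")
```

The error counts on the held-out set serve as an estimate of how often the test would be wrong on genuinely new data.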

This all comes down to effective use of the data we have on hand. Consequently, many techniques exist for splitting and reusing data to avoid overfitting: the situation where an algorithm optimizes for the training data (data where the answers to the question we care about are known) but performs poorly on unseen data.
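Overfitting can be illustrated with an extreme toy "model" (entirely hypothetical, not from the source): one that simply memorizes its training examples. Here the labels are random noise, so memorization achieves perfect training accuracy while telling us nothing about unseen data.

```python
import random

random.seed(1)

# Hypothetical noisy data: the label is random, independent of the feature.
def make_point():
    return (random.random(), random.randint(0, 1))

train = [make_point() for _ in range(200)]
test = [make_point() for _ in range(200)]

# An overfit "model": memorize every training example exactly.
memorized = {x: y for x, y in train}

def predict(x):
    # Falls back to a constant guess on inputs never seen in training.
    return memorized.get(x, 0)

train_acc = sum(predict(x) == y for x, y in train) / len(train)
test_acc = sum(predict(x) == y for x, y in test) / len(test)
print(f"train accuracy: {train_acc:.2f}")  # perfect on memorized data
print(f"test accuracy:  {test_acc:.2f}")   # roughly chance on unseen data
```

The gap between training and test accuracy is exactly what holding out a test set is designed to expose.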