On the effectiveness of sample means in approximating the population mean


Calculating the population mean can be prohibitively expensive, or the data it requires may simply be unavailable, as we discuss here. But isn’t computing a reliable experimental mean from samples also likely to be expensive and, if not, error-prone?


This would certainly be a problem if there were no known relation between sample means and the population mean. Fortunately, there is one. The Central Limit Theorem tells us that, for independent samples drawn from a population with finite variance σ², the mean of a sample of size n is approximately normally distributed around the population mean once n is large enough, with a standard error of σ/√n that shrinks as n grows. Of course, even so, a sample mean is unlikely to match the population mean exactly. This is why we rely on rigorous tools for quantifying acceptable levels of probabilistic error (e.g. confidence intervals and significance tests). With them, we can choose a sample size small enough to be computationally cheap for our purposes, yet large enough to keep the error within acceptable bounds. In general, probabilistic guarantees like this work well in practice, and they are preferable to having no solution at all (because the data are unavailable) or one that won’t finish before the heat death of the universe (because we must ingest massive amounts of data).
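To make this concrete, here is a minimal Python sketch (standard library only) of the workflow described above. It assumes independent random samples and relies on the CLT's normal approximation: the standard error is estimated as s/√n, where s is the sample standard deviation; a 95% interval is the sample mean ± 1.96 standard errors; and the smallest sample size that keeps the interval's half-width within a target margin E is n = ⌈(1.96·σ/E)²⌉. The skewed exponential population, the pilot-sample size of 100, the margin of 1.0, the seed, and the function names required_sample_size, mean_with_ci and run_demo are all hypothetical choices for illustration. The population mean is computed only so the estimates can be checked against it; in a real setting it would not be available.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def required_sample_size(sigma: float, margin: float, z: float = Z_95) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin (CLT-based sizing)."""
    return math.ceil((z * sigma / margin) ** 2)


def mean_with_ci(sample: list[float], z: float = Z_95) -> tuple[float, float, float]:
    """Return the sample mean and a normal-approximation confidence interval."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # estimated standard error
    return m, m - z * se, m + z * se


def run_demo(seed: int = 42, population_size: int = 200_000,
             margin: float = 1.0, trials: int = 200) -> dict:
    """Hypothetical end-to-end demo: size a sample, estimate the mean, check coverage."""
    rng = random.Random(seed)
    # Hypothetical skewed population (exponential, mean about 10), far from normal.
    population = [rng.expovariate(1 / 10) for _ in range(population_size)]
    # Normally unknown; computed here only to check the estimates against it.
    true_mean = statistics.fmean(population)

    # A small pilot sample estimates sigma, which then sets the main sample size.
    pilot = rng.sample(population, 100)
    n = required_sample_size(statistics.stdev(pilot), margin)
    estimate, lo, hi = mean_with_ci(rng.sample(population, n))

    # Repeat the experiment: roughly 95% of the intervals should contain the true mean.
    covered = 0
    for _ in range(trials):
        _, t_lo, t_hi = mean_with_ci(rng.sample(population, n))
        if t_lo <= true_mean <= t_hi:
            covered += 1
    return {"n": n, "estimate": estimate, "ci": (lo, hi),
            "true_mean": true_mean, "coverage": covered / trials}


if __name__ == "__main__":
    r = run_demo()
    print(f"sample size n={r['n']}, estimate={r['estimate']:.3f}, "
          f"95% CI=({r['ci'][0]:.3f}, {r['ci'][1]:.3f}), true mean={r['true_mean']:.3f}")
    print(f"interval coverage over repeated samples: {r['coverage']:.1%}")
```

Running it prints the chosen sample size (a few hundred values out of 200,000), the estimate with its interval, and the fraction of repeated intervals that contain the true mean, which should typically land near 95%. Because the population is deliberately skewed, coverage can fall a little short of 95% at modest n; the CLT only promises an approximation that improves as n grows.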