
Presumed Averageness

This article originally appeared on the Brookings Institution website.

Imagine yourself having had a heart attack.  An ambulance arrives to transport you to a hospital emergency room.  Your ambulance driver asks you to choose between two hospitals, Hospital A or Hospital B.  At Hospital A, the mortality rate for heart attack patients is 75 percent.  At Hospital B, the mortality rate is just 20 percent.  But mortality rates are imperfect measures, based on a finite number of admissions.  If neither rate were “statistically significantly” different from average, would you be indifferent about which hospital you were delivered to?

Don’t ask your social scientist friends to help you with your dilemma.  When asked for expert advice, they apply the rules of classical hypothesis testing, which require that a difference have no more than a 5 percent chance of being a fluke before it is accepted as statistically significant.  (For examples, see Schochet and Chiang (2010), Hill (2009), Baker et al. (2010).)  In many areas of science, it makes sense to assume that a medical procedure does not work, or that a vaccine is ineffective, or that the existing theory is correct, until the evidence is very strong that the original presumption (the null hypothesis) is wrong.  That is why the classical hypothesis test places the burden of proof so heavily on the alternative hypothesis, and preserves the null hypothesis until the evidence is overwhelmingly to the contrary.  But that’s not the right standard to use in choosing between two hospitals.

In 1945, Herbert Simon published a classic article in the Journal of the American Statistical Association pointing out that the hypothesis testing framework is not suited to many common decisions.  He argued that when decision-makers face an immediate choice between two options, when the cost of falsely rejecting the best option is not qualitatively more important or larger for one option than the other (i.e., when the costs are symmetric), and when it would be infeasible to postpone the decision until more data are available, the optimal decision rule is to choose the option with the better odds of success, even if the difference is not “statistically significant.”
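The contrast between the two decision rules can be illustrated with a short sketch.  The sample sizes below are hypothetical, chosen only so that the observed mortality rates match the article’s 75 percent and 20 percent figures while the difference still falls short of statistical significance:

```python
import math

def two_prop_ztest(deaths_a, n_a, deaths_b, n_b):
    """Pooled two-proportion z-test (normal approximation).
    Returns the z statistic and a two-sided p-value."""
    p_a, p_b = deaths_a / n_a, deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| > |z|) for a standard normal Z
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical small samples matching the article's rates:
# Hospital A: 3 of 4 heart-attack patients died (75 percent)
# Hospital B: 1 of 5 died (20 percent)
z, p = two_prop_ztest(3, 4, 1, 5)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")

# The classical test fails to reject "no difference" at the 5% level,
# yet Simon's rule says to choose the hospital with the better odds.
best = "B" if 1 / 5 < 3 / 4 else "A"
print("Simon's rule: choose Hospital", best)
```

With these numbers the p-value is roughly 0.10, so a classical test would declare the hospitals indistinguishable; Simon’s rule nonetheless picks Hospital B, since symmetric costs and an unavoidable immediate choice leave no reason to privilege either hospital as the null.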

Improper use of the hypothesis testing paradigm can lead to costly mistakes.  A good example is the teacher tenure decision.  There are at least two alternative ways to think about the problem.  Under the hypothesis testing paradigm, one would start with the hypothesis that an incumbent teacher is average, and only deny tenure when that presumption was rejected beyond a reasonable doubt.  Let’s call this the “average until proven below average” formulation.

To read the complete post, visit Brookings.edu.
