What Can Educators Learn from the Red Sox?

The case for an evidence-based approach to assessing reform initiatives

Posted March 4, 2005
By Beth C. Gamse

In "Lessons from the Red Sox Playbook," Beth Gamse, Ed.D. '91, and Harvard Graduate School of Education Professor Judith Singer describe the benefits of rigorous research and evidence-based decision making in education policy. Reprinted from the Harvard Education Letter, January/February 2005.

"Believe!" "Keep the faith!" These have been the refrains of Red Sox fans for nearly a century. Their perpetual optimism in spite of decades of defeat echoes another refrain familiar to educators: "All children can succeed; never give up!"

Now that the Sox have finally "reversed the curse," the lessons of their 2004 World Series victory will undoubtedly generate new refrains. The New York Times' analysis, published the day after they won, noted that the team's victory reflects "the triumph of a new wave of thinking in baseball, one that has begun to place increasing importance on the kind of intellectually ambitious stewardship that stresses rigorous quantitative analysis over instinct and whim."² In other words, the Sox approached baseball not only as an art, but as a science. We think this lesson has special salience for educators. Like the pre-2004 Sox, many educators have long resisted the kind of rigorous research and scientific analysis that could identify the curricula and teaching strategies most likely to help children succeed in school.

One week before the World Series, the Times' Samuel Freedman asked why one well-intentioned school district adopted a new mathematics curriculum — Investigations, developed by the highly regarded education research organization TERC — that had never been evaluated using a randomized trial (see "Randomized Trials in Education"). According to TERC, Investigations has been evaluated only through "classroom studies, large-scale comparisons across schools, and small-scale comparisons between classrooms."³ Investigations has the imprimatur of the National Science Foundation (NSF) and the National Council of Teachers of Mathematics (NCTM). But these endorsements — which carry great weight in the marketplace of instructional materials — are based on philosophical considerations, such as pedagogy, rather than on the evidence of effectiveness (or lack thereof) that comes only with randomized trials. In fact, some critics suggest that the curriculum may contribute to the achievement gap between white and minority students. In the absence of more rigorous scientific research, the decision to adopt a curriculum like Investigations is being made on what the Times sportswriters would call "instinct and whim."

Over the past few years, the Institute of Education Sciences (IES) at the U.S. Department of Education has funded dozens of school-based randomized trials at the local and national levels. IES is also sponsoring a national effort (the What Works Clearinghouse) to survey the research literature and summarize the evidence on multiple education-related topics, giving the greatest weight to rigorous studies based on randomized trials. Many schools and districts, however, have declined to participate in these trials, and many in the larger education establishment have greeted them with profound ambivalence. Why does the mere mention of scientific rigor produce a level of animus among some educators as bitter as the Red Sox-Yankees rivalry?

Costs and Benefits

Like many baseball coaches, many educators may simply lack the skills to interpret data for themselves. Red Sox general manager Theo Epstein makes critical decisions about his team only after carefully analyzing relevant data about his players. To evaluate a player's performance, for example, he calculates the player's on-base percentage, factoring in both hits and walks, rather than relying on simpler statistics like the batting average or the number of runs batted in (RBIs). In other words, he incorporates multiple indices to yield a more informative indicator. All educators can surely appreciate indicators that capture more complexity than one facet of performance alone. Yet many educators are not confident in their ability to apply these kinds of indicators in everyday decisionmaking. Or, they may be skeptical about the costs and benefits of the research necessary to yield scientifically valid results.
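To make the contrast concrete, here is a minimal sketch of the two statistics. It uses the standard on-base percentage formula, which counts hit-by-pitch and sacrifice flies in addition to the hits and walks mentioned above. The two players and their season totals are hypothetical, invented purely for illustration.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats; walks are ignored entirely."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, at_bats, hit_by_pitch=0, sacrifice_flies=0):
    """Standard OBP: times on base divided by the plate appearances that count."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities if opportunities else 0.0


# Hypothetical season totals (not real players).
players = {
    "Player A": {"hits": 150, "walks": 20, "at_bats": 500},
    "Player B": {"hits": 140, "walks": 90, "at_bats": 500},
}

for name, s in players.items():
    avg = batting_average(s["hits"], s["at_bats"])
    obp = on_base_percentage(s["hits"], s["walks"], s["at_bats"])
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

In this made-up case, Player A has the higher batting average but Player B reaches base more often, which is the kind of difference a single simpler statistic can hide.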

What does it mean for a school or district to participate in a randomized experiment? It means agreeing to try out a new program and allowing individual students, classes, or even whole schools to be randomly assigned to either the experimental program or the control group (usually, the existing program). It means exposing children in the experimental group to an untested program; conversely, it means that children in the control group do not have access to whatever benefits the new program might confer. It means carrying out a good-faith effort to implement the new program and participating in data collection to measure its effects, which means collecting data from all participants, both those in the experimental program and those in the control group.
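The random-assignment step described above can be sketched in a few lines. This is an illustrative sketch only, not the procedure used in any particular study: the function name `assign_groups`, the classroom labels, and the even split between groups are all assumptions made for the example.

```python
import random


def assign_groups(units, seed=None):
    """Randomly split units (students, classes, or schools) into two groups.

    A fixed seed makes the assignment reproducible so it can be audited later.
    """
    rng = random.Random(seed)
    shuffled = list(units)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)       # chance alone decides the ordering
    half = len(shuffled) // 2
    return {
        "experimental": sorted(shuffled[:half]),  # receives the new program
        "control": sorted(shuffled[half:]),       # keeps the existing program
    }


# Hypothetical classroom labels.
classrooms = [f"Room {n}" for n in range(101, 109)]
groups = assign_groups(classrooms, seed=2005)

for group, members in groups.items():
    print(group, members)
```

Because chance alone decides which classrooms land in each group, any later difference in outcomes is less likely to reflect pre-existing differences between the groups. Note that data would still be collected from both groups.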

It's true that the data-collection demands imposed by participation in such a study can detract from valuable instruction. For instance, more classroom time may be spent on student assessment. And while assessment as part of a single study is unlikely to take more than a few hours for any individual participant, there are many large schools and districts in which numerous studies are under way. The cumulative diminution of instructional time due to participation in multiple (and presumably unrelated and unsynchronized) studies may be greater. But wholesale adoption of an untested program can lead to an even greater loss of instructional time: for instance, if it doesn't work equally well for all students, requires more professional development than anticipated, or doesn't segue smoothly from prior years' instruction.

The Ethics of Experimentation

Are education experiments ethical? Many educators assert that they are not. Some argue that random assignment unfairly deprives one group of children of new approaches or interventions that are potentially helpful, or, conversely, subjects children to experimental approaches whose efficacy is as yet undetermined. The counterargument is that a randomized trial is the fairest test of a program's efficacy. Only this information can ensure that all children have access to the very best educational practices.

A particularly vexing issue is the tradeoff between present costs and future benefits, between the short-term consequences for current students and the long-term consequences for those who follow. Decisionmakers who decline to participate in studies often cite "concern for the children." We share this concern, but we believe that future cohorts are equally important. When Epstein and his colleagues analyzed data about on-base percentages and fielding contributions, they decided to trade the superstar shortstop Nomar Garciaparra. The short-term consequences — for the team's esprit de corps and the fans — were ominous. But history has shown that Epstein was right to take the long view.

Finally, still other critics assert that the interactive nature of teaching precludes it from being a "treatment" that can be randomly assigned. Following this argument to its conclusion, education, as a broad field of human interaction, is not an appropriate arena for experiments. We take heart in knowing that identical arguments were made when large-scale clinical trials were introduced in medicine after World War II — a time when medicine was seen as more an art than a science, much as education is today. Yet few among us today have not benefited from such trials, whether the lessons were positive (e.g., the benefits of an aspirin a day to reduce heart-attack risk) or negative (e.g., the increased risk of cancer associated with long-term hormone-replacement therapy).

The "new wave of thinking" we advocate will not come easily. And, like any change, it will require education. There is fierce competition for schools' dollars from various publishers, curriculum developers, and professional development providers. Educators need the skills to recognize, and demand, credible evidence about program effectiveness. We believe that all educators, from those standing in front of a roomful of students to those leading state educational agencies, should be able to participate in and use research effectively: to distinguish between random sampling and random assignment, differentiate between credible evidence and anecdotal claims, and apply scientific conclusions for the benefit of their respective "teams." If it worked for the Red Sox, it might just work for us.

Beth C. Gamse is a senior associate in Abt Associates' Education and Family Support Area. She is currently directing the Reading First Impact Study for the Institute of Education Sciences at the U.S. Department of Education.

Notes:

1. The order of the authors was determined by randomization.
2. See Ginia Bellafante. "New-Age General Manager Ends an Age-Old Curse." New York Times, October 28, 2004, p. D4.
3. See TERC web page, "Investigations in Number, Data, and Space."