Illustration by Jamie Jones
The Testing Charade
Professor Dan Koretz opens his new book with a note of gratitude to his editor — and not just for the months of guidance on word choice and punctuation she provided while he was writing. Koretz wanted to thank her for helping him see that although he had been writing about the problems with high-stakes testing for 25 years, he had been, as she told him, pulling his punches. He had kept his writing measured and his criticism muted, “as is the norm in academia.” But with The Testing Charade: Pretending to Make Schools Better, excerpted here, Koretz firmly documents what he considers to be the failures of test-based accountability. (Read a Q&A with Koretz: Testing. Testing. 1-2-3.)
Almost thirty years before I started writing this book, I predicted that test-based accountability — then in its early stages, and still far milder than the system burdening schools today — wouldn’t succeed. I said that many educators would face only three options: cheat, find other ways to cut corners, or fail. As successive waves of “reform” ratcheted up the pressure to raise scores, the risks only grew worse, and I, along with others, repeated the warning.
Educators have done all three. I take no comfort in having been right.
But neither I nor anyone else in the field correctly predicted just how extreme the failures of test-based reform would be. I anticipated cheating, but not on the scale of the scandals that have begun to come to light. I expected that many teachers would resort to bad test prep, but I didn’t anticipate that states and districts would openly peddle it to their teachers. I expected that test prep would displace some amount of instruction, but I didn’t foresee just how much time testing and test prep would swallow or that filling students’ time with interim tests and test prep would become the new normal. And I didn’t foresee that test-based accountability would fundamentally corrupt the notion of good teaching, to the point where many people can’t see the difference between test prep and good instruction. I predicted score inflation, but I found its magnitude in some settings jaw-dropping. It never occurred to me that teachers would be “evaluated” based on the scores achieved by other teachers’ students or that districts would have to scramble to find any tests they could just so that they could claim to be evaluating teachers, even those teaching physical education or the arts, based on scores on standardized tests.
I’m far more interested in charting a better way forward than in pointing fingers, and as I have made clear, I have no interest in impugning the motives of the people responsible for the current system. On the contrary, many of them had the best of intentions. However, we need to look back at the causes of the failures in order to avoid repeating them in the future.
Looking back on the past three decades of test-based accountability, I have to qualify my early prediction that many teachers would fail. In an important sense educators didn’t fail. Teachers and principals didn’t manage to make the improvements in education that the policymakers claimed, but they did precisely what was demanded of them: They raised scores.
Reformers may take umbrage and say that they certainly didn’t demand that teachers cheat. True, they didn’t, although in fact many policymakers actively encouraged bad test prep that produced fraudulent gains. What they did demand was unrelenting and often very large gains that many teachers couldn’t produce through better instruction, and they left teachers with inadequate support as they struggled to meet these often unrealistic targets. They gave many educators the choice I wrote about thirty years ago — fail, cut corners, or cheat — and many chose not to fail.
This is not to say that educators are blameless, but if one wanted to ascribe blame, one would have to start far higher up the chain of command. The roots of the failures I’ve described go right to the top. Placing all the blame on educators would be more than mistaken; it would obscure much of what we need to do differently. We need changes in behavior — and incentives that will induce them — from top to bottom.
We should ask: Why has this gone on so long? Apart from details, much of what I wrote in the first nine chapters of this book is old news. We have known for decades that teachers were being pushed into using bad test prep, that states and districts were complicit in this, that scores were often badly inflated, and even that score inflation was creating an illusion of narrowing achievement gaps. The first solid study documenting score inflation was presented twenty-five years before I started writing this book. The first study showing illusory improvement in achievement gaps — the largely bogus “Texas miracle” — was published only ten years after that. In good measure, the failures of the current system have festered as long as they have because many of the advocates of test-based accountability simply didn’t want to face the evidence. Certainly, some of those making decisions weren’t aware of the evidence, and a few who were aware struggled within the constraints of current policy requirements to respond to it. However, many of the advocates were aware of the evidence but found ways to discount it — like the superintendent who said to me that he knew that there wasn’t score inflation in his district because the gains were so large. Others persuaded themselves that however badly previous attempts at test-based accountability had worked, this time they had it right.
And I suspect many of them knew that test-based accountability isn’t optimal but considered it good enough — and far less expensive and burdensome than better alternatives. That turned out to be a naive hope and a costly mistake.
Why now? Given how resilient test-based accountability has proved in the face of the bad news that has been accumulating for fully a quarter of a century, it’s easy to be pessimistic that this ship can be turned around. Why push now for a change of course?
ESSA, the replacement for NCLB, doesn’t represent anywhere near a big enough change of course. It maintains many of the core elements of the test-based reforms that preceded it, including NCLB. The specific changes included in ESSA — even the important ones, such as requiring states to use at least one indicator other than scores — are just very small steps, as a comparison with the recommendations in the previous two chapters makes clear. For example, ESSA only slightly broadens the focus from test scores, does nothing to confront Campbell’s Law,* doesn’t allow for reasonable variations among students, doesn’t take context into account, doesn’t make use of professional judgment, and largely or entirely (depending on the choices states’ departments of education make) continues to exclude the quality of educators’ practice from the mandated accountability system.
Yet ESSA provides a reason to be guardedly optimistic: Its enactment stemmed in some measure from a growing dissatisfaction with simple test-based accountability. NCLB was enacted with a remarkable degree of bipartisan support, but over time it lost most of its fans, and it’s not an exaggeration to say that by the end it was detested by many people in the education world. Some of the criticism of NCLB in its latter days focused on the core failings of test-based accountability — in particular, the extent to which the pressure to raise scores had come to dominate schooling. It’s remarkable that even [former U.S. Secretary of Education] Arne Duncan, who arguably did as much as any one person during the past decade to increase the pressure on educators to raise test scores, conceded that “testing issues today are sucking the oxygen out of the room in a lot of schools.” Even though ESSA won’t in itself do enough to reduce the distortions created by test-based accountability, this dissatisfaction with the past offers some hope that ESSA represents the beginning of a shift to a more sensible and productive approach.
And ESSA is not the only sign of growing dissatisfaction with test-based accountability and its effects. Many parents have become fed up with having their children in schools that are so dominated by testing. Perhaps the clearest sign is the “opt-out” movement — parents who refuse to let their children take some standardized tests. This movement is still spotty. In many locations there is no real sign of it. However, in others it has profoundly disrupted high-stakes testing. In New York, for example, where the movement was the focus of a substantial media campaign, about one-fifth of the state’s students didn’t take the state’s tests in grades 3 through 8 in 2015 and 2016. While still limited in its reach, the opt-out movement is national in scope, and it has clearly touched a nerve. This may give more impetus to policymakers to consider alternatives to the current system.
Let’s be optimistic and assume that ESSA and the opt-out movement are early signs of a growing dissatisfaction with test-based accountability and that we will finally have a chance to work on better alternatives. In the previous two chapters I’ve outlined both principles for doing better and a number of specific suggestions, but I’ll end with a few themes that pervade both.
We need to approach the task of improving education with a great deal more humility than we have for the past three decades. Under the best of circumstances, education is an extraordinarily complicated system, and the scale and decentralization of the American system make it all the more so. There is a great deal we don’t yet know about how this cumbersome and complex system will respond to new policy initiatives or new forms of practice. And like any other complex system, it will impose trade-offs, often very painful ones. Some we can anticipate; others will surprise us. And there are many different ways to implement the suggestions I’ve made. Some will work better than others. None will work perfectly, and few if any will work as well as we would hope.
How can we best respond to these uncertainties? To start, we shouldn’t — once again — overpromise. It’s tempting and politically useful to claim that we have a new approach that will produce huge gains in performance, but doing so is both naive and destructive. We should set reasonable goals and try out a variety of specific approaches for meeting them, rather than pretending that we know in advance which will function best and how much improvement they will generate.
I do mean “try out,” not “try.” We’re in the same position that Rick Mills was in when he introduced portfolio assessments in Vermont [as commissioner of education]: To some extent we’ll be plowing new ground, and we owe it to kids and their teachers to evaluate the specific options that states and districts design, discard the bad ones, and tinker with the better ones before implementing them wholesale.
And the need to monitor, reject, and revise won’t end even then. One reason is that some of our plans, however well thought out, won’t work. Campbell’s Law is another reason: People will be inventive in finding the weaknesses in any system, and new bulges will keep appearing in the hose. And on the positive side, educators and others will continually generate ideas for doing better, and these innovations will in turn need to be evaluated and revised. It’s no accident that the governments of the Netherlands and Singapore, which already had educational systems that produce very high achievement, have both made substantial changes to their management of schools in recent years.
Will it be difficult to implement these suggestions? Yes, very, and expensive as well. Is there room to argue about how best to put them into practice? A great deal, and we will undoubtedly make some mistakes regardless of who wins those debates. And progress won’t be fast; it will take quite some time simply to repair the damage that test-based accountability has produced, let alone to make the sizable improvements we want. But years of experience have shown that the alternative — dodging these difficulties and tinkering with what we have — is unacceptable.
* Campbell’s Law states, “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
Photos: iStock Photos