No Child Left Behind?
A Faculty Response to President Bush's Education Bill

Harvard Graduate School of Education
October 1, 2002
 

Commentaries in this Series
 Nothing New in Assessment Policy (Professor Daniel Koretz)

 A Serious Civil Rights Issue (Professor Gary Orfield)

 Good Intentions, Many Pitfalls (Paul Reville)

 Funding to Repair Rather than Re-Create (Milli Pierce)

About the No Child Left Behind Act

On January 8, 2002, President Bush signed into law the No Child Left Behind Act, opening a new chapter of education history in the United States. Developed by a bipartisan team of legislators, the act mandates that states establish tough new academic standards, improve teacher quality, and create safe schools, among other measures. It also allocates a surprising $26.5 billion to public K-12 education—a 20 percent increase over last year.

Despite decades of attempts to foster educational equity, big barriers remain: the achievement gap between students of color and white students has widened since 1988; although violence has been on the decline, 37 percent of American students still report the presence of gangs in their schools; and debates still rage over school vouchers and charter schools, both of which divert funding from public school systems.

Can President Bush's bill address these and the many other complex dilemmas inside America's public schools? Will the plan benefit the students it seeks to serve? Will the act help or hinder student learning? These are the questions we posed, with one expert opinion from our faculty on an aspect of the bill featured here.

Professor Daniel Koretz is a faculty member in the Administration, Planning, and Social Policy area at the Harvard Graduate School of Education. Here, in an excerpt from a presentation at HGSE, he discusses inflated test scores, a problem exacerbated by the No Child Left Behind Act.

Professor Daniel Koretz

Even though this bill is 1100 pages long, it really is a very simple bill. The model of improvement it presupposes is this: you assess student performance using measures that you think are sufficient to summarize what kids have learned over a long period of time; you set very ambitious targets for improvements in scores on those tests; you require continual improvement; and then you reward and punish.

This is precedent-setting at the federal level, but it is nothing new. It's the culmination of a 30-year trend in assessment policy. Much of what's in the bill reflects developments at the state level over the past 10 to 12 years: there is virtually nothing in the bill's assessment provisions that has not been tried before by at least one state.

What that means is we actually know something about how these things will work, or at least how they have worked. We have not only experience, but also some hard research. That experience and those studies suggest that there are some very serious assessment-related problems with this model. For that reason, I'm very pessimistic the bill will do what its proponents expect it to do.

Test Scores—The Illusion of Progress
Taking just one of the many assessment issues, let's look at the inflation of test scores, a problem I've been working on for almost a decade and a half now.

Inflation of test scores refers to increases in scores that are markedly bigger than the improvements in performance they're supposed to denote. It's problematic not only because it creates an illusion of progress, but also because it means that real problems go unnoticed. It covers things up, allowing serious problems to fester.

Why does this happen? We know actually quite a bit about the mechanisms that underlie score inflation, but I'll just give you four points:

  • People faced with the demand that they substantially increase scores and do it rapidly have a very strong incentive to focus on the test itself. The test itself is small. It's a very small sample of a very big domain of achievement. The incentive is to focus on the test, not what it's supposed to measure.
  • We have studies showing an inappropriate reallocation of resources away from aspects of the curriculum that are important but aren't tested. The result is a transfer of student achievement from one area to another, masquerading as an aggregate increase.
  • We have all manner of coaching, of teachers showing their students how to write things that match the rubrics that are used to score, and so on, that don't really improve their level of achievement.
  • We have increasingly frequent cheating scandals.

All of these occur because of one simple problem: excessive attention to the indicator, rather than to what it's supposed to indicate. I'm going to give you just one example, from the first study of it I ever did, in the late 1980s when pressure to raise scores was far, far lower than it is now. By today's standards, this was a low-stakes testing program, but it was high enough stakes that teachers worried about scores.

Results from the study

The scale of the results is in grade equivalents, which are expressed in years and academic months, with ten months in a year. This is the end of third grade in mathematics; an average school would have been a 3.7, three years, seven months. You can see the blue diamond in the top left is the last year that the district used one of the five big, standard commercial tests. You can see that, at that point, they were reporting to their parents that the kids in this district, which was a high-minority, high-poverty district, were half an academic year above average.

The state then bought a new test, which is the red data points. Scores immediately dropped to average, and four years later were back again where they had been, a half a year above average. That much was not new. People in my field knew to expect that.

What was new was the blue diamond in the lower right. We went in in 1990, took a large sample of classrooms, and randomly assigned them to five different testing conditions, one of which was the exact same test the district had last used in 1986.

You can see that when we administered the test, four years after the kids had been told to worry about that particular test, they were, lo and behold, average.
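
To make the arithmetic behind these figures concrete, here is a minimal Python sketch of the grade-equivalent scale as described above (years and academic months, ten months to a year). The function names are hypothetical, and the 4.2 figure is simply 3.7 plus the "half an academic year" mentioned in the talk, not a number reported from the study itself.

```python
MONTHS_PER_YEAR = 10  # academic months in a school year, per the scale described above


def ge_to_months(ge: str) -> int:
    """Convert a grade equivalent such as "3.7" (3 years, 7 months) to academic months."""
    years, months = ge.split(".")
    return int(years) * MONTHS_PER_YEAR + int(months)


def months_to_ge(total_months: int) -> str:
    """Convert a count of academic months back to years.months notation."""
    years, months = divmod(total_months, MONTHS_PER_YEAR)
    return f"{years}.{months}"


# Average for the end of third grade, as stated in the talk.
average = ge_to_months("3.7")

# Assumption: "half an academic year above average" is taken as exactly 5 months.
half_year = MONTHS_PER_YEAR // 2

reported = average + half_year   # score on the test teachers had been preparing for
audit = average                  # score on the test the district had stopped using
gap_months = reported - audit    # the portion of the reported score that did not carry over

print(months_to_ge(reported), months_to_ge(audit), gap_months)
```

Under these assumptions, the reported score corresponds to a grade equivalent of 4.2 against an audit score of 3.7, a gap of five academic months that appears on one test but not the other.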

This study came out in 1991, and I've had the opportunity to ask hundreds of people which number you should give parents: are these kids average, or are they half a year above average? Almost everyone says, "Clearly, they're not really a half a year above average." But the premise that underlies the No Child Left Behind bill is that you should give people the top number. Because what we're going to get is that red line, but at a steeper slope, at a steeper angle.

Making Things Better? Or Worse?
What can we do about this? We are not going to get out of this problem by using different styles of tests. In the early 1990s, people claimed that the problem was that this kind of thing occurred with multiple-choice tests. So the thinking became, "We'll use tests worth teaching to, which involve writing, group performances, all manner of things other than multiple choice."

But when I went to Kentucky, which offered the archetype of that kind of reform, and looked at what happened with the scores on their test that was worth teaching to, it was the same thing, but worse. Because it's still a small sample of performance, and teachers are being paid in that case—just as they will under the federal law—to improve performance on that small measure.

Nor are we going to solve the problem with the current trend in education policy toward tests that are aligned with standards. That makes no difference. In fact, it could make things worse.

So what should we do? Some suggestions:

  • We need a system in which people who design and evaluate these programs collaborate with the people who actually institute the policies.
  • We have to devise better ways to use tests and accountability systems. Many of the tests were expressly not designed for high-stakes circumstances, and they don't work when they're used that way. We have to figure out if we can design tests that actually would be better suited to that purpose.
  • We have to have a serious evaluation of ongoing programs. This is the only public policy area of which I'm aware, in which policymakers feel free to impose risky interventions on people who do not give informed consent—including children—without feeling any obligation to find out what their policies do to people.
  • Finally, we need to educate future policymakers so that they take a more complex and realistic view of what they can accomplish with accountability systems.

For More Information
More information about Daniel Koretz is available in the Faculty Profiles.

HGSE News, Harvard Graduate School of Education
© 2008 President and Fellows of Harvard College.