Growth and Consequences

Measuring student proficiency in the wake of No Child Left Behind

Posted January 26, 2011
By Deborah Blagg

"I like to tell students you can always start a long conversation with psychometricians just by asking how they got into their field," says Andrew Ho, a professor at the Harvard Graduate School of Education who works at the intersection of educational statistics and educational policies. The abbreviated version of Ho's own story involves a junior year abroad in Kyoto, Japan, where the Brown University neuroscience major spent several months doing research at a public high school. Struck by the extent to which "every interaction was keyed to preparing for Japan's comprehensive college entrance exam," Ho became interested in standardized testing and its effects on schools and learning. In graduate school, he earned an M.S. in statistics and a Ph.D. in educational psychology at Stanford University.

Ho's research highlights contrasts among current approaches to measuring student and school proficiency, and it proposes alternative metrics that address some of the problems that have emerged in standardized testing since the 2001 enactment of No Child Left Behind (NCLB).

At the time of the following conversation, Ho was scheduled to brief congressional staffers about a recent Department of Education report on so-called "growth models," an increasingly popular approach to school accountability that tracks students over time. Ho was the lead psychometrician on the report.

What do you see as some of the key problems with the way students' progress is measured now?

Strictly speaking, No Child Left Behind doesn't track student progress at all. The original act held schools accountable to minimum percentages of proficient students, as measured by scores on standardized tests, with the threat of sanctions, including school closure, if they failed. Student progress over time was not a factor; the only thing that mattered was whether a student was proficient or not. A big problem that emerged was that the distribution of incentives in this proficiency model was not uniform.

What do you mean by that?

So, say the cut score — the score that indicates proficiency on a test — is 30 out of 50. Teachers who have the responsibility to maximize the number of students who score at least 30 — especially in districts with limited resources — may have a very tough decision to make. Where might they concentrate their efforts?

Probably on the students who are just below that 30 mark?

Exactly. That's the so-called "bubble hypothesis," which has been used to explain disproportionate gains we've seen by students who are near the center of the distribution. The proficiency cut score has acted as a kind of lens to focus incentives and accountability on just one segment of students.

Was NCLB designed to work that way?

Psychometricians deal a lot in unintended consequences and hidden dependencies. I'd say this is an unintended consequence that was deeply embedded in the proficiency model that came out in 2001.

Who suffers the most under this model?

We're not giving credit where credit is due. Schools that are doing heroic work bringing students with extremely low scores up to a point that may be just below proficiency get no credit for that, and may, in fact, face serious sanctions despite the progress they are making with kids who are the most at risk. On the other end of the spectrum, students who are high achieving one year can slip drastically, and the system does nothing to flag that decline as long as they stay above the cut score.
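
To make the insensitivity Ho describes concrete, here is a minimal sketch in Python. It uses the cut score of 30 out of 50 from his example. The five students and their scores are hypothetical, invented only for illustration, and the function names are not drawn from any real accountability system.

```python
CUT_SCORE = 30  # the interview's example: 30 out of 50 indicates proficiency


def percent_proficient(scores, cut=CUT_SCORE):
    """Percentage of students scoring at or above the cut score."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


def gains(before, after):
    """Per-student change in score from one year to the next."""
    return [a - b for b, a in zip(before, after)]


# Hypothetical scores for the same five students in consecutive years.
year1 = [8, 12, 29, 45, 48]
year2 = [26, 28, 29, 33, 31]  # low scorers rise sharply, high scorers slip

# A second hypothetical outcome: only the student just below the cut moves.
year2_bubble = [8, 12, 30, 45, 48]

print(percent_proficient(year1))         # 40.0
print(percent_proficient(year2))         # 40.0
print(gains(year1, year2))               # [18, 16, 0, -12, -17]
print(percent_proficient(year2_bubble))  # 60.0
```

In the first outcome the percent-proficient figure stays at 40 even though two students gained 16 or more points and two others lost 12 or more. In the second, a single one-point gain by the student nearest the cut score raises the figure to 60, which is the incentive behind the "bubble hypothesis."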

And your research is on alternatives to the proficiency model?

Yes. One compelling feature of the growth models we're looking at as an alternative is an allowance for more realistic expectations for lower-scoring students. If you look at the distance some of these students have to travel to reach proficiency, the requirement for them to get there in just one year is more of a deterrent to effort than an inspiration. It can also lead to undesired responses by teachers who may be tempted to think of them as lost causes. Our research looks at alternatives that not only give credit where it's due, but also set ambitious yet realistic expectations for our most disadvantaged students.

How would you respond to critics who may say you are letting schools off the hook if you change the requirements?

The changes would actually let us be more accurate about detecting which schools are achieving increases in learning and achievement. While it's true that some schools now classified as failing would be classified as making "adequate yearly progress," I would argue that they are making adequate yearly progress if their students are well on track to proficiency. We have a fundamental validity problem; the model we have now is very insensitive.

How would you go about making the model more sensitive?

We look for evidence that a student will achieve a high standard in the near future. The idea is to measure what schools are actually set up to do (teach and change students over time) as opposed to taking a snapshot of a student at a particular time and counting only that. We're moving away from snapshot models to measuring trajectories.
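
One way to picture the difference between a snapshot and a trajectory is the rough sketch below. It is not the model from the Department of Education report or from any state. It assumes, purely for illustration, that scores from different years sit on a comparable scale, that a student's most recent one-year gain continues in a straight line, and that "the near future" means two years. The function names, the two-year horizon, and the student scores are all hypothetical.

```python
CUT_SCORE = 30  # same illustrative cut score as in the interview (30 out of 50)


def projected_score(previous, current, years_ahead):
    """Extend the most recent one-year gain in a straight line (an assumption)."""
    return current + (current - previous) * years_ahead


def proficient_now(current, cut=CUT_SCORE):
    """Snapshot view: is the student at or above the cut score today?"""
    return current >= cut


def on_track(previous, current, cut=CUT_SCORE, horizon=2):
    """Trajectory view: would the projected score reach the cut within `horizon` years?"""
    return projected_score(previous, current, horizon) >= cut


# Hypothetical student A: far below the cut, but gaining 8 points a year.
print(proficient_now(18), on_track(10, 18))  # False True

# Hypothetical student B: above the cut, but losing 9 points a year.
print(proficient_now(36), on_track(45, 36))  # True False
```

The snapshot and the trajectory disagree about both students, which is the kind of information a proficiency count alone cannot show. Real growth models make different and more careful choices about scaling and projection.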

Would what you are talking about require changing the composition of tests or just the evaluation of results?

Good question. Some states have tried taking the numbers from the "snapshot" tests and extending them beyond what they were designed to do. That's not recommended, though, because it usually results in unsupported inferences.

Many states are using tests that are arguably more sensitive to growth and change over time. These include Massachusetts and Iowa, where the movement away from the proficiency model has had some traction. There is work on alternative tests, supplemental tests, and new ways of scaling tests. One proposal involves "through-course assessments," shorter, more frequent tests that allow for even more nuanced growth interpretations. As you might imagine, those are more formative assessments that may supplement or even replace yearly summative assessments.

What are some of the challenges you face in helping to translate this kind of research into policy?

People often talk about the dissemination of research, but John Easton, the director of the Institute of Education Sciences, prefers the word "facilitation," which connotes more than just publishing a paper and throwing it to the wind. That distinction is so important in a field that produces statistics which are used — and often misused — in sound bites and headlines about school and teacher quality.

Direct, clear communication with policymakers around these issues is critical. For example, when I speak to [congressional] staffers later this winter, one of the points I will make is that the on-track designations and growth models we are starting to see differ dramatically across states. The contrasts are stark, and our research has clearly identified the tradeoffs. Developing a precise, common understanding of growth models at this juncture will help us avoid the kinds of unintended consequences and cross-state dependencies that developed under the proficiency model. It's a time when psychometric research can really influence the outcome.