Key to Evaluating Teachers: Ask Kids What They Think

Thomas Kane is professor of education and economics at the Harvard Graduate School of Education and faculty director of the Center for Education Policy Research, which works with states and municipalities to evaluate educational policies. He recently partnered with the Bill and Melinda Gates Foundation on the “Measures of Effective Teaching” (or MET) project, which was intended to develop metrics capable of identifying which teachers are faring better than others and to pin down which factors contribute to their success. He has just wrapped up a randomized study as part of MET that identified a number of factors associated with quality teaching. We spoke on the phone Feb. 1; a lightly edited transcript follows.

Dylan Matthews: Tell me a bit about how this study differs from the rest of the literature around standardized testing.

Thomas Kane: So for 40 years, we have known that when similar students enter different teachers’ classrooms, they come out with very different achievement. For 40 years we have designed our education policies as though that weren’t true. Very few of those differences had anything to do with teachers’ paper credentials, yet that’s the only thing that state and local policies focused on. They only focused on paper credentials, and they didn’t systematically try to evaluate performance on the job for teachers.

The test scores, we knew, were just the most obvious manifestation of underlying differences in practice, but nobody was systematically trying to find ways to measure those differences. Quite the opposite. Most classroom observations were entirely perfunctory. More than 98 percent of teachers were given the same “satisfactory” rating, if their principal did an observation at all.

It was within that context that we said, “Let’s go out and try to find some ways to identify effective teaching that help illuminate what’s going on with the differences in test scores.” We wanted to know that these measures are at least related to the magnitude of the gains that teachers produce, and we wanted to develop them in a way that could be implemented widely. That was one of the advantages of starting at such a large scale [3,000 teachers]. If we had tried to do it with 200 or 250 teachers, we’d have something that worked on a small scale, not a large one.

You asked how this was different. Before 2007, there were two feuding camps in the teacher-effects world. There was the outcome, or value-added-only, group, which tended to focus just on outcomes and say, “Look, the effort to try to measure practice is just opinions, and subjective, and hopeless. So let’s focus on the outcome data.” And then there were the folks who tended to focus on practice. There was an organization called the National Board for Professional Teaching Standards, where a teacher could submit videos, but there was actually a hesitance to include student achievement in any of those measures, for many reasons, many of them ideological.

We tried to collect data from student surveys so that we might, in the process, bring together what had been very separate research traditions. The two camps were publishing in their own journals. To the extent that either was aware of what the other was doing, it was dismissive and critical. By creating this framework, where we were using test score gains to validate practice-based measures, we were at least creating a common basis for discussion.

