EdCast

What Test Scores Actually Tell Us

Test scores are not — and should not be — the whole measure of academic achievement for kids. How taking a comprehensive look at data can provide a more complete picture.

Posted November 6, 2019
By Jill Anderson

Graphic: The Educational Opportunity Project at Stanford University

Professor Andrew Ho thinks test scores often simplify how we view student performance, school effectiveness, and really educational opportunity. By taking a more comprehensive look at data like test scores and learning rates in districts, we may be able to better identify and contextualize how well a school is really performing. In this episode of the Harvard EdCast, Ho discusses his work with the Educational Opportunity Project at Stanford University and how it provides data to help scholars, policymakers, educators, and parents learn to improve educational opportunity for all children.

TRANSCRIPT

Jill Anderson: I'm Jill Anderson. This is the Harvard EdCast. We love and perhaps even loathe test scores, but it's how so many of us choose where to send our kids to school, judge how good a job a school might be doing, and compare schools. Harvard professor Andrew Ho wants us to think more about what test scores actually tell us. He studies how we design and use educational tests, and whether there are better ways to judge academic performance and educational opportunity. He's involved in the recently launched Educational Opportunity Project based at Stanford, where people can go to get a more comprehensive look at their school district. I wanted to know more about that project and what educational opportunity really means.

Andrew Ho: At edopportunity.org we're measuring test scores, and test scores are not the complete measure of what we want for our kids. Nonetheless, if you look at all of these state tests, what do they measure? Mathematics proficiency and reading, and those, quantitative reasoning as well as a kid's ability to read, are really important. You need to read to be able to learn. And so they are an important measure, even as we also recognize they're not the sum total of everything we hope schools do and hope for our kids. So that's what we mean by educational opportunity: we operationalize it on the website as test scores, but they are strongly correlated with a number of other measures that we do care about, right? So while recognizing that they're incomplete, we also believe they're important.

Jill Anderson: What do test scores actually tell us about a school and the students there?

Andrew Ho: I got into this field because I recognized as a teacher in high school and also in middle school, and as an observer of schools, particularly in Japan where I spent my junior year abroad, just how powerful test scores were as a lever for policy and curricular change. Everyone was saying, "We have to teach this because it's on the test." And that, I realized, was both very, very sad and an incredible opportunity to improve test scores and to improve the design of tests, not just to make them more relevant, but to make the results more actionable. And that's how I sort of got into this whole enterprise: a belief that we weren't designing tests right and using test scores right, and that we could do a better job.

So I, and many other psychometricians, measurement folks, have really mixed feelings about tests. We want to improve them, we believe that they're powerful, but we believe they're also just sort of too powerful sometimes, and people simplify what educational quality and opportunity is to a single number that is useful but imperfect. And so it's with that sort of humility, but also recognizing this incredible promise, that I went into this project thinking, "All right, how can we take the 350 million test scores that we've accumulated over the past 9, 10 years or so, and put them to good use to enable people to understand how complicated and variable educational opportunity, as measured by test scores, is in this country?"

And so what I would recommend is this: for any given subject or grade, every state in this country makes test items, the questions we ask of kids, available. And I think we can be skeptical and should be skeptical of tests, but if you look at most of these questions, you sort of say to yourself, "Actually, I kind of want my kid to be able to answer this correctly. I kind of would like them to be able to understand vocabulary words, use them in context, and be able to reason with numbers and algebra and geometry." These are abilities and skills and dispositions that we really do want our kids to have. Are they perfect? Are they everything? Do they encompass social, emotional aspects? No, they are incomplete, but they are still desirable. And so it's with that sort of humility, but also recognizing the importance of being able to communicate, being able to reason, that we believe that these are incomplete but important measures.

Jill Anderson: Do you think the public understands the complexities of this? There's a lot of websites in the world that compare test score results from one community, one school, to another, and that's all you're getting. You might get some other ratios, demographics, that kind of thing, but it's fairly limited information.

Andrew Ho: Right. This is the problem we're trying to solve. The learning goals of this project are to appreciate just how different test scores are from changes in test scores, because isn't that what education is about? Not just a score, but how we improve that score, right? Not just my daughter's ability to read, but how she gets better at reading and what I can do to improve that reading. And so that's the story of learning. And for us to be able to distinguish between those two, between test scores and learning, changes in test scores, right? Or another way to put it is between proficiency and growth. That is a key learning goal of the website. And so, no, I don't think we as a society, even among folks in education communities, are able to distinguish enough between what is good in terms of a level and what is good in terms of progress. So I hope that we can keep both in mind. It's not to say that performance doesn't matter and learning is all that matters, right? We should say that both matter. We can want high levels and we also want progress.

Jill Anderson: One of the things that is introduced through that Educational Opportunity Project is the learning rate. What is a learning rate?

Andrew Ho: When we say like, "There are good schools there," let's unpack that a bit. There are at least two answers there, 50 that we should consider, but at least two criteria that we want to really drill down on in this project. And the first is to get people on the hook with what they already think they know, right? Which is about average test scores. And usually when you say, "There are good schools there," you think to yourself about average test scores. And of course that's the first thing you see when you click into that chart. We call it the Galaxy Chart because it looks kind of like the Milky Way at night, right? So that scatterplot there shows that striking strong correlation between socioeconomic status and test scores, right? That's what we want to complexify: those are average test scores.

But now wait, look. Look at that tab over there and all of a sudden you see learning rates. Now what are learning rates? There's a difference between saying kids are above average in third grade, above average in eighth grade and coming in below average in third grade and ending eighth grade above average. So how would we describe those two schools where you come in in third grade above average, leave in eighth grade above average versus a school where you start below average but end above average? How would you describe those? They're both having kids that leave in eighth grade well above average, but wouldn't you say, "Wow, there's really something going on with that school that's bringing in kids at third grade who are below average and they're leaving at eighth grade above average."

So that's what learning is, and I think it's striking to realize that if we were just to take an average, we would rank that school that's doing such a good job of bringing kids in at third grade who are below average, all the way up to above average, we'd rank that school below the other, right? One school is, you might say it's like polishing bright apples. It's like you get all these kids with high socioeconomic status and you bring them in and you keep them there. Congratulations. And here's another school that's taking students who have not had much early childhood opportunity, or early grade opportunity, and they're launching them to really high levels.

Shouldn't we recognize that as opposed to penalizing that school for having low third grade test scores? Shouldn't we care that there's a ton of learning happening at one school and not at another? And we have to recognize at a snapshot that if you just look at the districts that are polishing bright apples, should we be giving them credit for that? Or should we also recognize a second criterion, which is how much learning is going on in these schools?
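
To make the distinction between average scores and learning concrete, here is a minimal sketch in Python. The two schools, their scores, and the simple "grade 8 minus grade 3" growth measure are all hypothetical, chosen only to mirror the example above; this is not the Educational Opportunity Project's actual method or data.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    grade3: float  # hypothetical score relative to the national average in grade 3
    grade8: float  # hypothetical score relative to the national average in grade 8

    @property
    def average(self) -> float:
        # A single snapshot number: the mean of the two grade-level scores.
        return (self.grade3 + self.grade8) / 2

    @property
    def growth(self) -> float:
        # A simplified stand-in for a learning rate: change from grade 3 to grade 8.
        return self.grade8 - self.grade3


# Hypothetical schools: both finish eighth grade above average (score > 0).
schools = [
    School("Polishing bright apples", grade3=1.0, grade8=1.0),
    School("Launching students", grade3=-1.0, grade8=0.8),
]

# Rank the same two schools by each criterion, best first.
by_average = sorted(schools, key=lambda s: s.average, reverse=True)
by_growth = sorted(schools, key=lambda s: s.growth, reverse=True)

print("Ranked by average score:", [s.name for s in by_average])
print("Ranked by growth:       ", [s.name for s in by_growth])
```

Under these made-up numbers, the two rankings come out in opposite order: the school that starts and stays above average ranks first on average score, while the school that starts below average and ends above it ranks first on growth.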

Jill Anderson: I can imagine people hearing this and Googling this and they're pulling up their community and they're seeing negative 12% learning rates, for the district, for the whole community and hitting the panic button.

Andrew Ho: What negative means here is below average, right? It does not mean that kids know less this year. In what ways would we want them to be concerned, right? You know that your school has high or low test scores on average; does that mean anything about what you can say about their learning rates? And the answer is you don't know anything. All this knowledge that we have in our heads about what good schools and not so good schools are is based on these averages, which don't really tell you about learning. In a way I hope it forces people to reconsider their notion of what a good school is and then ask, "Wait a second, how can we do better? How can we make sure that our kids who are coming in above average and might be leaving closer to average, what can we do about that?" And the answer is that there's a whole bunch of schools and districts out there that have really high learning rates; perhaps we should learn from them.

And so I hope that the panic isn't from this notion that a school that has below average learning will always be such, and the work of anyone who deals with numbers is to take an improvement mindset to them. To say that these are not fixed features of schools, these are not fixed features of districts. These are things we can change, and all of us at the Graduate School of Education, that's what we're all about. It's like, "Learn to change the world," we say. So yeah, let's think about what we can do better. And so I hope the panic is not the paralyzing panic, it's the productive panic, right? It's like, "Okay, let's get to work," and figure out why there's so much variation in achievement and in progress and learning across these schools and districts and learn from the best of them so that that negative 12% becomes positive 12% in the future.

Jill Anderson: So we do want to look at learning rates more?

Andrew Ho: Well, for what purpose? Again, one of the stories in our discoveries is that gaps correlate strongly with segregation. And so one of the things we deeply do not want to do is encourage parents to use this to select the, quote, "best" schools on any metric, right? As opposed to saying, "This helps me to discover what every school in every district can do better." Because again, these numbers are malleable, right? We can do something about them. So yes, I think the level one goal is to complexify the notions of school quality, right? And to say that there are actually multiple criteria we might want to consider when we're thinking about how to figure out if a school is doing well.

But then, once we figure out if a school is doing well, the solution isn't to send all the kids to just those schools. That is our sort of darkest fear out of all of this. We do not want to simply re-sort everybody into high quality schools, however defined. We want to say there are lessons to learn in all of these schools that we could apply broadly so that everyone, to use the Lake Wobegon metaphor, everyone should be above average, right? So everyone should be continuously improving. And if you look under the hood of the data, what we've done is, again, we described it as like a patchwork quilt where each state and every year is its own little patch. And what we've done is, through our statistical and psychometric methods, we've managed to stitch all these different patches together into sort of a big picture.

I think it's only when you have that context... the silos create this kind of... You have these blinders on and you're only looking at Massachusetts and you're only looking at 2019. You don't see change and progress and you don't get to contextualize it in a nine-year history spanning grades three through eight. And so what I'm really proud of is that we've managed to make the whole greater than the sum of the parts, right? It's like we can see through the stitching the dramatic variability in test scores and in learning rates, and that I think is what we hope. Along the lines of what we've seen, it's like, "Yeah, we've always known about proficiency and growth, but it's never been presented in a way where you can see all the constellations, all the pieces of the puzzle," and that's what we hope this project does.

Jill Anderson: Andrew Ho is a professor at the Harvard Graduate School of Education. I'm Jill Anderson. This is the Harvard EdCast produced by the Harvard Graduate School of Education. Thanks for listening.