EdCast

Student Testing, Accountability, and COVID

Professor Andrew Ho on whether standardized testing is the best way to assess student learning — and learning loss — during COVID times.

Posted March 11, 2021
By Jill Anderson

Professor Andrew Ho contends that we still don’t know enough about what’s happening with students' learning — or their lives — to draw conclusions about learning loss during COVID. It’s one of the many reasons why Ho supports President Biden's recent call for states to resume standardized testing this spring, even though many states are reluctant to do so.

While Ho knows that standardized testing doesn’t address students’ mental and physical health, he says it can offer some insight as long as it’s done differently than in previous years.

"What assessment is here for is being honest about what we know and what we don't know," says Ho. "This is the opportunity I see for the spring is to have an improvement mindset. But in order to know what to improve, you have to start with where you're at and we don't know where we're at, and that's the problem."

In this episode of the Harvard EdCast, Ho explains why standardized testing must go on in the face of COVID and what it can offer in targeting support to school districts.

TAKEAWAYS

Standardized testing should be thought of less as educational “assessment” and more as an “educational census” that carefully gathers information based on student population, attendance, and performance, says Ho.
Testing cannot be conducted this year the same way it has been in the past, because students are attending school in different ways. Straight reporting of scores will not give an accurate picture of learning for all students in the district. Districts should be more deliberate about breaking up their reporting and interpretation based on how students are attending school: in person, hybrid, or remote.
In the effort to gather data, Ho reminds us that it’s important to focus on students’ physical and mental health first, before test scores, and to remember that testing is just one measure in a holistic picture of student learning.
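The census-style reporting described in the takeaways and in the transcript (report coverage first, then tell separate stories for students with and without comparable scores, and break results out by attendance mode) can be sketched in a few lines of Python. This is a hypothetical illustration, not any state's actual reporting method: the `Student` record, its fields, and the sample roster are invented for the example. Scores are assumed to be percentile ranks against pre-pandemic norms, so a median near 50 would be typical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, List, Optional


@dataclass
class Student:
    # Hypothetical record linking a student's longitudinal history to this spring.
    prior_pctile: float              # percentile rank two years ago, against pre-pandemic norms
    mode: Optional[str]              # "in person", "hybrid", "remote"; None if no longer enrolled
    current_pctile: Optional[float]  # None when there is no comparable score this spring


def _median_or_none(values: Iterable[float]) -> Optional[float]:
    values = list(values)
    return median(values) if values else None


def census_report(roster: List[Student]) -> dict:
    """Coverage first, then two stories: students with and without comparable scores."""
    n = len(roster)
    scored = [s for s in roster if s.current_pctile is not None]
    unscored = [s for s in roster if s.current_pctile is None]
    not_enrolled = [s for s in unscored if s.mode is None]

    report = {
        # Census step: what share of students on record two years ago have comparable scores?
        "pct_comparable": round(100 * len(scored) / n, 1),
        "pct_not_enrolled": round(100 * len(not_enrolled) / n, 1),
        # Story one: students with comparable scores, then and now.
        "scored_prior_median": _median_or_none(s.prior_pctile for s in scored),
        "scored_current_median": _median_or_none(s.current_pctile for s in scored),
        # Story two: where were the students we no longer have our eye on?
        "unscored_prior_median": _median_or_none(s.prior_pctile for s in unscored),
        "by_mode": {},
    }
    # Break coverage out by how students are attending school.
    for mode in ("in person", "hybrid", "remote"):
        group = [s for s in roster if s.mode == mode]
        with_score = sum(1 for s in group if s.current_pctile is not None)
        report["by_mode"][mode] = {
            "count": len(group),
            "pct_comparable": round(100 * with_score / len(group), 1) if group else None,
        }
    return report


# Invented example roster: every student on record two years ago.
roster = [
    Student(70, "in person", 66),
    Student(55, "in person", 50),
    Student(40, "hybrid", 35),
    Student(30, "remote", None),  # enrolled, but no comparable remote score
    Student(20, None, None),      # no longer enrolled
    Student(25, None, None),      # no longer enrolled
]
summary = census_report(roster)
print(summary["pct_comparable"], summary["scored_current_median"], summary["unscored_prior_median"])
```

Medians are used rather than means because averaging percentile ranks is hard to interpret. In this toy roster only half the students have comparable scores, and the students without them had lower prior percentiles, which is exactly the pattern a scores-only report would hide.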

TRANSCRIPT

Jill Anderson: I'm Jill Anderson, this is the Harvard EdCast. Harvard Professor Andrew Ho thinks it's important that standardized testing happen this spring. He's an expert on educational assessment who agrees with President Biden's recent call to resume testing. Not all educators are happy about this. Professor Ho proposes we think about testing a little differently this year and consider it more of a census. He says there's just too much we don't know about how COVID has affected learning, and testing is one way to get some insight. There have already been many claims in the media about COVID and learning loss. I wanted to understand what we know about how students are doing and how testing can help us make sense of this.

Andrew Ho: What assessment is here for is being honest about what we know and what we don't know. This is the opportunity I see for the spring is to have an improvement mindset. But in order to know what to improve, you have to start with where you're at and we don't know where we're at, and that's the problem.

Jill Anderson: Everywhere we look, we're seeing COVID slide or stories about children who are suffering because of whatever circumstance they're in. It doesn't even matter whether it's remote, whether it's hybrid, whether it's they're just not in school. So, do we have the information at this point to understand what's happening for students to even corroborate any real understanding of a loss?

Andrew Ho: I think it's really helpful to talk about all the things that matter in education, and then be very clear that standardized tests, which are my business, what I do research on and try to improve, are one of 50 things we care about, and they're receiving a disproportionate share of the emphasis right now. And that, I think, is okay, because one thing that tests do well is ensure comparability. We know exactly where we were two years ago. And so, if there is a relative drop compared to where we were two years ago, we can say, "Okay, this is how much we have lost relative to where we were two years ago, and this is what we can do to make sure that we gain it back."

But there are so many other measures. I tell my students in my statistics class, it's like: one, physical health; two, mental health; and then three, maybe after that, we can talk about learning. The first two are necessary preconditions for the third. And I find it striking how little we know about just the physical and mental health of our students. Not to mention stepping back from this and asking, where are they? We don't even have the same students we had two years ago. With the tests we are about to have this spring, there's so much we don't know. The first question is, who's there and who's not there?

Jill Anderson: Mm-hmm (affirmative).

Andrew Ho: That is not a test as much as a census. We have to think about this spring's assessment effort broadly as a census, because there's definitely research out there showing that we've lost maybe two to 10 percentile points of learning. That is, the median student is now at the 48th to 40th percentile of where we would typically expect. So, they've dropped maybe two to 10 percentile points. But stop right there and say to yourself, that's their mathematical proficiency and skills, what they're able to do in math and what they're able to do in reading. What about how they're doing emotionally? What about how they're grappling with social skills and the relative lack of chances to interact with each other?

So, first it's one of multiple measures, and then second, who's not there? All of that research was done with kids who are actually there.

Jill Anderson: Right.

Andrew Ho: What I'm most worried about is the kids who weren't, and that's such an obvious thing to say. But right now, we're looking where the light is, and we haven't noticed that the room is half in darkness. It's like typically year to year, you look at what's changed in a bright room. And this year we're like, "Okay, well, half the room is dark, but let's talk about what's going on over here, where we happen to have a little lamp."

Jill Anderson: Right.

Andrew Ho: That's what I fear we're missing most. So, there are two blind spots that we have right now, two missing stories. The first is all the other measures we should be caring about beyond mathematical and reading proficiency. And the second is all the other students we are not measuring.

Jill Anderson: This is super complex because we're not comparing apples to apples in any way from this year to any other year ever. So, there's that element of this.

Andrew Ho: Yes.

Jill Anderson: So, I know that some states have come out and they're asking for waivers and some other things. Some states have come forth and said they don't want to do the standardized testing this year, which I guess is understandable. And President Biden, of course, has come out and said that we're going to move forward with standardized testing with some flexibility.

Andrew Ho: Mm-hmm (affirmative).

Jill Anderson: Can you explain what that flexibility is and what it looks like for accountability?

Andrew Ho: There are a few different signals that they've given. The first and most important, from my perspective, and a position I strongly agree with, is that they're not requiring accountability provisions. They're not going to give any school a "needs improvement" designation this year, when it clearly wouldn't be the school's fault. It clearly wouldn't be the teacher's fault. So, it would be very, very poor judgment, without any scientific basis, to say that any changes this year were due to the efforts, or lack thereof, of any school or teacher. So, accountability provisions are off the table, which I think is extremely wise.

The second category of flexibility is the timing of assessments. One indication they've given is that they would allow testing in the fall instead of the spring. Now, I don't believe that's wise, because we have no appropriate fall baseline from previous years to compare against to tell whether something is unusually good or bad, for any indicator. Nonetheless, I do recognize the value of that flexibility, in particular for places that aren't going to get any test scores at all if they test in the spring. Better to get something than nothing.

And the third broad category of flexibility is just reminding people that they're not forcing anyone to take tests where health conditions don't allow them to be given safely. And again, this testing is rightfully tertiary: physical health, mental health, and then learning. I think the Biden administration's memo reflects that.

Jill Anderson: So, you have recommended that we move forward with the testing. You just said that you think we should do it in the spring. And you've already talked a little bit about thinking about this as an educational census versus an assessment.

Andrew Ho: Mm-hmm (affirmative).

Jill Anderson: And I just want to hear a little bit more about that. What does it mean and how is it different from what we would do in a regular year?

Andrew Ho: Again, I would say that my recommendation to test in the spring is conditional on taking this perspective; otherwise you will make mistakes in judgment. Why test? Step back from all of this and ask, "Why would we be interested in testing this year, given all else that's going on?" And the answer, again very clearly laid out in the Biden administration's memo, is to target support and resources. And if your goal is to target resources and support, first, you want to do so as soon as possible, and second, you want to do so accurately. You have a guess as to which schools, districts, and communities need the most support. Hopefully, support is already being directed there because you have results from two years ago, but you also have communities and schools that have been especially hard hit by this pandemic.

So, you want to not just document the fact that some communities need support; there might be new communities that need support that didn't need it two years ago. That's why I think you need a spring baseline, to see what's different this year compared to two years ago. And in order to do that, this idea of a census is, again, to put tests appropriately third in rank, after who's there and how they're doing, and only then start to think about their learning. In my recommendations for how to report scores, we have to step back and ask, first, again, who's there? Who is in school? In an ordinary year, we know who's in school: it's everybody, right?

Jill Anderson: Right.

Andrew Ho: Everybody's usually in a public school. And so, we can say, here are our kids and here are their scores. And you can compare them to last year's scores, because it's the same population of kids. That is not the case this year. Say a school two years ago had some distribution of students, and then perhaps all the high-scoring students left for other schools or decided to stay at home and not take comparable tests. Or the opposite, where all the relatively low-scoring students decided to go to other schools, or not take the tests, or weren't at home, not to mention all the other at-risk populations, including homeless students and all the other folks that public schools rightfully serve.

So, there are all these populations of students who, this year, might not be there. And what is wonderful and necessary in our state data systems is a longitudinal record and history of where students were two years ago. So, we can actually compare the test scores of kids this year to the test scores of kids last year, not just by looking at percentages of proficient students, but by saying, "I know you were there two years ago, and now you're here and here's your score." And to say, "You were there two years ago, and now you're not here, now we don't have you in the picture. Now we don't have our eye on you." And so, I've recommended that what states do first in their reporting is not report scores, but report the percentage of kids who are there, because ordinarily, that's 95 to 100%. This year, it will not be.

And what I think that percentage hopefully does is focus everybody's attention on two groups of people that you have to track simultaneously: first, the people you have data for, and second, the people you don't have data for. What I find disappointing about the current coverage is that we're focusing on the results for the kids we have data for, without focusing on the kids we don't have a read on anymore. I think equity and fairness require that we focus on both groups of kids. And I also think that as we endeavor to support schools, we have to target resources and supports based on both of those categories.

Jill Anderson: This is not something that I had thought about, but when you have schools that are still remote, can they not administer these standardized tests? Do you need to be...

Andrew Ho: Yeah.

Jill Anderson: ...in-person to take these? So, is that what we're getting at here?

Andrew Ho: Mm-hmm (affirmative). Right.

Jill Anderson: I already know that we don't really have great data on who's in school and who's out of school.

Andrew Ho: Yes.

Jill Anderson: So, what happens to all the kids who are in remote learning? Does that prevent them from being able to take some of these standardized tests?

Andrew Ho: Yes, that's exactly right. And so, I've tried to make clear that you have to distinguish between the kids who aren't there because they're not even in school, and the kids who are not there even though they might actually have test scores. You just can't use those scores because maybe the protocols you released for remote testing weren't clear enough that the parent shouldn't be looking over the kid's shoulder, or, on the flip side, the kid was very distracted or didn't have good internet access. We have to talk about the percentage of kids who have comparable scores and the percentage of kids who don't. That percentage may be due to the fact that they're no longer in school, and may also be due to the fact that they didn't have in-school testing, and therefore don't have comparable, valid scores that are interpretable without having to say, "Oh, and don't forget, it could be because their internet failed, or it could be because they don't have accessibility: they have individualized education plans and the browser didn't support the supports they usually have."

It could be any number of reasons. The bottom line is that what we in measurement care about is fair comparisons of scores, and without a controlled environment, we can't make those comparisons fairly. And what does that mean? It means if we don't pay attention to that, we're going to try to give resources to schools that don't need them, and also miss giving resources to schools that do need them.

Jill Anderson: Wow. This is really, really, really complicated on a whole other level that I don't think most people could even fathom.

Andrew Ho: Our job is to make it simple. Behind the scenes, it's complicated, but the job of state score reporting should be to make the complicated simple. And in order to do that, I'm hoping that you can just open up a score report and say, "Okay, we've got half of the usual kids that we used to have. Tell me the story of those kids. Are those kids doing better than expected or worse than expected, given what they scored a couple of years ago?" That's one story, and that's a really important story. That's what a lot of the research out there is telling us right now. Story one: how are the kids for whom we have comparable scores doing compared to a couple of years ago? Okay, it's 50/50. Who are the two populations? For the kids who have their scores, how are they doing?

Now, here are all the kids who no longer have comparable data, who we don't have our eye on. Tell me the story of their scores two years ago. Are these the kids who needed the most help, or are these the kids at the top of the distribution who might not have needed help? So, then you have the stories of these two populations and how different they are, and you can think about both. First, what are the results for the people who are actually in school? Second, what were the results for the people who are no longer in school?

And that, I think, will help us say, "Okay, what's the census for this school?" Where are all the kids who, two years ago, we thought would still be here? And let's divide them into their two respective populations and ask: how are the kids doing for whom we have fair comparisons, and how were the kids doing in the population we don't have our eye on anymore?

Jill Anderson: So, I want to talk a little bit about the learning loss issue. In a lot of ways, all of these things, the test results and the mental health and the physically being in the building amongst some other measures, all will add up to what this "learning loss" picture maybe looks like or doesn't look like. But I want to talk a little bit about the term learning loss, because it sounds like even that is up for debate right now, whether that's the right terminology to be using.

Andrew Ho: I've seen some discussion about this. I don't want to spend too much time worrying about what we call things, but I do think the conversation is important, because the choice of term leads you to different solutions. The idea of loss as a permanent void that you can't do anything about would be damaging, because it convinces us that we should triage, that there's no hope. And so, if people interpret loss that way, we definitely shouldn't call it that, because it doesn't motivate us to do what we should be doing to make improvements and growth from this point.

So, if you choose to call it learning lag, if you choose to call it learning delay, if you choose to call it an opportunity disparity, whatever it is, it has to be framed in a way that motivates us to solve the problem, to invest resources, to do something about it. That's the reason there's been a debate about it. Not because it's semantics, but because the choice of word leads us to think about something as either, A, fatalistic, an impossible loss we can never recover from, or, B, something we should be energized to do something about. And I think as long as everyone listening to this treats this as something we can do something about, then we're in the right space, no matter what we call it.

I just hope, then, that we describe these disparities as, again, solvable. The opportunities we could be giving kids are not just in reading and mathematics, but obviously in all sorts of social experiences and skills, and in outright health disparities that might be out there. Not to mention the 500,000 people we've lost, many of whom are grandparents of kids in these schools; that is an incalculable loss. We have lost a lot. We can do things about it, and we have to measure those losses as accurately as we can while being honest about what we don't know.

Jill Anderson: When I hear you talk about it, it's really just a diagnostic tool to give you a picture of what's going on.

Andrew Ho: That's a good way to describe it. It is one panel, or one piece of a puzzle. I do think, though, if you want to talk about opportunities and hope for the future, this moment is an opportunity to reframe tests as a tool for support. I wouldn't call it diagnostic necessarily, unless we talk about it at the aggregate level. Tests can't tell us what every kid needs, but they can tell us, on average, which communities need the most support. And instead of holding a stick over districts and threatening them, this year is a real improvement. It's a real positive to use tests appropriately, I think, as monitoring tools and not as horrific incentive structures that set up perverse incentives to cheat and teach to the test.

This opportunity to think of tests as the monitoring tools that they are is real progress that I hope lasts, because this, to me, is the appropriate role of assessment: one piece of a multiple-measures system that keeps an eye on a range of things we care about as we try to improve education and equity. So, I see this as a real opportunity, actually relocating tests to where they should be in the firmament: again, one high-level periodic tool that doesn't suck up all the air in the room the way it usually does. That's where these tests should be, and I hope that's where they stay.

Jill Anderson: Andrew Ho is a Professor at the Harvard Graduate School of Education. He is a psychometrician focused on improving the design, use, and interpretation of test scores in education. I'm Jill Anderson. This is the Harvard EdCast produced by the Harvard Graduate School of Education. Thanks for listening.