Illustrations: Nate Williams
The Problem with Grading
When it comes to how we show what students know, do traditional grading practices deserve an F?
My son’s binder was a mess. Loose papers were falling out, others looked like they had been balled up or stepped on, some more than once. The binder itself was bent in one corner. But he was a seventh-grader and to him, it looked just fine.
Unfortunately, his seventh-grade math teacher didn’t agree and deducted points from his grade for being messy. This same teacher also took off points when homework was completed with something other than a pencil or if a student needed a second copy of an assignment. If a student was asked to move their seat during class, she slashed five points. Points could be earned back if a parent signed the list of rules and returned it in a timely manner.
Being organized and not misbehaving in class are skills students need to figure out, for sure, and I certainly wanted my son to be neater, but factoring these behaviors into grades — especially for middle-schoolers just learning to come into their own — didn’t make sense to me.
And so, when I learned, a few years later, that my son’s high school was rethinking their grading practice, I decided it was time to dig deeper into what Grading for Equity author Joe Feldman, Ed.M.’93, calls “one of the most challenging and emotionally charged conversations in today’s schools.”
Grades Are What?
I started by asking a question that seems simple on the surface: What is a grade?
Feldman, a former teacher and principal, says that on a really basic level, grades are the way teachers calculate and report student performances. Typically, it’s an accumulation of points (0 to 100) with corresponding letters (A through F, minus E). Earn an 89 on a test and your grade is a B+, for example. Believed to date back to 1785, when Yale President Ezra Stiles gave four grades to his seniors (optimi, second optimi, inferiors, and pejores), grades have long been a part of our education system in the United States. In fact, Feldman says, grades have become “the main criteria in nearly every decision that schools make about students,” from whether they get promoted to the next class or held back, to which course level a student should be taking, such as college prep, honors, or AP. It’s how many high schools tally GPA and student rank, and one of the main ways that colleges decide who they’ll even consider for admissions.
“Grading is evaluation, putting a value on something,” says Denise Pope, Ed.M.’89, a senior lecturer at Stanford who runs a project called Challenge Success. Pope stresses, however, that grades are not the same as assessment, and to really talk about grading, we have to make the distinction between the two terms.
“Assessment is feedback so that students can learn,” Pope says. “It’s helping them see where they are and helping them move toward a point of greater understanding or mastery. Grading doesn’t always do that, but assessment should.”
When she hosts professional development workshops to help schools rethink their assessment practices, she likes to point out that the Latin root of assessment is assidere, which means to sit beside. Assessment is seeing where a student is with their understanding (what they don’t know, what they do know) and then using that to determine what they need. “Sometimes a grade does that,” Pope says, “but a lot of times students have no idea what that grade means.”
And that’s what seems to be at the heart of the debate about grading, and what rubbed me the wrong way when my son was in that math class: Students, teachers, parents, and college admissions officers have no idea what a letter grade — this thing we are saying is really important in a student’s school life — is really saying. Does an A mean a student has truly mastered that history lesson? Does the C+ mean the student was “sort of” getting the math they were learning, or did it mean they were an ace at math, but just couldn’t keep a neat binder?
What’s the Problem?
The confusion starts with consistency, as in, there is none. At most schools, there’s no consistency about what’s included in a grade or what’s left out, even among teachers teaching the same subject in the same school to students in the same grade at the same level. This creates what is often called “grade fog” — we’re not sure what the grade means because we’re asking that A or that C+ to communicate too much disparate information.
“It’s radically inconsistent from teacher to teacher,” says A.J. Stitch, Ed.M.’12, the founding principal of the Greater Dayton School, a private school in Ohio for kids from low-income backgrounds that doesn’t use traditional grades. At public schools where he has worked in the past, he says “most teachers had different approaches to weighting homework, classwork, quizzes, and tests.”
For example, he says, “a student may demonstrate mastery of content on a test, quiz, and classwork, yet still fails a course because the teacher decides to weigh homework 40%, and the student, for one reason or another, struggles in that regard. Obviously, that’s inequitable, and it illustrates the variation of weighted grade scales and how it impacts a student’s success or failure, regardless of whether they mastered the standards taught in the course. Sadly, I made this mistake myself as a young teacher, and as a principal I’ve seen too many teachers make this mistake, too.”
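Stitch’s scenario is easy to check with a little arithmetic. The sketch below is illustrative only: the 40% homework weight comes from his example, while the remaining weights, the student’s scores, and the 60-point passing line are assumptions chosen for this illustration.

```python
# Hypothetical category weights; only the 40% homework weight comes from the article.
WEIGHTS = {"homework": 0.40, "classwork": 0.20, "quizzes": 0.20, "tests": 0.20}

# Hypothetical scores for a student who masters the content but struggles with homework.
scores = {"homework": 10, "classwork": 90, "quizzes": 90, "tests": 90}

PASSING = 60  # assumed passing line on a 0-to-100 scale


def weighted_grade(scores, weights):
    """Return the weighted course grade on a 0-to-100 scale."""
    return sum(scores[category] * weight for category, weight in weights.items())


course_grade = weighted_grade(scores, WEIGHTS)
print(f"Course grade: {course_grade:.0f}")
print("Passes" if course_grade >= PASSING else "Fails")
```

With these assumed numbers, a student scoring 90 on every test, quiz, and piece of classwork still lands below the passing line, purely because of how homework is weighted.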
Jason Merrill, the principal of Melrose High School, where my son currently goes to school, says this is one of the biggest reasons they started looking at their teaching and learning practices, and why they applied to become one of five schools in the multi-year Rethinking Grading Pilot program sponsored by the Massachusetts Department of Elementary and Secondary Education.
“Your son has eight teachers right now that all have their own way to grade. Completely their own,” he says. “The average kid often gives up trying to figure it out. Some teachers count homework, some teachers don’t. Some teachers grade homework, some teachers grade it as completion. Some teachers count large tests for a lot more than others. What we want to do is not have 85 different ways to respond to a fire alarm.”
Feldman says we also don’t want to include non-academics in grades — things like messy binders and not coming to class with a pencil, or the one that is commonly factored in: late work.
“A student who writes an A-quality essay but hands it in late gets her writing downgraded to a B, and the student who writes a B-quality essay turned in by the deadline receives a B. There’s nothing to distinguish those two B grades, although those students have very different levels of content mastery,” he says.
Traditional grading also invites biases, he says, especially around behavior. “When we include a student’s behavior in a grade, we’re imposing on all of our students a narrow idea of what a ‘successful’ student is,” Feldman says, and “you start to misrepresent and warp the accuracy.” For example, a student who participates in discussions and always brings their pencil to class earns five points, but they get a C on the test. Adding the five behavior points lifts that C test grade to something in the low B range. Although students and parents are happy the grade is a B and that the student’s all-important GPA remains intact, this warping can create longer-term problems.
“You’re telling the student that they’re at a B level in content, and they’re actually at a C,” Feldman says. “They don’t think there’s a problem, the counselors don’t think there’s a problem, and the student goes to the next grade level and gets crushed by the content. They had no idea that they weren’t prepared for the rigor of that class because they kept getting the message that they were getting B’s.”
It can be especially confusing for parents, says Christopher Beaver, one of the assistant principals at Melrose High. “I knew what my own kids could do skill-based wise, but if I’m a parent and I don’t know what my kids can do because the teachers haven’t laid that out for me on a report card, then I can’t look at a report card and say, ‘See that. My kid is proficient at this skill or my kid is proficient at that skill,’” he says. “I’m going to focus on something like the GPA because that’s all I have. And I’m going to assume, if my kid has a high GPA, that my kid’s skillset is at a proficient level. But that is not always the case.”
As a parent, I was confused earlier this year when my son’s overall grade in a class was low, even though he seemed to get the content. We looked online at the grading portal the district uses and sure enough, he had Bs and As. But then there was that one grade: a 44 on a test he didn’t have enough time to finish. That one low test score brought the whole grade down because of another impossible part of how we grade: averaging.
“We have this ridiculous system of averaging things out,” Pope says, “which doesn’t make any sense because the goal is to get students to learn material. Same with the case against zero, right? Why would you give a kid a zero? A zero is worse than an F.”
The “case against zero” idea is that when using a 0-to-100-point scale in grading, a student should never receive a zero, even if they didn’t turn in an assignment. Sounds odd, given that a zero for not turning in work is how we’ve long operated, but as author Doug Reeves wrote in 2004 in “The Case Against the Zero” in Phi Delta Kappan, “assigning a zero is disproportionate punishment.”
Why? Because mathematically, with a 0-to-100 scale, failing a class is more likely than passing a class. Think about it. Each passing letter grade spans about 10 points (an A is 90-100, a B is 80-89, a C is 70-79, and a D is 60-69), but the scale’s one failing grade, an F, spans not 10 points, but 60 (0 to 59). The result is that a zero disproportionately pulls down an average and makes it that much harder to pull a grade up significantly. A student with two 85s, for example, is averaging a B. If that student gets a 0 on one assignment, their average drops to about 57, an F. Even if the student gets 85s on the next two assignments, their average still only climbs to a 68. So, four Bs and one zero means the student’s averaged overall grade is a D+.
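For readers who want to check the math, here is a minimal sketch of the averaging described above. The scores are the ones from the example; the letter function is a simplification that uses whole-letter bands only and ignores pluses and minuses.

```python
from statistics import mean


def letter(score):
    """Map a 0-to-100 score to a whole letter grade; F covers everything below 60."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


before_zero = [85, 85]                # two solid B assignments
with_zero = [85, 85, 0]               # one missing assignment scored as zero
after_recovery = [85, 85, 0, 85, 85]  # two more 85s after the zero

for label, scores in [
    ("Before the zero", before_zero),
    ("With the zero", with_zero),
    ("After two more 85s", after_recovery),
]:
    average = mean(scores)
    print(f"{label}: average {average:.1f} -> {letter(average)}")
```

Running it shows the single zero pulling a B average down to an F, and two further 85s lifting it only into the D range.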
This averaging especially penalizes students who start out a semester slower with lower grades. Even if they figure out the material and fully master content later, averaging won’t necessarily reflect what they truly know. In his book, Feldman gives an example of a student who, coming into ninth grade, had never learned to write a persuasive essay. The ninth-grade teacher gives an assignment early in September, revealing this student’s writing inexperience.
“The essay gets a D-. But it’s early in September, and you, as the teacher, provide instruction and guided practice with feedback,” Feldman writes. The student’s writing improves, and their grade goes up with each new assignment. The student eventually learns how to write an amazing persuasive essay. They are doing A work. However, when the grades are averaged, that early D- drags down the overall grade and though the student mastered persuasive writing, their A drops to a B-.
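The same effect can be sketched in a few lines. Feldman’s example gives only letter grades, so the numeric essay scores below are assumptions invented for illustration, picked to move from a D- to A-level work over the semester.

```python
from statistics import mean

# Hypothetical essay scores across a semester, from a D- in September to A work at the end.
essay_scores = [61, 75, 85, 93, 95]

averaged = mean(essay_scores)     # what a traditional averaged grade reports
most_recent = essay_scores[-1]    # what the student can actually do by the end

print(f"Averaged grade: {averaged:.1f}")
print(f"Most recent essay: {most_recent}")
```

Under these assumed scores the average sits in the low 80s, roughly a B-, while the student’s latest work is in the mid-90s.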
Add Stress to the Mix
Beyond the problems with how we grade or what a grade means, Robin Loewald, Ed.M.’19, an English teacher at Melrose High, also worries about the effect grades have on student mindset, especially for middle- and high-schoolers.
“Grading in general is tough because of the expectations for students with college applications,” she says. “There tends to be a lot of stress around grades and the minute difference between a 93 and 94. In truth, it’s hard to really delineate the difference between those two numbers in terms of student understanding and mastery of the subject.”
Pope focuses her work extensively on the stress students take on trying to chase “good” grades and the extrinsic motivation — driven by external rewards — that takes over. In an op-ed she co-authored in February for The Hechinger Report about the furor over ChatGPT, she wrote that instead of asking how to stop students from cheating using bot programs, we should instead be asking “why” students are cheating in the first place. Chasing those good grades is part of that “why.”
“We have this real system of you need to get the grades and the test scores in order to please your parents, go to college, get the merit scholarship, get a good job — whatever it is,” she says. “There’s this extrinsic motivation that’s tied to grades, which adds to student stress, and in some cases can lead to really unhealthy practices like perfectionism or great anxiety, paralysis. And it could also really turn kids off. ‘Well, I got a C so I’m bad at math. I’m not a math person so clearly, I shouldn’t try anymore.’”
As Feldman said during an interview in 2019 with the Harvard EdCast, for students, even attempting to follow the range of grading practices each of their six or seven teachers follows can be stressful.
“For the student, it adds to my cognitive load,” he says. “I not only have to understand the content and try and perform at high levels of the content, but now I also have to navigate a grading structure that may not be totally transparent, and may be different for every teacher, and particularly for students who are historically underserved and have less education background and fewer resources and understanding of how to navigate those really foreign systems. It places those additional burdens on them, which we shouldn’t do.”
Are There Alternatives?
If traditional grades say little about a student’s mastery of the material, are often inequitable, and can add more stress, what are better ways for teachers and schools to capture a student’s skills and understanding of the material? And given the long history of using numbers and letter grades, are schools even ready to change?
Back in 2005, Chester Finn Jr., M.A.T.’67, Ed.D.’70, then president of the Washington-based Thomas B. Fordham Foundation, told The Washington Post that “high schools will keep using them if college admissions offices keep requiring them, which they likely will.”
But nearly two decades after Finn made that observation, it’s clear that some schools, like my son’s, are ready for change and have ideas on how to do that.
At the Greater Dayton School, Stitch says their ability to work outside the structure and limitations of a public school gave them the liberty to design whatever grading scale they thought was best for kids. They chose not to use the A to F scale.
“The traditional grading system is not aligned to learning outcomes,” he says. “Traditional grading is one-and-done in terms of you’ve learned the content, or you haven’t, and the grade you get is the grade you get. A better grading system allows for multiple attempts of content mastery.”
Which is why his school uses only two grades, “mastered” and “in progress,” and students have unlimited chances to learn the material and become proficient, he says. Students also learn at their own pace, and the school’s standards are broken into kid-friendly “I can” statements so parents and students know exactly what skills a student “can” do and which skills they are working on.
A few years ago, Melrose High started allowing students to redo their work if the grade was below a certain number. The idea was that learning shouldn’t be punitive — it was about mastering content, even if that took more than one try.
As Merrill says, “At the end of the day, we want all kids to learn. We don’t want to prove that they don’t know something. We want to be like, you need to do some work to retake this again to show us that you do know it.”
Loewald says the school’s English department additionally has an extended revision policy around writing assignments, where students can meet with their teachers to edit, revise, and resubmit their writing work. She allows students to revise almost every assignment.
“I think that the process of learning through revision is really helpful and allows there to be less pressure on the initial submission of work,” she says. “Students are graded on rubrics and can use those rubrics to guide their revisions of assignments. The only assignments that I do not allow students to revise are their reading checks since those are things we talk about and reference in the class in which they’re due.”
Merrill says the school’s revision policy is a work in progress — it needs its own revision — because there is currently too much variation in what students can redo. “We are working to build a single, consistent retake policy. If we de-emphasize the weighting for formative assessment and practice materials, such as homework and classwork, then we can have a retake policy that addresses summative assessments only,” he says.
Caitlin Reilly, Ed.M.’14, recently started as a deputy principal at Revere High School, located just north of Boston and part of the state’s Rethinking Grading Pilot. She says the school is moving toward a full competency-based model. Although definitions of competency-based learning vary, it generally means that instead of treating students as proficient based on the amount of time they spend on a subject (58 minutes for factoring polynomials or three years taking a foreign language), the focus shifts to how well students can demonstrate what they actually know about a subject. And those competencies aren’t vague; they’re clearly spelled out by a school.
“For us, competency learning is a matter of equity for students because it makes apparent to all students, what are you working toward?” says Reilly. “Where do you not yet have the skills? What support do you need? And students should be seeing their progress to the standards of the course. Knowing that is incredibly important for all students, versus the hidden game of school when you have this letter grade, and you don’t know where it’s generated from, or you have a test that you got 10 points just for writing your name.”
One of the areas Revere High is working on with the grant, she says, is rethinking report cards. Their current approach mimics, in some ways, what elementary schools typically do, which is to include comments about student strengths or areas that need improving in their habits-of-work, not just the letter grade. They are working on transitioning course grades from a single letter to a report of proficiency on course competencies.
“Our current report card is a one-pager that has letter grades … but for every class students have, there’s a habits-of-work box that includes the four habits-of-work that we assess: active learning, respect, collaboration, and ownership,” she says. “For each habit, there’s a scale of proficient, some proficiency, or not yet proficient, with rubric-defined criteria that guides the understanding of what it is to be proficient in each category.” In that way, it’s not just a teacher’s general “sense” of which category to pick or a parent’s guess as to what each habit actually means.
As I talked to educators about other ways to rethink how we grade, some suggested dropping the lowest grade in a class or not grading assignments done early in a semester. Many mentioned not grading homework but instead allowing that work to be a place where students can figure things out and make mistakes, especially when new concepts are introduced. Others talked about doing away with the 0-to-100 scale. In Melrose, Loewald says the English Department has already shifted to a 1-to-4 scale.
“A four meaning the student is exceeding expectations, three is meeting, two is approaching, and one is developing,” she says. “It’s much more accurate in terms of assessing student learning to use a smaller scale.”
Feldman says that with any change around such an entrenched topic like grading, “We are learning that you actually have to invest in teacher understanding along with policy development in order to change practice around grading.”
It’s something my son’s school has already jumped on with a core group of administrators and teachers examining current practices and testing out some of the changes they want to make.
“They’ve all set goals for themselves and are participating in regular coaching,” says Melanie Acevedo, the district’s director of instructional technology and personalized learning. “They come to a meeting once a month and talk about what’s working, what’s not working. They are a group that’s trying things out. They’re being the people that are boots on the ground, really experimenting so that we can come back to the bigger faculty and say, here are some things that people have tried. Do you want to try that? We’re building this idea from the staff and from the teachers because they’re the ones that know best.”
One of the things Melrose High isn’t doing, at least not yet, is blowing up the entire grading system or even doing away with traditional A to F grades.
Instead, Merrill says, they’ve set a goal to have, by next fall, “a very clear, consistent, transparent grading practice and policy in place for all teachers,” one that can answer questions like: How do we assess kids? How do we communicate that? How do kids know where they stand? How do they reflect and retake or do revisions? How do we count homework? Is that grading equitable? “There are so many pieces that go into it,” he says, “but we’re not looking to make any of our kids a trial.”
Luckily, there’s broader interest in “rethinking grading,” as the Massachusetts pilot is called. Sales of Feldman’s book, Grading for Equity, are robust enough that he’s working on a second, updated edition, and, he says, “I am not any less confident that this is one of the most important levers that schools and districts can use to not only improve student achievement, but also reduce achievement and opportunity disparities.”
Rethinking grading may even keep some teachers in the profession longer.
“We’ve heard, and we have some data, that this work actually increases the likelihood that some teachers would stay in their district,” Feldman says. “We see a real crisis in the retention of the teaching force. Knowing that there’s a learning opportunity that can engage them more directly with why they went into teaching in the first place, and gets them more excited about teaching, I think is really important.” Teachers, he says, don’t want to be the bean counters or police officers they often become when it comes to grading.
“The five participation points every day. The, you turned it in late one day, so you lose 10% or you turned it in two days late so 20%,” he says. “None of us went into teaching to do that.”