Moving beyond grades will be key to rethinking assessment

How de-linking scales across domains would end the experience of one-big-competition

Amelia Peterson, LSE & London Interdisciplinary School

Recently, Tom Cosgrove wrote for Rethinking Assessment about whether we need grades to learn. He makes a good case that we don’t. When we look at education from a societal or policy perspective, however, we might still think that we need grades for something else, such as selection or accountability.  

In this blog, I want to show that we don’t. In fact, we could have fairer selection processes and more effective accountability if we stopped using linked scales, such as grades or percentages, as our way of reporting assessment outcomes.

I use the term ‘grades’ with some trepidation. Here in England, the status of qualifications is so intertwined with grades that we might have a gut reaction against any other way of reporting. In North America, while there is a substantial movement for ‘going gradeless’, it is often perceived as implying a relaxing of standards; as being opposed to rigour and demand in education.

Here, I want to show why it’s anything but. Moving beyond grades is not about making assessment any less demanding; it is about making that demand more meaningful.

Why does reporting matter?

We can think of assessment as having two parts: 

  1. making a judgment about an answer, product or performance
  2. converting that judgment into a score, mark or grade.

A lot of the time in assessment discussions we focus on the first step: what those answers, products or performances should be. But the impacts of an assessment depend equally on the second step: the decisions we make about how to report assessment results.

Reporting commonly takes the form of a scale, expressed in grades, marks, or scores. It might have particular cut-offs, such as a level that represents a ‘pass’, but it is designed to provide a comparative picture of performance.

This is where the problems start. Create a scale, and you create a set of unequal incentives. Students who have more chance of scoring at the top of the scale have a stronger incentive: they have more reason to commit the effort to prepare for the assessment, and to try hard during it. Students who have little chance do not have very much incentive to learn for this assessment. This is true even before we consider the wider factors that differentially influence students’ motivation, such as their experience of school or their view of the opportunities beyond it.  

So most assessments create a small kind of inequality: they offer the strongest incentives to the students who already have the best chance of doing well.

The real problem with grades

The problem of unequal incentives is compounded when assessments use linked scales. Scales get linked through methods that make performance comparable across domains: a process called “commensuration”. Commensuration has consequences for how an assessment is experienced. Think about what happens when we hold the Olympics: suddenly a running performance and a gymnastics display that on another day would just mean winning a race or a meet both become a gold medal. It’s a lifetime achievement, a tally in a whole country’s medal table. The stakes are raised. The competition becomes more competitive.

The same thing happens with GCSE grades. Who says that an A in Art and Design is equivalent to an A in Maths? Commensuration does. (There’s quite good work on the issue of comparability across subjects at GCSE and how it’s essentially impossible to really judge it.) And as with the Olympics, because the competitions (subjects) are commensurable, the stakes get higher. Not creating a good Art piece or not studying for a Maths exam no longer just means a failure in Art or Maths; it means a bad grade: you’ve missed out on winning at school. And it could mean a bad grade not just for you, as an individual, but for your teacher, your school, or your region.

This, I think, is the real problem with GCSEs. The assumption behind using grades to make a single scale is that everyone is taking part in one big competition. But as I’ve written before, this doesn’t bear any relation to reality: there is no one set of opportunities that everyone is in the running for. (Those who went to “selective” universities might like to think that they beat out everyone else to get there, but this is simply not the case; most people didn’t try, and really very few people try for each subject.)

The notion of fair competition is a mirage created by the scales we use – as is the idea that our educational values could ever be captured in a single dimension. In education and human development there are always trade-offs; a straight A student has lost out in other ways, we just haven’t measured them. 

What does this mean in practice?

I think we have to see the kind of generic grades we use at the moment as a problem. But more generally, any kind of commensuration poses similar risks. So just moving to a percentage scale or a set of levels or bands would not make things much better. 

We should be looking at models that try to de-link different subjects and/or capabilities by using a range of domain-specific assessment and domain-specific reporting methods. In some systems, this is referred to as “mastery-based” or “competency-based” assessment, where students work to show mastery of particular “competencies”, or descriptors of things they can do. But many of these systems still collapse results into percentages or grades for the purposes of reporting. The innovation we need is a way to really move beyond linked scales as an output of assessments.

The first bit of this innovation is really possible already. There are many effective ways of capturing and portraying what someone can do in a particular domain. Portfolios; narratives; videos of demonstrations of learning; internship performance… The change would be to just focus on standardising assessment within a domain (and perhaps quite a granular domain – such as introductory statistics, rather than “Maths”, and reading, writing and literary analysis, rather than “English”), and then not to convert that into a linked scale.  

The big pushback to any of these methods is around comprehension: what about employers or universities who don’t have time to look at all these products? This is where advances in data visualisation should come into play. It should be perfectly possible nowadays to communicate precise information without linked scales, providing a range of information to external stakeholders, from which they can choose what to focus on. For some it might be a component of Maths or English: have they studied statistics? How are they at reading complex texts? For others it might be something quite different: oracy, reliability, teamwork.

A place to start

De-linking scales across domains would end the experience of one-big-competition, and would also break down some of the simple binary division between “academic” and “vocational” qualifications. And it would have other benefits: domain-specific ways of reporting performance and achievement could be more valid and more reliable, attuned to the kind of gradations, classifications or feedback that make sense for that domain.

The consequence would be that each assessment could give more accurate information about what a student can actually do, and be more appropriately linked to the next steps that are relevant for that domain. Rather than having the experience of taking part in one big competition, students would be taking part in a range of different, distinct competitions, most of which could feel more like a music exam or a driving test, where the stakes are domain-specific.

Of course, no one move solves all our problems. There would still be big challenges related to the perception and understanding of different domains. De-linking scales is just one step, but in my view unless we do this, we will undermine the impact of any other change we can achieve.
