In chapter 7, Christodoulou sets out to suggest some practical approaches to formative assessment. She makes some useful points about the potential benefits of multiple choice questions and offers some helpful strategies to increase the rigour of these as an assessment tool. Whilst Christodoulou does make a good case for how multiple choice questions might be used productively in assessing history, I am not sure she does as much to balance this with their many limitations – especially when trying to build students towards tackling complex topics. Attempts to assess history completely through MC questions have already been made in the USA and have met with limited success, especially in helping students to transition towards deeper historical thinking. This is something which the Stanford SHEG project was set up to tackle. Whilst this approach to multiple choice questioning does have some advantages, it ironically downplays the role of knowledge in making some of the judgments and leads precisely back to that idea of over-testing which Christodoulou criticised earlier in the book. That said, I do wish more history teachers would make careful use of MC.
A good case is then made for frequent and repeated low-stakes testing in the classroom. Again, some examples of this working would have been helpful. Even then, as Dennis pointed out in his article in Teaching History 164, the results of low-stakes testing are still tied very much to a Hirschean interpretation of curriculum and need to be assessed in the context of other curriculum aims too.
All of this aside, this feels like a useful chapter for trainee teachers to read.
Chapter 8: Improving summative assessments
Christodoulou returns at the opening of this chapter to a criticism of marking rubrics and descriptors. She notes the perverse incentives a rubric can create and the major issues with their reliability. Again, this is based on an assumption that our ultimate goal is to compare children across the board (in itself debatable). However, she raises some valid concerns which have plagued GCSE and A Level examinations for years (as well as earlier assessments).
It was useful to see a clear explanation of the comparative judgment approach. This is something which may end up having real merit as we move more and more towards digital working. That said, comparative judgment relies upon a system of norm referencing in which there will always be winners and losers. There is no scenario in the use of comparative judgment where everyone can succeed, even if they meet what might be considered base criteria. Therefore, in a comparative judgment assessment on how to make a cup of tea, there would still be half of people falling below the median, even if everyone was capable of the task.
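To make the tea example concrete, here is a minimal sketch in Python. The candidate names, the scores, and the pass mark are all invented for illustration; nothing here comes from the book. It simply contrasts a criterion check (did each person reach the bar?) with a norm-referenced one (where does each person sit relative to the rest?).

```python
from statistics import median

# Hypothetical scores from a "make a cup of tea" assessment (invented data).
scores = {"Asha": 62, "Ben": 71, "Cal": 55, "Dee": 80, "Eli": 67, "Fay": 74}

# Assumed criterion: anyone scoring 50 or more can make a cup of tea.
PASS_MARK = 50

# Criterion referencing: everyone who clears the bar succeeds.
meets_criterion = [name for name, s in scores.items() if s >= PASS_MARK]

# Norm referencing: position relative to the cohort is what counts.
cohort_median = median(scores.values())
below_median = [name for name, s in scores.items() if s < cohort_median]

print(f"{len(meets_criterion)} of {len(scores)} meet the criterion")
print(f"{len(below_median)} of {len(scores)} fall below the median")
```

With this invented cohort, all six people meet the criterion, yet three of them still fall in the bottom half.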
I also feel the idea of where “tacit knowledge” comes from is somewhat under-explored. Presumably this is based on a kind of internal rubric which is not standardisable. In order to mitigate this, comparative judgments would need to be made by huge numbers of people. This might be possible at a national level, but possibly not in school. Equally, it implies that a piece of work may receive a B grade because 10 people think it is a B grade. However, it might also get a B grade if 5 people thought it was worth a C and 5 an A. This is simplified, but the essential point is a valid one, I think.
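The simplified B-grade example can be put into numbers. This sketch follows the simplified framing above rather than how comparative judgment software actually combines pairwise decisions, and the grade-to-points mapping (A=3, B=2, C=1) is an assumption made purely for the arithmetic.

```python
from statistics import mean, pstdev

# Assumed points scale, for illustration only.
POINTS = {"A": 3, "B": 2, "C": 1}

# Panel 1: ten judges all think the work is a B.
panel_agree = ["B"] * 10
# Panel 2: five judges think it is a C and five think it is an A.
panel_split = ["C"] * 5 + ["A"] * 5

agree_points = [POINTS[g] for g in panel_agree]
split_points = [POINTS[g] for g in panel_split]

# Both panels average out to the same grade...
agree_mean, split_mean = mean(agree_points), mean(split_points)
# ...but the spread of opinion behind that grade is very different.
agree_spread, split_spread = pstdev(agree_points), pstdev(split_points)

print(agree_mean, agree_spread)
print(split_mean, split_spread)
```

Both panels land on an average of 2 points (a B), but the first has no disagreement at all while the second has a spread of a full grade either side. A single reported grade hides that difference.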
The final section on grading was quite a useful introduction to the conversion of pupil scores into grades and would make a good read for trainees again.
Chapter 9: An integrated assessment system
Chapter 9 contains some interesting reflections on how an integrated system of assessment and curriculum might be created. All of this relies on a set and standardised national curriculum, however (which has its own issues). In some ways, one might argue that such a system partially exists where schools have bought into GCSE textbook schemes which come with diagnostic tests, exam papers, and allow teachers to access a national bank of resources. Still, there are some interesting implications here for how curriculum might be controlled and homogenised – whether this is a good thing is another matter.