Note: there are two blogs which follow this one, offering some solutions to the problems outlined here.

A few days ago, Ofqual published an interesting blog looking at the state of the examinations system. This was based on an earlier report exploring the reliability of marking in reformed qualifications. Tucked away at the end of this blog was the startling claim that in History and English, the probability of markers agreeing with their principal examiner on a final grade was only just over 55%.

The research conducted by Ofqual investigated the accuracy of marking by looking at over 16 million "marking instances" at GCSE, AS and A Level. The researchers looked at the extent to which markers' marks deviated from seeded examples assessed by principal and senior examiners. The mark given by the senior examiner on an item was termed the "definitive mark", and the accuracy of the other markers was established by comparing their marks to it. For instance, it was found that the probability that markers in Maths would agree with the "definitive mark" of the senior examiners was around 94% on average. Pretty good. The researchers also went on to calculate the extent to which markers were likely to agree with the "definitive grade" awarded (by calculation) by the principal examiners based on a full question set. Again, this was discussed in terms of the probability of agreement, and again it was high for Maths. However, as noted, in History and English the levels of agreement on grades fell below 60%.

When Michael Gove set about his reforms of the exam system in 2011, there was a drive to make both GCSE and A Level comparable with "the world's most rigorous". Much was made of the processes for making the system of GCSE and A Level examination more demanding, to inspire more confidence from the business and university sectors, which seemed to have lost faith in them. Out went coursework and in came longer and more content-heavy exams. There was a sense of returning GCSE and A Level examinations to their status as the "gold standard" of assessment.

The research conducted by Ofqual seems to suggest that examinations are a long way from such a standard. Indeed, it raises the question of whether national examinations have ever really been the gold standard of assessment they have been purported to be. Have we been living in a gilded age of national examinations? The answer is complex.

Before I launch into this, I should also note that I understand the process of examining is a difficult one and that I have no doubt those involved in the examinations system have the best interests of students at heart. I also don't want to undermine the efforts of those students who have worked hard for such exams. That said, there were some fairly significant findings in the Ofqual research which need further thought.

A gold standard?

As noted, the Maths GCSE gives a good example of how marking should operate when it is really effective and rigorous. The chart below is taken from the Ofqual report (annotated by me) and gives a nice illustration of something approaching a "gold standard". Here the probability of the student (at almost any point on the mark range represented on the x axis) receiving the "definitive grade" is effectively 100% (seen on the y axis). This represents theoretical complete agreement between the marker and principal examiner.
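To make concrete what an agreement probability like this means, here is a minimal sketch of the basic idea, using invented grade boundaries and marking data rather than Ofqual's actual method or figures: each marker's total on a seeded script is compared with the senior examiner's definitive total, both are converted to grades, and agreement is simply the proportion of matches.

```python
# Illustrative sketch only: the grade boundaries and marking data below are
# invented for the example, not Ofqual's real methodology or figures.

# Hypothetical boundaries for a paper marked out of 80 (grade: minimum mark needed).
BOUNDARIES = {9: 68, 8: 61, 7: 54, 6: 47, 5: 40, 4: 33, 3: 26, 2: 19, 1: 12}

def grade(mark: int) -> int:
    """Convert a raw mark into a grade using the boundary table (0 = ungraded)."""
    for g, minimum in sorted(BOUNDARIES.items(), reverse=True):
        if mark >= minimum:
            return g
    return 0

# Each "marking instance" pairs a marker's total with the senior examiner's
# definitive total for the same seeded script.
marking_instances = [(55, 54), (54, 54), (47, 46), (61, 63), (70, 70)]

mark_agreement = sum(m == d for m, d in marking_instances) / len(marking_instances)
grade_agreement = sum(grade(m) == grade(d) for m, d in marking_instances) / len(marking_instances)

print(f"P(agree on definitive mark)  = {mark_agreement:.2f}")   # 0.40
print(f"P(agree on definitive grade) = {grade_agreement:.2f}")  # 0.80
```

Note that grade agreement is the broader of the two measures: markers can disagree on the exact mark and still award the same grade, except where the definitive mark sits close to a boundary (the 47 versus 46 pair above), which is exactly where agreement dips.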
The only exceptions to this are near grade boundaries, where small variations in marking standardisation mean that the probability of getting the grade right reduces to just under 50%. This is exactly what one would expect. Once the boundary is cleared, the agreement rate returns quickly to its maximum. The low level of variation is closely linked to the fact that most Mathematics papers have definitive answers and usually low-mark questions, on which there is likely to be agreement.

The picture is similar in other subjects too, with the sciences (and, interestingly, languages too) sitting at an 80% or greater agreement rate on the definitive grade. Just below these we find Psychology and Economics (70-80% agreement). Again, this is probably because there are large numbers of undisputed answers and short questions. Even where there are longer answers, e.g. "Explain what testing must be done before this new drug can be used to treat people" or "Calculate the area of the pond. Show your working.", there is going to be a shared sense of what constitutes a good answer, because there is likely to be a high level of agreement on the required knowledge. For instance, the working steps to calculate the area of a circle are unlikely to change much between schools, so method marks here would be easy to spot.

Something more base...

If we move away from Maths and turn to AS History, however, we see a starkly different picture. Apart from outlying examples at the top and bottom of the mark range, the bulk of the assessment sees limited agreement between the markers and the principal examiners. The probability of markers getting the same "definitive grade" as the senior examiners sits at just 61% on average, seldom rising much above 62% even in the middle of a grade. There is also a long tail-off towards the grade boundaries. In essence, then, the reliability of such marking for determining grades has to be questioned and explained. This is something Ofqual attempts to some extent in the rest of the report.

First, it is important to note that the level of variation is likely connected to the fact that History almost exclusively includes essay-style questions. These of course lead to legitimate disagreements over what is a plausible mark. However, these figures illustrate not agreement on the exact mark, but the probability of getting the "definitive grade", a broader measure. A larger degree of difference may come from the fact that there is no definitive list of what should be in an essay answer. A student answering the prompt "To what extent were military reverses responsible for the downfall of Nicholas II?" would likely address military reverses, but could choose from a wide array of other factors, few of which will have been directly specified by the exam board ("opposition and the collapse of autocracy"). Equally, there is not even a definitive list of which military reverses a student should discuss. This is before we factor in the variance in how teachers might have interpreted the specification item itself: should the focus be on all opposition? On the connections between the opposition and the collapse? Or on the nature of the collapse and its aftermath? And so on. Because most teachers are not party to the discussions which happen during specification creation, they are almost always guessing what they might best teach, and therefore students may answer similar questions in very different ways. In short, the nature of History as a subject of great breadth makes the job of the marker extremely difficult.
And of course markers do not need to have a specialism in the topic they are marking, which adds another level of difficulty.

History is not alone here either. The probability of agreement on a definitive grade sat between 60% and 70% for RE, Geography, Computer Science, Business Studies, and Sociology. Meanwhile, English and History saw agreement rates as low as between 50% and 60%.

Why does it matter?
The big question is: does any of this matter? I for one think it does. My first port of call here is Terry Haydn's excellent work on assessment, in which he noted that the first principle of testing should be "first, do no harm". We have already seen the disproportionate impact of examinations on the curriculum (something Ofsted is finally tackling) and, increasingly, on students' mental health. The fact that the marking system is so deeply flawed does not inspire great confidence. Indeed, I spent many years as a teacher fighting to have the efforts of history students properly recognised in this system. A quick check back reveals that of the 43 scripts I sent for re-marking between 2011 and 2013: 22 increased their mark (by an average of 10 marks, a grade and a bit); none went down; five went up by 15 marks; and two went up by over 20 marks (3 grades). None of these were transposition errors.

What worries me more is that Ofqual's summary of the findings suggests little appetite for real change and an under-appreciation of the scale of the problem. For instance, the report noted that in all subjects apart from English, Sociology, Geography and History, the chance of students receiving the "definitive grade" +/-1 was 100%. However, this is still a large variation when students' futures and school reputations rest on the accuracy of such grading. Worse still, for English and History, the chance of students receiving the "definitive grade" +/-1 only came out at around 96%. This implies that around 4% of students were two grades or more from the definitive grade. That could mean a child in GCSE English gaining a Grade 3 rather than a Grade 5. Of course, Ofqual also noted that there is a system in place to review marks. This is true, but at over £40 a review, a student wishing to confirm their marks in the four or five problematic subjects noted above would be shelling out £200 on results day!

Ofqual's response also did much to downplay the issues. For example, Ofqual note in the report that variations in marker accuracy have remained fairly similar over the last five years. To me this is cold comfort and just reinforces that the system has been failing in many regards for a long time. Ofqual also note that we sit broadly in line with other countries when comparing the accuracy with which "6 mark" questions are graded. Again, there is not a lot of solace here: being comparably problematic feels a little complacent at best. And of course, other jurisdictions do not necessarily place such a high focus on graded outcomes at 16 and 18 for either pupils or schools, so it is not really comparing like with like.

The only radical suggestion made for change was that it might be possible to do away with the reporting of grades and replace this with the reporting of a mark and a confidence interval. This would certainly provide more information, but I am not sure it would be especially helpful for a university to know that there was 60% certainty that a pupil had scored 65% on their history exams (a rough sketch of what such reporting might look like is given below). Indeed, most of the hope for change rested on a single suggestion that there might be "state-of-the-art techniques and training" brought in to support standardisation in essay-based subjects. Whilst this all sounds very impressive, to me it smacks somewhat of the high-tech Irish border solution we keep being promised by the Brexit lobby: it only exists in fevered imaginations or fiction.
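Returning briefly to that mark-and-confidence-interval suggestion, here is a minimal sketch of what such a report might look like. The standard error of marking is entirely assumed for the purpose of the example, and the interval is a simple normal approximation; none of the numbers come from Ofqual.

```python
# Illustrative sketch of "mark plus confidence interval" reporting.
# The standard error of marking used here is invented for the example.

def report(mark: int, max_mark: int, se_marks: float = 4.0, z: float = 1.96) -> str:
    """Format a result as a percentage with an approximate 95% interval."""
    pct = 100 * mark / max_mark
    half_width = 100 * z * se_marks / max_mark
    low = max(0.0, pct - half_width)
    high = min(100.0, pct + half_width)
    return f"{pct:.0f}% (95% interval roughly {low:.0f}%-{high:.0f}%)"

print(report(mark=52, max_mark=80))  # -> "65% (95% interval roughly 55%-75%)"
```

A report like this is more honest about the uncertainty in the marking, but, as argued above, it is not obvious that it is any more useful to a university admissions tutor than a single grade.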
For the last 30 years and more, national examinations have been held up as an entirely vital, dispassionate, and fair assessment of pupils’ abilities, as well as of schools. Whilst there might be some case for arguing such in Maths, I am not sure we can say the same in History or English. The exam system at its core is rusty base metal rather than gold. All Gove’s reforms have seemingly done is add another glittery coating to keep up the pretence that it is otherwise. The nettle which might need to be grasped is that our system of assessments needs more fundamental change. This is something I hope to discuss further in my next blog.