25 July 2011

Consistency between Music Examiners

Accidentally stumbling into that hornets' nest of candidates, teachers, parents and assorted relatives which is the waiting room at the start of any examining day, a music examiner will almost certainly be faced with the question: "Are you a generous or a mean examiner?"  My stock answer is "I'm a fair one", which cuts no ice with the assembled throng, since they are not interested in fairness, just marks.

There seems to be a general belief that music examiners set out from London with a bag full of marks which they distribute or retain entirely at their own whim. That the examiner has total jurisdiction over marking is simply not true, but that certainly is the perception and gives rise to the perennial question about consistency between examiners. I used to use the analogy of a German bank teller with ready access to thousands of Marks but having to account for every single one, but since the advent of the Euro the joke falls rather flat.

In graded exams, the ABRSM system does not entrust the examiner with any marks in the first place.  The candidate enters the exam room with 100 and effectively invests them in the subsequent performance.  If the investment is good, interest is earned (merit, distinction); if it is bad, a loss is incurred (failure).  The examiner merely oversees these investments and reports them back to London, which checks that the assessment of the candidate's portfolio has been consistent with the rules and regulations and issues the relevant certificate.  Those rules and regulations are set out as a series of criteria, and from the comments the examiner makes it is clear what the correct number of marks to be added or subtracted should be.  This allows for great consistency across the whole examiner panel.

At one of the last examiner meetings I attended before I parted company with the ABRSM in 2000, some 500 or so of us sat in the ballroom of a Park Lane hotel watching videos of examination candidates and discussing our marking.  One candidate, whom the Chief Examiner had decided deserved a total of 118, drew from the assembled throng marks ranging from 115 to 124 – which was seen as a major crisis.  The ABRSM ideal was to ensure that across the whole panel of examiners marks not only stayed within the same category but did not vary by more than two.  Their methods and standardisation procedures have always been remarkably successful in achieving that goal, to the extent that one teacher I knew offered parents of his pupils their exam fee back if his estimate of their mark was out by more than two – and he rarely had to give back any money.

The Trinity system differs in that the individual examiners do have much more control over the marks, allowing up to 100 per candidate, but these need to be accounted for in the examiners' written commentaries, which closely accord with the published criteria setting out the range of marks available.  Much more importantly, while the ABRSM offers a single mark for each piece played, Trinity breaks it down into three categories: a mark for getting the notes right, a mark for being able to control the instrument, and a mark for communicating the music to the examiner.  This system has the dual advantage of allowing candidates and teachers who either cannot read the examiner's scrawl or don't understand written English to see at a glance where the strengths and weaknesses lie, and of ring-fencing the objective areas of assessment while reducing the number of marks available to purely subjective judgement.  This certainly makes for considerable transparency, although it does not achieve anything like the consistency of marking from examiner to examiner we get with the ABRSM system.

But there again, is consistency important?  In an art form where consistency barely exists, why should examinations create an artificial need for it?  No two live performances are ever the same, recorded performances lack consistency – otherwise there would be no demand for ever-more recordings of the same repertoire – and even professional critics do not have consistency; read different reviews of the same performance and you will see immediately what I mean.  And the thing is, nobody worries.  That's music; different performances, different ears, different perceptions. Why should exams, which, after all, ought to reflect the real world of the subject, seek to impose something alien?

Of course for any examination to have broad legitimacy, there need to be universally accepted yardsticks, and these yardsticks require some kind of quantifiable elements which can be consistently recognised.  Music exams attempt to quantify the unquantifiable, but I doubt that consistency for consistency's sake is really useful in this context. 

When I was an ABRSM examiner I knew colleagues who were so terrified of the Chief Examiner's "moderation" (the equivalent of the "re-education" practised by totalitarian regimes) that they did not so much assess the candidate in the room as write a report which London would recognise as being consistent with the marks awarded.  The first thing I noticed when I switched from ABRSM to Trinity was that the focus was on the candidate rather than on the consistency of the result.  With ABRSM, an examiner heard all of one teacher's pupils together and could immediately recognise his own individual consistency (and in London they could see from the teacher's previous results that the examiner was consistent with previous examiners).  Trinity examiners, by contrast, are never told who the teacher is, nor which candidates are taught by which teachers, so on a daily basis they see a very divergent series of results from which they cannot begin to infer a personal consistency; and since Trinity is differently constituted from the ABRSM, it never knows the teachers either, so it cannot easily measure an individual examiner's consistency against the global pattern for that teacher.  The guarantee each Trinity candidate gets, however, is that the examiner assesses them as an individual, and not in comparison with others from the same teacher or even, as can be the case, by preconceptions prompted by knowing the teacher's name or being intimidated by the teacher's long list of qualifications.

That's how it is with graded examinations.  But performance diplomas present a very different picture when it comes to consistency between examiners.  A recent comment on my post about Balanced Diploma Programmes asked me whether my views were shared across the panel, or whether, while I may think one thing, another examiner might think quite differently.  Having no first-hand experience of the ABRSM diplomas, I can only talk about Trinity, and here, again, we have a useful level of transparency through dividing the marks between the purely objective elements (accuracy, observance of detail in the score) and the subjective ones (communication, interpretation).  I might dislike a particular way of playing a Chopin Mazurka and mark it down accordingly, but I still have to give, say, full marks for accuracy and attention to detail.  As a result my mark may be lower than that of a colleague who responds more warmly to that reading of the Chopin, but the difference will not substantially affect the global mark.

On top of that, I may feel that a particular programme choice is unbalanced, being based purely on historical periods rather than style or character.  But I can only allow this to be reflected in the 10 marks (out of 100) I am allowed to award in that section of the diploma labelled "Presentation Skills".  And since that mark also covers the assessment of programme notes and stagecraft, in reality I have no more than three marks at my disposal to withhold because I don't think the programme choice is balanced.  Along comes a colleague, old, doddery and out of touch with reality (there are some), who still thinks that balance refers only to historical styles and, seeing Bach, Haydn, Chopin and Debussy, salivates with satisfaction.  But he can only give those three marks, so the difference between us, while it might be a little more than the ABRSM's two, is hardly going to be a matter of great concern to those anxious about consistency.
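To put that arithmetic in concrete terms, here is a minimal back-of-the-envelope sketch in Python; the figure for the playing itself and the three-mark share of the Presentation Skills section tied to programme balance are my own illustrative assumptions drawn from the paragraph above, not Trinity's published weightings.

```python
# Hypothetical illustration only: rough figures, not official Trinity weightings.
# Shows how little two diploma examiners can diverge when their only disagreement
# is the purely subjective question of programme balance.

PRESENTATION_SKILLS_MAX = 10   # marks for the "Presentation Skills" section (out of 100)
BALANCE_SHARE = 3              # assumed share of that section tied to programme balance

def diploma_total(performance_marks: int, presentation_marks: int) -> int:
    """Total mark: everything outside Presentation Skills, plus that section."""
    return performance_marks + presentation_marks

performance = 80  # assumed marks both examiners agree on for the playing itself

# Examiner A finds the programme unbalanced and withholds the balance share;
# Examiner B is delighted by the Bach-to-Debussy spread and awards the section in full.
examiner_a = diploma_total(performance, PRESENTATION_SKILLS_MAX - BALANCE_SHARE)
examiner_b = diploma_total(performance, PRESENTATION_SKILLS_MAX)

print(examiner_a, examiner_b, examiner_b - examiner_a)  # 87 90 3 – a three-mark gap at most
```

However the rest of the marks fall, a disagreement confined to programme balance cannot move the total by more than those three marks.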

With 200-300 words to write in a diploma exam report (the length, it must be said, of an average newspaper music review), the comments from different examiners on the same performance may well differ radically, both in content and in the hierarchy of assessing the constituent elements.  That has to be a good thing, for it reflects the reality of the musical world.  But when it comes to the marking, the criteria and the transparency of the Trinity system ensure that there is a sufficient measure of consistency between examiners to legitimise the result and the system.

2 comments:

  1. Dr Marc
    A fascinating account of how the different marking systems are designed and how they operate. Many thanks for taking the time to explain and discuss this. 

    I was intrigued by your question of whether consistency is a good thing. Having a scientific background, I have always taken it that everything, no matter how subjective, may be quantified and represented with a suitable choice of objective measures. But your image of a hall full of ABRSM examiners learning to recognise what 118 looks like does illustrate how standardised consistency can move the focus away from the individual, and from the holistic to the measurable. And the question of subjectivity also connects with your earlier discussion on whether there is universal recognition of good composers, as distinct from personal preference. 

    Very interesting, thanks.
    Dr Peter 

  2. Fascinating to read. In Australia, examiners' marks are supposed to fall within a margin of 3% either way; so in Park Lane, there wouldn't have been such an outcry unless the marks had been below 115 or above 121.
    It's a never-ending discussion really; each examiner's perception of the same candidate and how they perform is slightly different. Music being a subjective art form, it is unrealistic for examining bodies to expect examiners to rely on intuition to know what the 'perfect' mark for each candidate is and to match one another within a region of 2%. There is no absolute truth for a solution here either. I think that the ABRSM should accept this for what it is and seek to improve things for the examiner instead. Holding up an impossible expectation as a serious issue will invite frustration and undermine confidence. Being subjective, it cannot be wholly quantified. Anxiety would be a symptom of an incomplete system. We can use our judgement for subjective areas in exams. We have the knowledge and experience. We can only be consistent with regard to the measurable qualities. The integrity of the examiner should be upheld.

    Kind regards,

    Catherine.
