Readers of my blog know that I generally regard multiple choice tests (MCTs) as an adequate tool to assess student knowledge of, and proficiency with, a given set of topics. I have written about this subject here and here.
No, I do not think that MCTs are perfect, nor do I deem them necessarily the best testing methodology for every subject at every level; I claim simply that for some subjects (e.g., introductory physics) they have their place, and often offer distinct advantages over other procedures.
Obviously, much like it is possible for instructors to inflict upon their students a confusing, unfair or otherwise poorly designed problem-based test, it is equally possible to botch an MCT. There are several pitfalls of which a conscientious instructor should be aware.
The one pitfall that I would like to discuss in this post is the intrinsic element of randomness in MCTs, namely the possibility that even an unprepared (in fact, in principle utterly clueless) student will pick the right answers to a few questions just by chance. So, how much can this in practice affect the outcome of the test, by unfairly rewarding and misdiagnosing the proficiency of some (possibly many) lucky individuals?
Case in point
Over the past few years I have taught large-enrolment (four hundred students) introductory physics courses, and made extensive use of MCTs. My typical final exam will last two hours, and consist of twenty questions, each with five answers, only one of which is correct. Students have to pick one of the five. Ideally, an instructor would want a passing grade to go to a student able to pick at least half of the right answers, i.e., scoring at least 50%. Of course, in actuality a test goes the way it goes; sometimes the instructor will fail to calibrate the test properly, and students will not perform as well as expected. In those situations, one has to worry about the possible “contaminating” effect of correct answers picked by accident.
In order to establish the above contention more quantitatively, suppose, for the sake of argument, that I gave out a completely crazy test, and that the four hundred students writing it had absolutely no clue as to what it was all about. In other words, they are unable to select one of the five answers to any of the twenty questions based on any criterion having to do with course content. Thus, they have no choice but to pick randomly one of the five answers for each of the twenty questions.
In this (purely hypothetical, of course) scenario, assuming that each student makes a completely random selection on each question, one may expect the grade distribution for such a large class to be approximately binomial, i.e., roughly the following will be observed:
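That expected distribution is easy to tabulate. Here is a minimal Python sketch (names and layout are my own) that computes the binomial probabilities for twenty questions with a one-in-five chance of guessing each one, and the resulting expected number of students, out of four hundred, at each score:

```python
from math import comb

N_STUDENTS = 400   # class size
N_QUESTIONS = 20   # questions on the test
P_GUESS = 1 / 5    # chance of guessing any one question right

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Expected number of students landing on each score
for k in range(0, 13):
    expected = N_STUDENTS * binom_pmf(k, N_QUESTIONS, P_GUESS)
    print(f"{k:2d} correct ({5 * k:3d}%): {expected:6.1f} students")
```

The distribution peaks at four correct answers (20%, the class average) and falls off quickly on either side; virtually no one lands above ten correct.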
(Occasionally, one of the students will be lucky enough to score exactly 50%).
The class average is, of course, 20%, but, as shown above, not all students will get the same score. By sheer randomness, and randomness alone, some students will pick a relatively large number of right answers — in fact, some of them will approach 50%, conceivably a passing score. Now, suppose the instructor sees the above grade distribution and concludes that, while the class was obviously altogether unprepared for the test (which was perhaps a bit too hard), nevertheless the distribution itself shows that some students, the few who scored close to 50%, have greater knowledge than others.
Thus, the instructor decides to grade on a curve, and assigns an A to the twelve students who picked 8 or 9 correct answers, B to the 66 who picked 6 or 7, C to the 157 who scored 4 or 5, D to the 137 who got two or three answers right, and fails the rest.
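Those tallies follow directly from the binomial probabilities. A short sketch (reusing the same pmf; the band labels are mine) reproduces them, up to rounding, for a class of four hundred:

```python
from math import comb

def binom_pmf(k: int, n: int = 20, p: float = 0.2) -> float:
    """Probability of exactly k correct guesses out of n questions."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Expected number of students falling in each curve-grading band
bands = {"A (8-9 correct)": (8, 9), "B (6-7 correct)": (6, 7),
         "C (4-5 correct)": (4, 5), "D (2-3 correct)": (2, 3)}
for grade, scores in bands.items():
    count = 400 * sum(binom_pmf(k) for k in scores)
    print(f"{grade}: {count:5.1f} students")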
Would that make any sense ? An instructor doing this would essentially rank students based on their luck alone, not on their knowledge of the subject matter. But what should the instructor do, then, in a situation like this one ? Fail the whole class ? Perhaps, but in my opinion the most reasonable, fair course of action would be that of administering a new test , or in any case discarding this one as fatally flawed, and unreliable.
Now, clearly the above example describes an extreme case, in which every single one of the 400 students picks each and every answer randomly, but it illustrates how, if the grade distribution coming out of a MCT features a low average (say less than 50%), telling apart genuine knowledge from sheer luck becomes increasingly difficult a proposition. This is particularly serious a problem if grading is done on a curve, and it is rendered even more serious if one considers that, in actuality, practically no student will be unable to answer any of the questions. In practice, every one of them will get at least a few right, redering the possible contaminating effect of lucky guesses even more significant.
My personal rule of thumb is, the average should be close to 70%. The likelihood that someone knowing the answers of less than ten of the twenty questions will pick fourteen correct answers by chance is slightly less than 4%. Granted, it is not perfect, but then again, no test is.
 Go ahead, Schlupp…
 There could be many a reason for such an appalling outcome, besides the test being seriously flawed. Me being a lousy teacher and/or the class being exceptionally weak — or a combination thereof being just two of them. However, that is immaterial in terms of deciding what the best course of action would be.