The American Physical Society (APS) has recently started the outstanding referee recognition program. This is a “highly selective (lifetime) award program to recognize scientists who have been exceptionally helpful in assessing manuscripts for publication in the APS journals”. The initiative is certainly commendable, long overdue, and actually does not go far enough (in my humble opinion, of course).
Anonymous peer review of manuscripts submitted for publication is crucial to the advancement of the scientific enterprise; yet, the task of reviewing a manuscript is very much a thankless one. It is simply regarded as part of the profession, a duty which any self-respecting scientist has to perform. It is hugely time-consuming, often aggravating, and in the end hardly any recognition (never mind monetary compensation) is given to a competent, diligent, reliable and honest referee, not only by his/her own institution, but by the scientific establishment as well.
So, that good refereeing should be recognized seems beyond question. What is less clear is what should be regarded as good refereeing.
Most of those who submit manuscripts for publication on a regular basis have no trouble identifying the traits of a bad referee. The worst by far is unresponsiveness. Material described in a submitted paper owes much of its interest and relevance to its timeliness, especially in highly competitive times such as these. Referees sitting on manuscripts for weeks or months at a time, ignoring repeated reminders from the Editors and finally declining to submit a report, or submitting a one-line, half-assed, incomprehensible piece of gibberish, are not only damaging to the authors; they also do a disservice to the community as a whole.
Equally annoying are referees who abuse their position of power (and the impunity granted by anonymity) to push their own agenda in all sorts of ways, from stalling publication of results obtained by competitors, to demanding citation of their own work even when unwarranted, to rekindling personal feuds with some individuals, and so on.
Yet, irritating as the above behavior can be, it is clear that a good referee is not just one who responds quickly, politely and in good English. Naturally, a referee can and often does contribute to the improvement of a manuscript, with pointed, appropriate and insightful comments. Ultimately, however, the act of refereeing is very much akin to a measurement, in this case of the quality of a piece of scientific work. The referee plays the role of measuring instrument, and a good measuring instrument is accurate and reliable.
Accuracy in this case consists of differentiating those manuscripts that meet the accepted criteria of novelty, interest and scientific quality from those that do not, and making a recommendation consistent with those criteria. Reliability consists of making consistently accurate recommendations over an extended period of time.
How does that translate into practice? A scientist will build, over a period of, say, ten years, an extensive refereeing record. Refereeing effectiveness can only be quantitatively assessed through the computed correlation coefficient between one’s recommendations and the final editorial decision (i.e., acceptance or rejection of the submitted manuscript). If such a coefficient is close to +1, it means that that individual’s opinion is valuable, and (s)he gets it right most of the time, thus providing a precious service to the community.
The Editor can rely heavily on this person’s assessment, presumably cutting down significantly on the processing time of manuscripts. Conversely, a referee whose correlation coefficient is close to zero, i.e., one who, close to 50% of the time, makes a recommendation that the Editor eventually opts not to follow, is not contributing effectively to the process, as his/her reliability is essentially that of a tossed coin. I wonder whether referees are actually classified (at least internally) in this fashion. I would certainly hope so.
Now, of course, one could equally well argue that if the correlation coefficient is close to minus one, that makes the referee very accurate and reliable as well, as all the Editor needs to do is simply and consistently do the opposite of what this wacky, “inverted-scale” referee recommends. This procedure is no less accurate and reliable than following the opinion of a referee whose correlation coefficient is close to one.
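For the numerically inclined, here is a minimal sketch of how such a coefficient could be computed. It assumes each recommendation and each editorial decision is encoded as 1 (accept) or 0 (reject); the function name `recommendation_correlation` and the three referee records are made up purely for illustration, and nothing here is meant to describe how any journal actually keeps score.

```python
from math import sqrt


def recommendation_correlation(recommendations, decisions):
    """Pearson correlation between two equal-length sequences of 0/1 values.

    Assumed encoding (hypothetical): 1 = accept, 0 = reject.
    """
    n = len(recommendations)
    if n == 0 or n != len(decisions):
        raise ValueError("need two non-empty sequences of equal length")

    mean_r = sum(recommendations) / n
    mean_d = sum(decisions) / n

    # Sums of products of deviations from the mean (the 1/n factors cancel).
    cov = sum((r - mean_r) * (d - mean_d)
              for r, d in zip(recommendations, decisions))
    var_r = sum((r - mean_r) ** 2 for r in recommendations)
    var_d = sum((d - mean_d) ** 2 for d in decisions)

    if var_r == 0 or var_d == 0:
        # A referee who always says the same thing has no defined correlation.
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / sqrt(var_r * var_d)


# Invented editorial decisions for eight manuscripts.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]

# Three invented referees.
reliable = [1, 0, 1, 1, 0, 0, 1, 1]      # disagrees with the Editor once
coin_like = [1, 1, 0, 1, 0, 1, 0, 0]     # agrees on exactly half the papers
inverted = [1 - d for d in decisions]    # always recommends the opposite

for name, record in [("reliable", reliable),
                     ("coin-like", coin_like),
                     ("inverted", inverted)]:
    print(f"{name:>9}: {recommendation_correlation(record, decisions):+.2f}")
```

With this toy data, the first referee comes out strongly positive, the second lands at zero, and the third sits at minus one, which is exactly the case the Editor can exploit by flipping every recommendation.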
Which brings me to my main point: Where is my outstanding referee recognition award, APS?
In these cases the job of the Editor is to edit, i.e., among other things remove referee comments that are inappropriate, offensive or in any case out of place, not conducive to a rational and measured scientific exchange. In most cases, such comments can be easily spotted. Often, the origin of a bitter contention between author and referee can be traced back to the failure of the Editor to remove an inopportune sentence from a referee report. In general, precisely because referees are anonymous, the bar for professional and respectful conduct should be set higher for them, not for the authors.
Note that whether or not such “accepted criteria” are fair, objective, well-defined and just is irrelevant. The fact is, they exist, and a decision will eventually be made with the aim of implementing them.
 From this standpoint, the specific reasons behind each recommendation are quite irrelevant, just as the physics underlying the operation of a thermometer is irrelevant: all we require of a thermometer is that the temperature reading be accurate. Who cares how the thermometer does it…
Naturally, a measuring instrument ought not to influence the measurement itself, i.e., correlation is different from causation. This is worth specifying, because one could easily imagine a situation in which referees who enjoy a high profile in the community exercise their influence over an Editor and manage to impose their views, thereby de facto determining the outcome of the review process. This would lead to a high correlation, in time, for those particular referees, but is certainly not what the scientific community is aiming at.