… and I quote myself, here…

The number of independent citations to their own published work that scientists garner over the course of their careers is widely regarded as a cogent indicator of the impact of their research activity. The seemingly reasonable assumption is that impact ought to correlate with overall research effectiveness and with the quality of the work itself (though it is obviously an imperfect criterion).

So much importance is attributed to citations that some go so far as to propose that a scientist’s whole body of work (productivity, impact on his/her field) be assessed through an analysis of that person’s citation record alone (an example is offered by the controversial h-index).
Regardless of how much value one attributes to citations, an immediate, obvious issue arises when counting the number of times one’s work has been referenced, namely:
Should citations that individuals make to their own published articles be counted like any other citation, or should they be given less weight, or even excluded from the count altogether?
To my knowledge, there is no consensus on this issue. The ISI Web of Knowledge search engine counts by default all citations that an author has received, but one is given the option of excluding all self-citations from the count (in fact, any citation that can be construed as such [0]).

A popular (if perhaps not prevalent) opinion is that self-citations ought not to be counted, and at first blush the motivation appears obvious: authors could easily inflate their own citation record simply by citing as many of their own previous articles as possible in any subsequently published piece of work. Indeed, some have argued that the h-index itself should be corrected for self-citations [1].
To me, this is very reminiscent of a similar debate, having to do with the number of authors of an article. Should a singly-authored paper “count more” than one co-authored with others, for the purpose of assessing a scientist’s portfolio?
While instinctively most of us might think “yes”, in practice any attempt to devise a scheme that assigns different “weights” to co-authored articles almost inevitably ends up being unfair to some, and ultimately does more harm than good. For example, how would one compare a poorly cited, scarcely read singly-authored article to one co-authored by several scientists that is widely read and cited in a given community? Why penalize collaborative, high-quality work, especially in those fields of science where collaboration is almost a necessity these days, given the scope and complexity of the research to be carried out? And how should the number of authors be taken into account? How many is “too many”? If we are going down that path, then we really need to know who did what…
I think that most of us are quite comfortable working with raw numbers of (co-)authored articles, with the full understanding that they are merely a starting point. For evaluation or hiring purposes, a more in-depth examination of one’s publication record is obviously in order, and yes, there are cases where one might appropriately raise eyebrows over a publication list featuring only, or prevalently, multiply-authored articles, especially in a field where it is possible for a sole investigator to make meaningful, original contributions (e.g., theoretical physics).

The same applies to self-citations, in my opinion.
I say, a citation is a citation is a citation, and there is really no compelling reason, much less a clear-cut, fair way, to remove self-citations from the count.
Just because it is a self-citation does not mean that it is illegitimate, self-serving or fraudulent. Quite the contrary: most citations to one’s own work are no less appropriate or warranted than those to someone else’s work (including those to articles authored by the anonymous referee, who sets them as a condition for publication). What would be the rationale for not counting, or regarding as lesser, citations to one’s seminal work, if a number of projects spun off from it, were carried out by the same investigator (or his/her students), and resulted in published articles, all citing the original paper [2]? The truth is that prior publications by anyone that are relevant to, and/or constitute part of the foundation of, the research described in a manuscript should be cited; it is as simple as that. It is only normal that one’s previous work form the basis for further developments, and if it does, why not acknowledge it? There is a difference, I think, between a project that evolves into something greater and in time generates intriguing, original questions for others to answer (including one’s former students and postdocs), and one that does not.

Of course, in order to increase the number of “hits”, a scientist could be tempted to break down work that really should be published as a single manuscript into many smaller articles, all published separately, each one citing all the others. Look, let us not be naive: I know some who do that, but they are very few, and to conclude from that that all self-citations must fall into such a scheme is nonsensical.
And really, by how much can one boost one’s h-index by means of self-citations? Here too, I am sure that some cases exist of scientists skillfully using self-citations to bring their own index up, but aside from the fact that in principle we could all do it, so that no one is at a comparative advantage, it seems like an awfully hard way to accomplish that goal [3].

There are scientists who run large operations, with a lot of postdocs and/or graduate students working on different but related projects, in turn publishing a lot of articles that make reference to one another. I suppose that in those cases self-citations could make a non-trivial dent in their h-index, but largely as a result of the scientist’s overall productivity. One ought not to forget that productivity, while surely not the most important aspect, is valued by most of us, and therefore it does not seem unreasonable that a measure like the h-index, which aims at being all-encompassing, should reflect it to a degree.

Once again, I think that this entire discussion originates from a fundamental misunderstanding:
The problem is not with indices and how they are defined; it is with what one does with them, i.e., how far one is willing to trust the reliability of any numerical measure when it comes to evaluating something as complex as one’s career achievements, in science or any other profession.


[0] It is worth clarifying what is meant by this. Say scientists A and B co-author a paper, and say B cites it N times, in as many successive papers of which A is not a co-author. ISI Web of Knowledge will regard these as N self-citations for both authors A and B.
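The broad construal described in this footnote amounts to a simple set-intersection test: a citation is flagged as a self-citation for every author of the cited paper whenever the citing and cited papers share at least one author. A minimal sketch (the function name and author labels are illustrative, not ISI’s actual implementation):

```python
def is_self_citation(citing_authors, cited_authors):
    """Flag a citation as a self-citation, for *every* author of the
    cited paper, whenever the two author sets share at least one member
    (the broad construal described in footnote [0])."""
    return bool(set(citing_authors) & set(cited_authors))

# B cites the joint A-B paper in a later paper that A did not co-author:
# the citation is flagged as a self-citation for B, and for A as well.
print(is_self_citation({"B", "C"}, {"A", "B"}))   # True
print(is_self_citation({"C", "D"}, {"A", "B"}))   # False: no shared author
```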

[1] See also, for instance, here and here. All of these studies point to examples of authors whose h-index is “significantly” boosted by self-citations.

[2] Especially if doctoral theses were written based upon these “child” projects.

[3] Consider, for instance, a scientist whose h-index is currently 15, who wants to bring it up to 20 by means of self-citations. Let us assume that said scientist has published 15 articles that have already been cited 20 times each, and five articles with 10 citations each. Each of these five articles needs to be cited ten more times in order for the person’s h-index to rise by 5. That means writing ten additional articles, many of them presumably on the same subject, for the main (or sole) purpose of citing those five… seriously, how many people do you know who do “science” like that? In my experience, anyone indulging in such behaviour is easily spotted, ends up eliciting suspicion, and for the most part is dismissed and/or subjected to ridicule by peers.
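The arithmetic in footnote [3] can be checked with a minimal h-index computation, a sketch using the standard definition (the largest h such that at least h papers have h or more citations each):

```python
def h_index(citation_counts):
    # Largest h such that at least h papers have h or more citations each.
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
    return h

record = [20] * 15 + [10] * 5   # 15 papers with 20 citations, 5 with 10
print(h_index(record))          # 15

# Ten extra (self-)citations to each of the five weaker papers:
boosted = [20] * 15 + [20] * 5
print(h_index(boosted))         # 20
```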


13 Responses to “… and I quote myself, here…”

  1. betelgeuse Says:

    As for multi-author articles, author contributions should be identified clearly; otherwise they cannot be given the same weight as single-author papers. For instance, what was the fourth author’s contribution to a six-author article published in the Journal of Applied Physics? In my opinion, it is essentially zero…
    Nature and its sister journals now print author contributions at the end of each paper. I do not know of any other journal that follows this practice, though. Until it proliferates, multi-author articles can be seen as an effort to boost publication and thus citation counts.

    • Massimo Says:

      Oh, I really don’t know about that… There are experimentalists who simply cannot publish singly-authored articles, due to the sheer complexity of the experiment and the prevailing rules about authorship… what do you do, regard them as lesser scientists by definition?
      And, as for the N/S practice of specifying “who did what”… so what? Do you count that as a fraction of a paper? Do you list your publications differentiating those where you wrote the article, those where you analyzed the data, those where you did the night shift…? I can see this getting unwieldy and kind of silly pretty soon, frankly… but hey, maybe I am wrong.

  2. Cherish Says:

    Re: multi-author articles

    I think people get too uptight about it. A couple of the papers my group wrote were the result of 9 people working together. We each had our own area of focus, but a lot of what we did for our own experiments became useful for other people in the group. Likewise, there were people who provided lab space, materials, funds, etc. Maybe some people didn’t contribute directly by taking or analyzing data, but a lot of things wouldn’t have happened without their contribution. I’m not sure what other people think, but my personal feeling is that if you have spent a lot of time talking over ideas with someone and those discussions have helped your research move forward, then they deserve some recognition for their time and effort, as well.

    Back to your original point, the h-index self-inflation issue is primarily going to affect younger researchers with a smaller publication record. Self-inflation can easily get someone from an h-index of 1 to 5, but it’ll be much harder going from a 5 to 10 or 10 to 15. It may be useful to give both numbers so that it is easy to see how much self-citation has gone on.

  3. pika Says:

    What about reviewers who try to generate more citations to their own papers by requesting in reviews that the authors add their papers X, Y and Z to the reference list? This happened to me recently: one reviewer asked to add 3 papers to my reference list and the intersection of the authors’ sets of these three papers was one particular person (I guess this was the actual reviewer, although the review was blind). It’s not that rare, although perhaps this was an extreme case, to request 3 papers to be cited. Usually it’s maybe one.

    • Massimo Says:

      Yes, I mention this in my post. I recently had a referee ask to have four of his papers cited (I could easily figure out who that person was). In my opinion, one, maybe two, were legitimate requests. That is one of the reasons I have nothing against self-citations: if we are going to allow anonymous referees to do this, it is only fair that authors be allowed to cite themselves.

  4. JF Says:

    I really believe that self-citation is a red herring. As you point out, the effect on one’s citation record is going to be marginal at best, whichever way you look at it.

    Let’s assume you’re a productive scientist; you manage to squeeze 10-20 self-citations a year into your papers (that would be huge in my field, where the average is 1-2 papers per year per person, but let’s use this number). So you’ve been inflating your record by 20 units. Well, if you are being cited 100+ times per year, 20 will not make any difference; most hiring/promotion/etc. committees are simply not going to see the difference between 100 and 120 (well within the error bar of what bibliometry tells us, I’d say). And if you are at a level where 10 or 20 more citations do matter (say, you’re being cited 10 times a year), then you’ve got other things to worry about anyway.

    The same applies to the h-index. Sure, it’s easy to bring it up from 1 to 2, and feasible to force it up from 5 to 6. But if you want to artificially bring it up from 20 to 21, you need to cite all of your 20 previous papers, and you need to cite your newest work 21 times, meaning that you need to write some 20 papers. Well, if you’re publishing 20 papers, your bibliometry will take care of itself, with or without self-citations; and if you’re struggling to go beyond h=2, no effort of yours will make your work look good… [h is a fairly robust index; it’s hard to manipulate!]

    So that sort of game might marginally influence one’s bibliometry, but for a productive scientist it will be lost in the noise: after all, bibliometry, like every other measure, comes with an accuracy and error bars…

  5. prodigal academic Says:

    I also think self-citations have only a marginal influence. In my opinion, the only “reasonable” use of h-index is to compare mid-career people in the same sub-field. Cross sub-fields, and you run into “cultural” issues on citation and publication frequency. Young investigators’ h-index is relatively easy to game and also very closely tied to their adviser’s influence. At National Lab, h-index was used for bean counting, which led to a strongly impact factor driven publication culture.

    In terms of multi-author publications, in my interdisciplinary area, it is next to impossible to do experiments with a single author. This also depends on the local environment. When I was at National Lab, the culture was to do very collaborative work, so our papers tended to have somewhere between 3 and 10 authors. There is only so much you can do to tease out contributions. That is why you need to look at the whole CV to get a picture of what any particular scientist has contributed. A good scientist at National Lab would have a mix of first author, middle author, and last author papers (the mix depending on career stage).

    • Doug Natelson Says:

      I agree completely with this post. As a mid-career person, only now am I entering a regime in which my h-index is not drastically affected by the type of group I was in in grad school (4 or so students, loooooong experiments).

      • Massimo Says:

        Doug, Hirsch himself states that this index has some kind of “error bar” associated with it, and in order for it to be meaningful and telling, it should have a value significantly greater than its uncertainty. Now, if we take something like 3 or 4 for its uncertainty, it stands to reason that when h is of the order of, or less than, 10 or so, it should be taken with a grain of salt. I do not see anything wrong with that.

  6. Schlupp Says:

    “Of course, in order to increase the number of “hits”, a scientist could be tempted to break down work that really should be published as a single manuscript, into many smaller articles, all published separately, each one citing all the others.”

    Filtering self-citations would, I think, not even remove this temptation. After all, there are also people who (try to) publish essentially the same stuff twice or even more often in order to increase their paper count. (Yeah, one of the times in proceedings, but if this “does not count” as “publishing twice”, then why should it count for anything? And if it counts for nothing, then why do it at all?) And in these cases, it is better if they at least cite the previous copies of their work, so that referees and readers are informed that it is not new.

    The two people whom I noticed to inflate their citation counts by self citations did it by publishing scores of papers in venues like the Journal of Less Interesting Results, where it is not hard to get in with some quick work. I’d be quite curious how well this strategy works, but I suspect that anyone would simply think their work boring and repetitive.

    • Calvin Says:

      Hmm. Once, as a referee, I actually caught someone trying to publish twice, or something very close to it. Foolishly, they put the “original” paper as a reference. I was trying to understand how they got some numbers, and when I looked at the reference, I saw that all the figures were exactly the same–no new figures. I made no specific accusations, but simply pointed out this fact and that I couldn’t figure out what was new, and turned it over to the editors. The editors rejected the paper.

      • Schlupp Says:

        See, my point exactly: Had they not cited their own work, you’d have had a harder time catching it.
