Whose code is this ?

Much of the current research work in theoretical physics involves numerical computation. This is because the calculations themselves are too complicated to be tackled analytically (i.e., with pencil and paper). The writing of a code suitable to carry out calculations of a specific type, especially one that is flexible, easy enough to use and relatively general in scope, is a major undertaking, one that can consume the better part of a doctoral thesis, for example.
Once the code is functioning, and the project for which it was initially written has been completed, the researcher who developed it often finds him/herself in the somewhat enviable position of having a tool that may be of interest to others, with which different, relevant and important problems may be investigated.
One issue that often comes up in conversation is: what is the accepted protocol for sharing a code with other investigators, or groups ?

First of all, why would anyone skip the all-important process of writing one’s own code, and rely instead on something that is, to a large extent, a “black box” ? Is it not more valuable, from the pedagogical standpoint, as well as more reliable in the long run, to write a code from scratch ? The answer is: it depends, and not always. One would not expect an experimenter to build all of the instrumentation needed to carry out some measurements, especially if one is talking about standard tools that can be purchased off the shelf. The same notion applies to computer codes, particularly if the research project as a whole should involve more than just computation [0]. People may often simply deem it more efficient to utilize an existing, at least partly tested code, rather than going through the exercise of writing something that already exists, i.e., duplicating work already done by others.

I am not actually aware of any “accepted” protocol for code sharing, but it has been my experience that these situations can lead to unpleasant arguments among scientists. The issue, of course, is how best to give the person who first wrote the code proper credit for his/her contribution, for without that code the completion of a (publishable) piece of work would not have been possible.
I have been in that situation myself many times; I have written large, fairly general-purpose codes that have been utilized by my students and postdocs, who in turn have taken them along after leaving my group, and could re-use (or have re-used) them for different projects.
Do I expect to see my name featured by default on the author list of any paper that is based, at least in part, on calculations performed with my code ?
Of course not.

My general philosophy is that, the moment a code with its source is given out to anyone, the beneficiary is free to utilize it for his/her own purposes, and need not include the original writer of the code in the collaboration. Especially if the code is based on a novel computing idea or technology, one which may hold promise relative to existing algorithms, it is in the developer’s best interest to have as many independent researchers as possible utilize it, and apply it to the widest variety of cases or physical systems.
Besides assessing the general effectiveness of the methodology, this mechanism generates citations to the original work of the developer, in which the technology is first proposed and/or illustrated, and it is the number of citations that gives a quantitative sense of the impact of the work on a field. Citation of one’s original work is, to me, the only “reward” that the initial developer of the methodology should rightfully expect.
That in turn entails that it is in the developer’s best interest to distribute the code to colleagues without any strings attached, provide the proper documentation, and/or render its use as easy as possible to others.

A scientist should generally expect to have his/her name included on the author list of a manuscript describing research for which a code was used that said scientist had to either write from scratch or modify substantially (e.g., add extra functionality). If the technology was not novel, but still someone was asked to sit at a desk for a number of hours, to write and test new code, that person should be rewarded with authorship.
If it is instead just a matter of making trivial modifications to something that exists already, essentially “repackaging” the code for someone else’s use, and if the computing methodology is well-established (i.e., described in textbooks), then it seems to me that an acknowledgment of the contribution of that person, at the end of the manuscript, should suffice.

Notes

[0] A typical example is given by some brilliant theorist, developing a sophisticated approach based on complicated, abstruse but cool-looking formalism, capable of making stunning, if scarcely believable predictions. At that point one may want to run a computer simulation to find out if, on top of its unquestionable aesthetic value, the remarkable approach mentioned above may even get some physics right — something that would be welcome, if not really a requirement. Of course, the role of simulation in a hypothetical case of this type would be ancillary, secondary, just a “technical thing”, really nothing anyone should spend too much time discussing. We all know that, right, Schlupp ?


11 Responses to “Whose code is this ?”

  1. El Charro Says:

    I don’t know much about computational physics so my comment might be irrelevant if the things I mention already exist but, it’s Sunday… so what the heck.

    Assuming that the progress in codes is fast and large in volume (think of number of papers), you could have a journal dedicated exclusively for that, sort of like Review of Scientific Instruments, but for codes. That can take care of acknowledging the initial author of the code. One quick reference and it’s taken care of.

    • Massimo Says:

      There are journals like that, for example the Journal of Computational Physics and Physical Review E.

      • El Charro Says:

        mmm not exactly what I was thinking about. Journal of Computational Physics definitely gets closer to what I had in mind, PRE not so much. Can you download a copy of the code from PRE or Comp Phys? If you can then they are definitely very close to what I was thinking about.

        I can see two different contributions to a code: 1) a new method of calculating something (not necessarily a new theory, maybe just a more efficient computational technique) and 2) actually writing the code. In experimental physics you always acknowledge 1, since it is hard to take a piece of equipment from one lab to the other. Making a copy of a code is easy. Maybe that’s why it is difficult to decide whether or not the original author should be credited.

  2. Schlupp Says:

    Massimo, can you please write a different post? I don’t find anything to complain about in this one.

    I think most hard feelings in this matter are generated by disagreement over what is a ‘trivial modification’ and what is ‘substantial work’. It’s amazing how ‘trivial’ someone else’s work seems to some people – and how substantial our own work usually is.

    “Is it not more valuable, from the pedagogical standpoint, as well as reliable in the long run, to write a code from scratch ?”

    Usually yes, but with too few papers in the short run, there is not going to be any long run…..

    Oh, and for your footnote: Obviously.

  3. Schlupp Says:

    Massimo, what are your thoughts on a reasonable balance between “coding a new method, that doesn’t work out, at least not within 10 years, which may be ok for the advisor, because he can then reap the benefits together with grad student #n, and anyway he has tenure, quite in contrast to the student” and “writing lots of cheap papers in an assembly-line setup while not learning all that much” in a PhD thesis?

    (Yeah right, one should BOTH code a revolutionary method and have lots of thoughtful, likewise revolutionary, and well-cited papers.)

    • Massimo Says:

      Wish I knew that, friend… I personally believe in risk taking, and therefore I like to go for big things, and recommend that students do the same. I think the key is to identify meaningful intermediate goals that will lead to publications anyway.
      However, it is not just a problem that we face: high energy physics graduate students oftentimes do not get to see the outcome of the experiment on which they spend all of their time while in graduate school.

  4. Professor in Training Says:

    I know absolutely nothing about coding and remember very little about physics but is a patent a possibility for code you create?

    • Massimo Says:

      Not a clue… I don’t think you can patent a code, though… you might be able to patent the idea underlying a code but… that would presumably mean not publishing it, and that sort of goes against the academic philosophy.

      • Schlupp Says:

        You cannot patent code in Europe; you can get copyright, of course, and licence the software. You can to some extent patent code in the US, explicitly NOT, however, the underlying algorithm. (How exactly they figure out the boundary would be interesting.)

        Patents are published, this is what the whole idea is about: In exchange for publishing the work, you get the right to keep others from using it, for a limited time. Actually, the “publishing” aspect is where the word itself comes from.

    • Phelippe Says:

      A patent is probably the worst way to handle this situation. There are several different approaches in use in the DFT community. Some groups are happy to share their code in exchange for a co-authorship in the first paper; others sell the code for a small fee; and there are groups that sell the whole thing to a company, which commercializes the final product.

  5. pablo Says:

    Hi Massimo,
    Some subfields do share codes. For example this one:

    http://senselab.med.yale.edu/modeldb/

    If a code is re-used, the tradition is to cite the original work, not to include the code’s authors in the author list.
