Sunday 14 January 2018

programming - What are the disadvantages of opening the source of your own science tools


I am currently finishing my PhD thesis and, as a great deal of the work involved creating tools and protocols, I am considering putting the scripts I wrote in open-source repositories (such as SourceForge or GitHub).


The advantages I see in opening them are:



  1. They will be available for everyone and can be reused by the scientific community


  2. They could be improved (and corrected) by others

  3. It ensures my authorship of the different scripts (I can prove I put them there)



However, I was wondering whether there are any drawbacks to doing so (for future publications, version maintenance, and so on).



I should specify that not all of the work has been published yet. My field is biological science.



Answer



These are the issues I have personally encountered. My own source code is a mixture of open-source and closed-source projects, depending on many factors:




  1. You have to maintain your code. This might not be something some people care about, but I dislike the idea of putting out code that doesn't run at least relatively smoothly. That means that while the custom workflow where data bounces between a Python script, a C++ program and then an R script for analysis might work for me, produce good, reproducible results and generally carry science forward, it sure as hell isn't going to see the light of day. Things need to be put into functions in case people end up using your code like a library, general messiness needs to be cleaned up, etc. That's...well...it's work.

  2. Documentation. As with the above, I really dislike the idea of releasing something without documentation.

  3. Lack of feedback mechanism/opportunity cost. This is a big one for me because it is what makes 1 & 2 so difficult: it's really hard to tell whether anyone is using my code. It feels a bit like shouting into an empty room; it has little to no impact on my career, and certainly people aren't using it to the extent that it would appear as a line item on my CV. So I put in a lot of work that could have gone into another paper, etc., purely for ideological reasons.

  4. Sanitizing code. Releasing code into the open without including things that might get you scooped means going over it to make sure it doesn't contain a glowing neon sign that says "Future Directions Here". You can't really take a code base that combines three projects (one being written, one being tinkered with and one really only in the musing stage) and open source it without taking a risk.

    Beyond that, for me, there is the potential presence of private health data. My "released" code needs to be scrubbed of any reference to anything that might be confidential and, along the same lines, now needs working, validated dummy data to go along with it, because the data the code was actually written for cannot just be dumped on GitHub or wherever.
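To make the dummy-data point concrete, here is a minimal sketch of one way to generate a synthetic stand-in for confidential records. Everything in it is an assumption made for illustration: the column names, the value ranges and the helper names `make_dummy_records` and `to_csv` are hypothetical and not taken from any real project. A real replacement data set would also have to be checked against whatever properties the released code actually relies on.

```python
import csv
import io
import random

# Hypothetical schema: these column names are illustrative only.
FIELDS = ["patient_id", "age", "sex", "outcome"]


def make_dummy_records(n, seed=0):
    """Return n synthetic records that mimic the shape of the real data."""
    rng = random.Random(seed)  # fixed seed so the dummy data is reproducible
    records = []
    for i in range(n):
        records.append({
            "patient_id": f"DUMMY-{i:04d}",  # obviously fake identifiers
            "age": rng.randint(18, 90),       # assumed plausible range
            "sex": rng.choice(["F", "M"]),
            "outcome": rng.choice([0, 1]),
        })
    return records


def to_csv(records):
    """Serialise records to CSV text so they can ship alongside the code."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(make_dummy_records(5)))
```

Seeding the generator means anyone who downloads the code gets the same dummy file, so results obtained on it can be compared across machines.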



All of this is because you asked for cons. Despite this, I try to put up as much of my stuff as is possible.

