For a long time, I have used open-source software in my work to promote "reproducible research". I believed that if I made my code open source, and the software needed to run it was open source too (or at least free), my research would be as reproducible as possible. Recently, however, it came up in a discussion that research is more reproducible if one uses "popular" software rather than "unpopular" free software.
For instance:
I had been using Scilab (free) for much of my work and distributing my files to others. I was surprised to find that more people had MATLAB ($$) and would have preferred that I send them MATLAB files instead (which would need only minor modifications).
My question is:
Suppose I'm starting a new project and want to make it as reproducible as possible. Should I use relatively unpopular free software or extremely popular proprietary software?
Answer
I think there are two kinds of reproducibility:
- The ability of someone else to run your code and obtain the same output.
- The ability of someone else to write their own code that does the same thing as yours based on your description and on examination of your code (reproduction from scratch).
The second kind of reproducibility is much more convincing, since the main point of scientific reproducibility is to verify the correctness of the result. For science that relies on code, it is usually impossible to include every detail of the code in the paper, so verification requires examining the code itself.
If you use proprietary software, your code probably makes use of closed source code, and therefore it cannot be verified or reproduced from scratch. If you use open source software, then all of the code that your code calls is probably open source, so it can all be verified or reproduced by someone else from scratch.
At present, it is probably true that the first kind of reproducibility is more achievable with proprietary, widely-used software. I am optimistic that the current trend will lead to open-source software catching up in terms of wide use (consider SAGE, for example).
Addendum, in light of Epigrad's answer below, which I mainly agree with: The problem with relying on closed-source code isn't that someone else won't know what that closed-source code is expected to do.
The problem is that if you have two closed-source implementations of the same algorithm and they give different results (trust me, they usually will), then you have no way of determining which (if either) is correct.
In other words, closed-source code would be fine for reproducibility if it were bug-free. But it's not.
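As a small illustration of the point about two implementations disagreeing, here is a minimal Python sketch. It is not from the original discussion: the function name `naive_sum` and the test data are hypothetical, chosen only to show that two correct-looking ways of summing the same numbers can return different floating-point results. Because both implementations are readable, one can tell why they differ (accumulated rounding error in the plain loop) and which to trust.

```python
import math


def naive_sum(values):
    """Hypothetical implementation A: plain left-to-right accumulation."""
    total = 0.0
    for x in values:
        total += x  # each addition may introduce a small rounding error
    return total


# Implementation B: math.fsum from the standard library, which tracks
# intermediate rounding errors to return an accurately rounded sum.
values = [0.1] * 10

result_a = naive_sum(values)
result_b = math.fsum(values)

print("naive loop:", repr(result_a))
print("math.fsum: ", repr(result_b))
print("identical? ", result_a == result_b)
```

If either implementation were a closed-source black box, the discrepancy would still be visible, but there would be no way to inspect the code and decide which result is the correct one.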