Thursday 18 July 2019

publications - How to share computer code?


I am writing a paper with my adviser. Some of the key contributions of our paper include:



  1. A computer simulation model of an inventory system

  2. A set of simulation experiments and results to validate the simulation model

  3. An implementation of an inventory policy which we propose

  4. Simulation results of our proposed inventory policy and inventory policies which are currently used in practice



We want to make our computer code freely available to others. The simulation model was written in the Java programming language so other researchers should be able to run the code on their own computers fairly easily. By sharing our computer code, we hope that other researchers will develop inventory policies which they can test using our simulation model. (Of course, it would be beneficial for us if they would do so and cite us in their paper!)


Question: What is a good way to go about sharing our computer code?



  1. Where do I host the code?

  2. Which software license should we "publish" the code under?

  3. How do I make it easy for other people to run the code?



Answer



Answer to 1: Where should I host my code?


Depending on what your university offers, you could host the code with the university, or in a public repository service such as GitHub, Bitbucket, SourceForge, or similar.



Many of these services have a "paid" subscription option for private repositories if those are required.


Answer to 2: What open-source license should I choose?


This question is relevant because we're having this discussion right now within one of our own research projects. I happen to know a little about open source software, having researched it in the past and having taught a few courses on it.


Though there are a lot of open-source licenses out there, they really come in two main families. They're either permissive open licenses (e.g. MIT, BSD, Apache) or they are Free, also called copyleft (GNU General Public License v2 or v3). The Open Source Initiative has a brief lowdown on them.


Permissive open licenses: These licenses generally let you release your code so that anyone can do anything they want with it, as long as they retain certain copyright information with the code. In practice, this has a number of implications.




  1. Someone could take your entire code base, create a product with it, and sell it.





  2. Someone could take parts of your code and put them in their own project (commercial or not).




  3. Because the license is more permissive, you yourself could take the code, close it, and keep any future releases under wraps, whether to make money from the code or to hide it from the public.




  4. Because the license is more permissive, you might generate more interest as a result. People may take code from other projects and use it to improve yours. On the flip-side, they could also make improvements for your source code and never share them back with you.




By contrast, the GNU GPL is a Free Software license that forbids certain things. In that sense it's more restrictive, but it is so for a number of ideological reasons.





  1. If you release software under the GPL, you can't close-source that release. Ever. It's going to remain in the open, and if you distribute the program and someone asks you for the source code, you are obligated by the terms of the license to provide it (if you host the source on GitHub or another public repository, then you have already satisfied this requirement).




  2. A company could take the code, make products with it, and sell them (it's their right to do so), but only on the condition that any source code they write for the project and distribute is also released under the GPL. A lot of companies that make their money writing software don't like this, because they have to keep releasing code to the public. On the flip-side, any cool stuff that they do ends up in the public under the GPL, so you could fold it back into your project and improve it. They can't take your code, improve it, ship it, and never share the improvements.




  3. If you happen to have used any GPL code in your project (let's say you took a few lines out of the Linux kernel or Git or whatever), then you'll have to release your code under the GPL as well.





In the end, the choice of license comes down to how you want the software to be used (and the eventual community it might bring in). If you plan to commercialize the software (and implicitly allow others to do the same), then you might want to lean BSD. If you don't want people to take your hard work and profit from it without showing you the results, then you want to go GPL. If you don't care either way, just pick one. I think BSD is popular in academia precisely because of the commercialization aspect (for example, LLVM is gaining a lot of traction because of its permissive license).
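Whichever license you pick, the notice has to travel with the code, since that is the "copyright information" people are required to retain. A minimal sketch of what a source file header might look like, assuming BSD-3-Clause was chosen; the class name, the project name "inventory-sim", and the copyright line are placeholders, not anything from the actual project:

```java
// SPDX-License-Identifier: BSD-3-Clause
//
// Copyright (c) YEAR AUTHORS  <- placeholder, fill in your own names
//
// The full license text lives in the LICENSE file at the repository root.

/** Hypothetical class standing in for any source file of the simulation. */
final class SimulationInfo {

    /** SPDX identifier of the license; keep it in sync with the header above. */
    static final String LICENSE_ID = "BSD-3-Clause";

    /** One-line description, e.g. for a --version flag or a log banner. */
    static String describe() {
        return "inventory-sim (license: " + LICENSE_ID + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

For a GPL release the identifier would be something like GPL-3.0-or-later instead, and the GPL itself suggests a slightly longer notice at the top of each file.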


Answer to 3: How do I make it easy for others to run the code?


You make it easy to run code by engineering it to be easy to run and by being extremely detailed with your documentation.


Packaging/distribution can actually be pretty hard and usually takes more effort than most people would think. A good way to make the software easy to run is to test it on multiple machines. Make sure that you're not forgetting any of the libraries that you're using in your software project, for example, and when possible, try to use software libraries that are common and well-maintained. Use mainstream languages with easy-to-manage package repositories.
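One cheap way to catch a forgotten library when trying the code on another machine is a small startup check. The following is only a sketch, assuming Java 9 or later; the class name and the dependency list are made up for illustration, and `org.example.stats.Histogram` in particular is a hypothetical third-party class that will be reported as missing:

```java
import java.util.ArrayList;
import java.util.List;

/** Startup check that reports which required library classes are missing. */
final class EnvironmentCheck {

    /** Returns the names from required that cannot be loaded from the classpath. */
    static List<String> missingClasses(List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            try {
                // Look the class up without running its static initializers.
                Class.forName(name, false, EnvironmentCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical list; replace with classes your simulation really needs.
        List<String> required = List.of(
                "java.util.Random",
                "org.example.stats.Histogram");
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("OS: " + System.getProperty("os.name"));
        List<String> missing = missingClasses(required);
        if (!missing.isEmpty()) {
            System.err.println("Missing libraries: " + missing);
            System.exit(1);
        }
        System.out.println("All required classes found.");
    }
}
```

Printing the Java version and operating system alongside the result also gives users something concrete to paste into a bug report.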


When appropriate, use installers, installer scripts, Makefiles (automake/autoconf is better), etc. Even shell scripts are better than nothing. If you can provide binaries and/or an installer, that will make things even easier. The problem is that this is a LOT of work!


Accompany it with documentation. Ideally, the documentation will contain a description of how to set it up and run it, with descriptions of necessary packages/libraries, data that you might have to get, and what to type or click on. Usually, something called README or INSTALL will attract attention. Put the instructions on the web page as well; most of the hosting solutions also allow you to have web pages.
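Part of "what to type" can live in the program itself, so that the instructions cannot drift away from the code. Here is a rough sketch of a self-documenting entry point; the option names `--seed` and `--periods`, their defaults, and the class name are assumptions chosen for illustration, not the interface of any existing simulation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Command-line entry point sketch that documents itself via --help. */
final class SimulationCli {

    static final String USAGE =
            "Usage: java SimulationCli [--seed N] [--periods N]\n"
          + "  --seed N     random seed (default 1)\n"
          + "  --periods N  number of simulated periods (default 1000)\n"
          + "  --help       print this message";

    /** Parses "--name value" pairs; throws IllegalArgumentException on bad input. */
    static Map<String, Long> parse(String[] args) {
        Map<String, Long> options = new LinkedHashMap<>();
        options.put("seed", 1L);
        options.put("periods", 1000L);
        for (int i = 0; i < args.length; i += 2) {
            String name = args[i].startsWith("--") ? args[i].substring(2) : "";
            if (!options.containsKey(name) || i + 1 >= args.length) {
                throw new IllegalArgumentException("Bad option: " + args[i] + "\n" + USAGE);
            }
            try {
                options.put(name, Long.parseLong(args[i + 1]));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Not a number: " + args[i + 1] + "\n" + USAGE);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        if (args.length > 0 && args[0].equals("--help")) {
            System.out.println(USAGE);
            return;
        }
        try {
            Map<String, Long> options = parse(args);
            // A real program would start the simulation here.
            System.out.println("Would run with " + options);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(2);
        }
    }
}
```

The README can then simply quote the `--help` output, and a mistyped option gets the usage text back instead of a stack trace.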


Hope this all helps. The hardest part of the process is by far #3, and most people don't get as far as using good techniques like installers, automake/autoconf, and so forth, because it's a LOT of work and development often moves faster than you can write documents. However, no one is grading you on your style, so it's often easier to get it out than it is to clean it up and prettify it first.

