Saturday 23 December 2017

publications - Why are CS researchers reluctant to share code and what techniques can I use to encourage sharing?


While researching a topic area I have come across a number of papers that claim to improve on the state of the art and have been published at respected outlets (e.g. CVPR, ICIP). These papers are often written in a way that obscures some of the details, and their descriptions of the methods are often incomplete. When I contact the authors for more information and ask if they would kindly make their source code available, they either stop replying or decline the request.


Why are computer science researchers reluctant to share their code?



I would have expected that disseminating your source code would have positive effects for the author, e.g., greater recognition and visibility within the community and more citations. What am I missing?


For the future, what are some better ways to approach fellow researchers that will result in greater success at getting a copy of their source code?



Answer



Why researchers might be reluctant to share their code: In my experience, there are two common reasons why some/many researchers do not share their code.


First, the code may give the researchers an important advantage for follow-on work. It may help them get a step ahead of other researchers and publish follow-on research faster. If the researchers have plans to do follow-on research, keeping their code secret gives them a competitive advantage and helps them avoid getting scooped by someone else. (This may be good, or it may be bad; I'm not taking a position on that.)


Second, a lot of research code is, well, research-quality. The researchers probably thought it was good enough to test the paper's hypotheses, but that's all. It may have many known problems; it may not have any documentation; it might be tricky to use; it might compile on only one platform; and so forth. All of these may make it hard for someone else to use. Or, it may take a bunch of work to explain to someone else how to use the code. Also, the code might be a prototype, but not production-quality. It's not unusual to take shortcuts while coding: shortcuts that don't affect the research results and are fine in the context of a research paper, but that would be unacceptable for deployed production-quality code. Some people are perfectionists, and don't like the idea of sharing code with known weaknesses or where they took shortcuts; they don't want to be embarrassed when others see the code.


The second reason is probably the more important one; it is very common.


How to approach researchers: My suggestion is to re-focus your interactions with those researchers. What is your real goal? Your real goal is to understand their algorithms better. So, start from that perspective, and act accordingly. If there are some parts in the paper that are hard to follow or ambiguous, start by reading and re-reading their paper, to see if there are some details you might have missed. Think hard about how to fill in any gaps. Make a serious effort on your own, first.


If you are at a research level, and you've put in a serious effort to understand, and you still don't understand ... email the authors and ask them for clarification on the specific point(s) that you think are unclear. Don't bother authors unnecessarily -- but if you show interest in their work and have a good question, many authors are happy to respond. They're just grateful that someone is reading their papers and is interested enough in their work to study it carefully and ask insightful questions.


But do make sure you are asking good questions. Don't be lazy and ask the authors to clear up something that you could have figured out on your own with more thought. Authors can sense that, and will write you off as a pest, not a valued colleague.



Very important: Please understand that my answer explaining why researchers might not share their code is intended as a descriptive answer, not a prescriptive answer. I am emphatically not making any judgements about whether their reasons are good ones, or whether researchers are right (or wrong) to think this way. I'm not taking a position on whether researchers should share their code or not; I'm just describing how some researchers do behave. What they ought to do is an entirely different ball of wax.


The original poster asked for help understanding why many researchers do not share their code, and that's what I'm responding to. Arguments about whether these reasons are good ones are subjective and off-topic for this question; if you want to have that debate, post a separate question.


And please, I urge you to use some empathy here. Regardless of whether you think researchers are right or wrong not to share their code in these circumstances, please understand that many researchers do have reasons that feel valid and appropriate to them. Try to understand their mindset before reflexively criticizing them. I'm not trying to say that their reasons are necessarily right and good for the field. I'm just saying that, if you want to persuade people to change their practices, it's important to first understand the motivations and structural forces that have influenced their current actions, before you launch into trying to browbeat them into acting differently.




Appendix: I definitely second Jan Gorzny's recommendation to read the article in SIAM News that he cites. It is informative.

