Friday 23 March 2018

copyright - Public dataset without license: what is allowed?


Many public scientific datasets are accompanied by a license; variations of Creative Commons, for example, are used quite often. However, in many cases public academic datasets lack any licensing information. Here are two examples:


At PhosphoSite we find that the dataset, "created by Cell Signaling Technology is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License", so the usage permissions are clear from the license.


Here is another dataset where we don't see anything about licensing, not even a statement that the data is free to use for academic purposes. It only states that the data "can be downloaded", but what can I do after downloading? Can I redistribute it, modify it, sell it? From the context I suppose the authors intended their data for public use and are happy if more people use it, since that brings them more credit and citations. But this is just an assumption.


The copyright holder is clear in both cases; only the permissions are missing in the second case.


I am wondering: if I write software that uses this data, or construct a more complex dataset that includes it, which of these solutions are legally correct?


1: The software downloads the dataset onto each user's own machine from the original source and processes it there. However, this is not always possible or easy to implement.


2: Distribute a copy of the original dataset, or a modified version, with attribution to the original source. This is more problematic, I think, but I am not an expert in law.
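To make option 1 concrete, here is a minimal Python sketch of the download-on-first-use approach. It only illustrates the mechanics and says nothing about whether the approach is legally sufficient. The function name `fetch_dataset`, its parameters, and the URL in the usage note are hypothetical placeholders, not part of any real dataset or library.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_dataset(url, cache_dir, filename, opener=urlopen):
    """Return a local path to the dataset, downloading it on first use.

    Hypothetical helper: the software ships no copy of the data and
    instead fetches it from the original source on the user's machine.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    if not target.exists():
        # Fetch from the original source only when no cached copy exists.
        with opener(url) as response:
            data = response.read()
        # Write to a temporary name first so an interrupted write does
        # not leave a truncated file that looks like a valid cache entry.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(data)
        tmp.replace(target)

    return target
```

A call such as `fetch_dataset("https://example.org/data.tsv", "cache", "data.tsv")` would download the file once and reuse the cached copy afterwards. The `opener` argument is there only so the download step can be swapped out, for instance in tests or when a source needs a custom client.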


Asking for permission is an obvious solution, but with dozens of datasets it is a long and tedious process to contact all the copyright holders and wait for their responses. Also, some datasets are supplements to published papers. In this case, if the journal has a license (e.g. Nucleic Acids Research is published under Creative Commons), does that license also apply to these datasets?



Answer




This may vary by country, but in the US at least, the default terms (if no alternative is specified) are "all rights reserved", which means that you are not allowed to reproduce the work (the dataset), prepare derivative works from it, sell/rent/lease it, or publicly perform or display it. In particular, redistributing it to others is not allowed. In practice, the authors may be fine with it, but legally speaking, you're not allowed to redistribute the dataset in the form it was published in, without having explicit permission (which could be provided by a Creative Commons license, for example).


If you wanted to reproduce the dataset in a different form that conveys the same information, that may or may not be allowed. It would be up to a court to decide whether that counts as a derivative work or just a use of the underlying ideas, and a copyright lawyer could advise you better on whether your desired use is legally acceptable.


You are allowed to read the published dataset and use the ideas contained within it (i.e. the data) to draw conclusions. Copyright law does not allow for the restriction of those rights. I think doing an analysis on the data and publishing the results of that analysis (but not the data itself), as you would do in the process of writing a paper, is generally considered to be fine.

