Friday 10 January 2020

Data publication basics - where, why, how, and when should I publish my unpublished data?


Many researchers have unpublished data, and some of it may never be published as a manuscript. I would like to make scholarly contributions of data that I do not intend to publish as a manuscript, e.g. by publishing a "data paper".


The term "data paper" may be too new to be familiar, so here is a description from the Ecological Archives website:




Data Papers are compilations and syntheses of data sets and associated metadata deemed to be of significant interest to the ESA membership and the scholarly community. Data papers are peer reviewed and are announced in abstract form in the appropriate print journal as a Data Paper. Data papers differ from review or synthesis papers published in other ESA journals in that data papers normally will not test or refine ecological theory. Data Papers can facilitate the rapid advancement of ecological knowledge and theory at the same time that they disseminate information. In addition, Ecological Archives provides a reward mechanism (in the form of peer-reviewed, citable objects) for the substantial effort required to compile and adequately document large data sets of ecological interest.



This brings up the following questions:


What makes a good data repository?


Which data repositories provide a DOI for raw data?


Should published data be separate from articles on a CV?



Answer



There are a few things that I would consider when choosing a data repository:



  • Does it let you release your data under a license you're happy with?


    • Applying too restrictive a license can prevent anyone from doing anything useful with the data, so think about what you're prepared to allow. In particular, remember that most of the research done in academia could be considered "commercial" from a legal perspective. On the other hand, you may wish to choose a license that ensures you get credit for your work. You may or may not agree with them, but reading the Panton Principles will give you some idea of the issues here. Also take a look at this list of licenses written with data in mind.



  • How easy will your data be to find?

    • People will only use your data if they can find it. I recommend Googling (other search engines are available) for some datasets you know of in your field and seeing whether they come up. Repositories that are indexed by the major search engines will put you at a big advantage when it comes to attracting citations.



  • What repositories are well known in your field?


    • Your institution may have a repository that you can easily deposit in, but it won't be the first place colleagues in your field will think to look. If there are well-established repositories in your field, I would prefer those; otherwise, make sure your data is indexed by a well-established aggregator (I know ANDS runs a national aggregator in Australia).



  • What does your institution allow?

    • In many cases, your institution will own (or otherwise have a claim to) the data you generate as part of your research, so check what your local policies are and, if need be, ask your supervisor, head of department, legal team, etc. This will particularly affect your choice of license.





The other parts of your question can probably be answered better by others here (or maybe it should be split into several?)

