Types of data of interest
I would like to consider:
- Genetic data (SNPs, microsatellites, whole-genome sequencing, RFLPs, etc.)
- Genotype-phenotype data (disease-related data, QTL, etc.)
- Sequence annotation and function
- Transcriptomic data
I would like to include data on any living organism (including data from fossils), not only human data. To avoid semantic issues, I would leave out epigenetic data.
Question
How much of such data (in bytes) is openly available online?
Difficulties
I realize that arriving at such an estimate may be hard and that the result may be very inaccurate. Also, the format used to store these data will certainly affect the relationship between information content and storage usage. But even a rough order of magnitude or a vague intuition would already help. Is it a few terabytes, a few petabytes, or even more?
I would also welcome a description of how you arrived at the estimate. I am particularly interested in what fraction of it is human data (if you happen to get to such fine detail).
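To make the point about storage format concrete, here is a toy sketch (an illustration only, not an estimate of the available data volume). It compares storing a DNA sequence as one ASCII character per base with packing it at 2 bits per base. It assumes the sequence contains only A, C, G and T (no ambiguity codes, no quality scores); the helper names are hypothetical. The final lines apply the same arithmetic to a haploid human genome of roughly 3.1 billion bases.

```python
# Toy illustration: the same DNA sequence stored in two different ways.
# Assumption: only the bases A, C, G, T occur, so each fits in 2 bits.
# Function names are hypothetical, chosen for this example only.

BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def ascii_size(seq: str) -> int:
    """Bytes needed when each base is stored as one ASCII character."""
    return len(seq.encode("ascii"))


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte (2 bits each), zero-padding the last byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a partial final chunk
        out.append(byte)
    return bytes(out)


seq = "ACGTACGTTGCA"  # 12 made-up bases
print(ascii_size(seq))      # 12
print(len(pack_2bit(seq)))  # 3

# Back-of-envelope for a haploid human genome of roughly 3.1e9 bases:
human_bases = 3.1e9
print(f"{human_bases / 1e6:.0f} MB as plain text")      # 3100 MB
print(f"{human_bases / 4 / 1e6:.0f} MB at 2 bits/base")  # 775 MB
```

Real archives differ a lot from both extremes: raw reads usually carry per-base quality scores, and compressed alignment formats can shrink data far below 2 bits per base, so the same information can occupy very different numbers of bytes.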