Tuesday 19 March 2019

collaboration - Organizing data and files


Over the last four years I have run a bunch of experiments (biotechnology, protein purification). Some are more valuable than others; some were simply poorly designed (e.g. temperature not controlled). Some turned out to be important a year after I collected the data.


The data is highly heterogeneous: microscopy images, ELISA, SDS-PAGE, HPLC from various suppliers, settling velocity, filtration tests, etc. Formats include doc, tex, xls, txt, csv, tiff, py, ipynb, png, pdf, and proprietary formats like Leica's LIF format.



Usually, I started a new folder on our network drive for each task (like afm_particles/2015_01_13_particle_1). Over time I gathered about 50 GB of heterogeneous data.


Usually, I do not alter the data in place, but load it into an IPython notebook and do the calculation and plotting with Python. The calculation files are stored in the same folder as my data. Every folder contains a file called INFO.txt with a brief description of the data in that folder, the experimental conditions, etc.


Usually, I know where I can find my data, but recently I wanted to change a plot for my thesis and spent about 15 minutes searching for the right sub-folder. In the end I searched for the name of the file and found it. A colleague of mine quit his job a year ago, and if I wanted to reproduce his experiments, I would have to invest several hours in order to find the specific file containing the raw data (hopefully with physical units stated somewhere).


We do write quarterly reports and paper-based lab journals (some more thoroughly than others). However, navigating through my own data is challenging, not to mention someone else's.


How do you handle heterogeneous data collected throughout a project (3-5 years)?


Have you found a solution that satisfies both the people who have to document the data and the people who might continue the project?


It would be great if I could browse the collected data without going through all the subfolders, for example via some form of index with information for each folder such as: title, short description, and usefulness of the experiment (worked out as expected, totally crap, totally different than expected but interesting).
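To make the idea concrete, here is a minimal sketch of such an index in Python. It assumes that every experiment folder contains an INFO.txt whose first line works as a short summary; the function name build_index and the output file index.csv are made up for illustration, and nothing here is an existing tool.

```python
import csv
from pathlib import Path


def build_index(root, index_name="index.csv", info_name="INFO.txt"):
    """Collect the first line of every INFO.txt below root into one CSV file.

    Assumption: the first line of each INFO.txt is a usable one-line summary.
    """
    root = Path(root)
    rows = []
    # Walk the whole tree and pick up every INFO.txt, in a stable order.
    for info in sorted(root.rglob(info_name)):
        text = info.read_text(encoding="utf-8", errors="replace").strip()
        lines = text.splitlines()
        summary = lines[0].strip() if lines else ""
        rows.append({
            "folder": info.parent.relative_to(root).as_posix(),
            "summary": summary,
        })
    # Write the index next to the data so it can be opened in a spreadsheet.
    with open(root / index_name, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["folder", "summary"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling build_index on the top-level project folder would write one row per experiment folder. Extra columns such as a usefulness rating would need a fixed convention inside INFO.txt, which I do not have yet.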


Are there tools capable of this (on Linux or Windows)?



