collaboration - I'm looking for an application or workflow which integrates data values into the writing process

Friday, 10 August 2018


Currently, on all of my projects, I do my data analysis, create my figures, put them into a Word document, and then start writing. I say things like "We saw a 35% reduction in the effectiveness of..."

100% of the time, these numbers change. We reach the discussion section and a co-author asks us to change the analysis in one way or another, or finds an error in my thinking.

Then I need to go back and redo the analysis, find every data point that was potentially affected, and change it manually in the Word document. Occasionally, this back-and-forth introduces errors.

I would love a writing platform that would allow me to integrate my data into the writing process. Instead of writing

"We saw a 35% reduction in ..."

I would say:

"We saw a <% print(reduction.round()) %>% reduction in.."

Of course I could do this from scratch on my own computer, but I then lose the ability to collaborate.
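A from-scratch version of this idea might look like the following minimal Python sketch. The numbers, the variable names, and the `render` helper are all hypothetical, and the standard-library `string.Template` is just one possible way to keep placeholders in the text; it does nothing to solve the collaboration problem.

```python
from string import Template

# Hypothetical analysis results; in a real project these would be computed
# by the analysis script rather than typed in by hand.
baseline = 0.80
treated = 0.52

# Derived value that the prose refers to.
reduction_pct = round((baseline - treated) / baseline * 100)

# The manuscript text holds a placeholder instead of a hard-coded number.
manuscript = Template("We saw a $reduction% reduction in effectiveness.")


def render(text: Template, values: dict) -> str:
    """Fill in the placeholders; substitute() raises KeyError if one is missing."""
    return text.substitute(values)


sentence = render(manuscript, {"reduction": reduction_pct})
print(sentence)
```

When the analysis changes, only the computed values change; the sentence is regenerated rather than edited by hand.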

I'm wondering if anyone has had this problem, and how they have solved it?

Answer

While other answers have given some very good suggestions, I wish to focus on the part "if anyone has had this problem, and how they have solved it?" of the question.

I use Sweave and can only speak for this particular method. My general thoughts are that:

Yes, it's awesome.

However, the time needed to get the two sets of code working is not necessarily shorter or less miserable than revising the statistics and tables by hand, and there is a learning curve. So I'd suggest considering this method if you have i) documents that need to be created repeatedly or whose data keep being appended, like periodic reports, or ii) an analysis that involves a large amount of repetition.

The benefit really shines for tables and graphs. Yet I found that embedded text can be troublesome. For instance, you can end up with a weird sentence like "the mean energy intake increased by -1357 kcal at the end of the study."
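One way to guard against that kind of sentence is to let the code choose the verb as well as the number. This is only a sketch in Python (not Sweave), and `describe_change` is a hypothetical helper name, not part of any library.

```python
def describe_change(delta: float, unit: str) -> str:
    """Return a phrase whose verb matches the sign of the change."""
    if delta > 0:
        return f"increased by {delta} {unit}"
    if delta < 0:
        # Report the magnitude and let the verb carry the direction.
        return f"decreased by {abs(delta)} {unit}"
    return "did not change"


sentence = (
    f"The mean energy intake {describe_change(-1357, 'kcal')} "
    "at the end of the study."
)
print(sentence)
```

The price is that every embedded number that can change sign needs this kind of wrapper, which is part of why embedded text takes more care than tables and graphs.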

As an extension of the above, sometimes the restructuring of the analysis can be so drastic that the code will need to be revised extensively. And you'll have two sets of code to revise and two sets of bugs to catch.

In my own circle of colleagues, it's hard enough to get them to keep the statistical syntax in a standardized format. I would not even ask whether they use LaTeX, let alone Sweave.

Having said that, it is indeed very satisfying to see a 100-page PDF analysis report being revised with one click. I'd suggest at least finding a suitable project to try it on once. By the way, Sweave-style weaving can also work with Stata and SAS (via StatWeave), so it is quite versatile.

Now, back to the root cause. I'd like to share with you how I minimize this Sisyphean situation.

1. Remember, if you do not take charge, your coworkers will take charge for you. Firm statements about when you are leaving one stage of the analysis and entering the next can be forceful and yield productive results. This is also true if you are just a student and they are your supervisors. Some reasonable assertiveness goes a long way.

2. Put all the data set details, variables, research questions, proposed analyses, and a reasonable number of "plan B's" into what I call a DMAP (data management and analysis plan). Pay particular attention to: i) how missing values will be handled, ii) how outliers are defined in the key variables of interest, and iii) the recoding scheme, if any categorization is to be done. Gather input from all of your co-authors. Once the plan is finalized, carry out the analysis.

3. In the next meeting, share the analysis report (but NOT a write-up). Prepare a descriptive statistics package. Then lay out the main findings in the same sequence as the research questions. After each summary output, state 1-3 main "talking points" that will be the foundation (or topic sentences) of the Discussion. Show only the necessary output and make sure it is reader-friendly. Highlight or bold the parts that you want the group to focus on. Have the group contribute their thoughts on revisions or sub-analyses. Revise the DMAP. Keep the previous DMAPs handy to avoid the "you said, I said" situation.

4. Repeat steps 2 and 3 until no more input is given. Be very clear that "you are going to finalize this analysis and start writing the Discussion." Is there anyone who is not replying to your e-mails and could disrupt this finalization? Deal with them individually before moving on.

5. Go on to craft the Discussion based on the talking points that were previously agreed upon.

Throughout the process, keep clear documentation. Keep your syntax files and analysis report files clear and dated. Include section numbers corresponding to the research questions, page numbers, and line numbers. Date and sign (provide name and e-mail) all your reports and syntax files.
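The three DMAP decisions mentioned above (missing values, outlier definition, recoding scheme) can also be written down in a form the analysis code reads directly, so each agreed revision of the plan is a small, dated change to one record. The sketch below is only an illustration: the `AnalysisPlan` fields, the option names, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisPlan:
    version: str           # label of the agreed DMAP revision
    missing: str           # "drop" or "mean" (hypothetical options)
    outlier_bounds: tuple  # (low, high) range kept for the key variable
    cut_points: tuple      # ascending thresholds used for recoding


def apply_plan(values, plan):
    """Clean and recode one variable exactly as the plan prescribes."""
    observed = [v for v in values if v is not None]
    if plan.missing == "mean":
        fill = sum(observed) / len(observed)
        cleaned = [fill if v is None else v for v in values]
    else:  # "drop"
        cleaned = observed
    low, high = plan.outlier_bounds
    kept = [v for v in cleaned if low <= v <= high]
    # Category index = number of cut points the value meets or exceeds.
    return [sum(v >= c for c in plan.cut_points) for v in kept]


plan_v1 = AnalysisPlan("v1", "drop", (500, 5000), (1500, 2500))
intake = [1200, None, 2000, 9000, 3100]  # made-up energy intakes in kcal
categories = apply_plan(intake, plan_v1)
print(plan_v1.version, categories)
```

Keeping the old plan objects around serves the same purpose as keeping the previous DMAPs handy: you can always show which decisions produced which output.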

The main point is: do not write the Results and Discussion and distribute them before the analysis is finalized. You may draft them in private, but never circulate them while the analysis is still actively being evaluated or revised. Doing so gives the group too many distractions, and it will just end up as a hot mess.

In my own experience, 75% or more of the so-called sub-analyses are what I call "brain farts." They are a healthy sign that the brain is working, but unpleasant if they happen too frequently. Most of them are "what if's," and they can get out of control, especially if the results do not agree with how people want the world to work.

Yet, 1 out of 8-10 times the suggestions can be good. I will usually take the pains to revise the analysis plan and restart the process. Leave the writing, and come back to deal with it when the new analysis is finalized.

Finally, some catch phrases.

"That is a great suggestion, however it deviates seriously from our original research questions. For the sake of being succinct, I'll write this idea down and we can pursue it in another setting."

"Sub-group analysis? Yes, but be prepared that it's going to be underpowered, and please don't get your hopes too high."

"Sub-group analysis? But the interaction terms are not even significant, and you can rest assured that the two groups will not show any difference."

"Another parameter? Another scenario? Sure, let's get this done with, once and for all. Let me know all possible parameters you want to try now. I will just loop through them."

"No, it's not related to our hypothesis."

"Would you like to follow up with that suggestion? I can send you the code."

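The "I will just loop through them" offer is cheap to keep if the analysis is wrapped in a function. A minimal sketch, with made-up data and hypothetical parameter names (`cap` and `floor` as outlier limits):

```python
from itertools import product
from statistics import mean

# Made-up data and the parameter values co-authors asked to try.
intake = [1200, 1850, 2000, 2400, 3100, 9000]
outlier_caps = [5000, 10000]
min_values = [0, 1500]


def run_analysis(data, cap, floor):
    """One complete analysis run for a single parameter combination."""
    kept = [x for x in data if floor <= x <= cap]
    return {"cap": cap, "floor": floor, "n": len(kept), "mean": round(mean(kept), 1)}


# Run every combination once and collect the results in a single table.
results = [
    run_analysis(intake, cap, floor)
    for cap, floor in product(outlier_caps, min_values)
]
for row in results:
    print(row)
```

Collecting all scenarios in one pass lets the group compare them side by side in a single meeting instead of reopening the analysis for each new "what if."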