Tuesday 31 July 2018

publications - How can editors and reviewers detect data manipulation?


I am preparing a paper in the field of Computer Science.


In order to report test results, we usually run a number of tests and report the average of those tests.


For each test, we generate random data.


Because of the randomness, at some points, the results may come out not as expected.



For instance, a graph may be like: enter image description here


Usually, one should explain why on points 8, 11 and 12 there is a decrease on the plot. Probably, it is because of that randomness.


Not hand-crafting all the graph, but just manipulating a few points makes the graph acceptable: enter image description here


Since three weeks or so, I work my ass off and try to figure out why my resulting graph looks like the first one. Sometimes I feel like yielding to temptation and just modify the raw data before I go crazy.


I believe, at this point the title became misleading, so let me make it clear:


I am not seeking an advice on data manipulation. I will not manipulate my data. However, I ask to myself "how the hell this can be detected?"


And now, I don't only ask to myself, but to whole community. How is this detected? For editors, referees out there, have you ever detected something like this?



Answer



The image manipulations reported on Retraction Watch are most of the time naive collages of gel photographs or spectrograms. They get caught, among other things, because repeating patterns in the noise appear on closer inspection, or linear disruption of the noise are visible, see this.


For 1D data, the case you mention, there is the Benford's law and other statistical tests that can indicate potential manipulation of data. It usually relies on human beings preferring certain digits over others, even unconsciously, thus generating data that has a non-random variability.



Also, many journals ask for graphs to be submitted in vector format, which means you are actually sending the data points, and not just a rendered figure. Things like editing out a few data points to smooth a curve will be apparent.


Now, to the best of my knowledge publishers and, even less so, reviewers don't systematically screen for these things, they only do so if they have suspicions, because the scientific publishing process is based on good faith. But if the paper gets any sort of attention it will get caught by post publication review.


Don't fabricate/manipulate data. It's adding unwanted noise to an already noisy signal, it's dishonest towards your coworkers, the people who fund you, the publisher and the readership, and it will ruin your career.


No comments:

Post a Comment

evolution - Are there any multicellular forms of life which exist without consuming other forms of life in some manner?

The title is the question. If additional specificity is needed I will add clarification here. Are there any multicellular forms of life whic...