Wednesday 13 December 2017

publications - Do all preprint servers have the non-updating issue in Google Scholar?


It has been pointed out on Stack Exchange that Google Scholar has a bug: arXiv preprints can go several months without being updated to match the final journal publication information. The answer there claims that Google has stated that fixing this bug is not a priority for them.



My question is: Does this bug exist for all preprint servers or just arXiv? This is a huge problem in my field (where the convention is to not cite arXiv papers), so I'd rather post my preprint elsewhere if the problem does not exist for other preprint servers.


bioRxiv and ResearchGate come to mind as possibilities.



Answer



I've been following the Google Scholar Preprint Bug for several years. Claus Wilke (who first characterized the bug in 2014) and I tried reporting it many times. Finally, we caught up with the creator of Google Scholar, Anurag Acharya, in the comment section of Scholarly Kitchen. I was a jerk, which didn't help, but Anurag's responses kept us feeling frustrated.


Anyways, Anurag did shed some light on the cause of the bug in this comment:



Most preprints/ahead-of-print versions are indexed in early-version mode — as they should be. Articles that are indexed in early-version mode are recrawled and reindexed frequently. Changes to their location, their content, their format, their versions are expected to be frequent; this allows changes to be picked up soon.


Occasionally, a preprint that has been in that state for a while can get indexed in the archival mode. When that happens, updates to that article (location, content, format, versions etc) take longer. Articles that are indexed in an archival mode are reindexed less frequently – as they must, if the indexing system is to use the limited crawl capacity at the journal sites effectively.



So the bug occurs when a record (a preprint, in our case) sits unchanged for too long. The record is then placed in archival mode, which triggers the bug: newly crawled versions of the record appear to be silently ignored. According to Anurag's description, then, the bug is not limited to any single preprint server.



In line with this description, longer publication delays appear to increase the chance of hitting the bug. A possible workaround is to update the preprint frequently so that the record never enters archival mode.
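To make the two indexing modes concrete, here is a toy model in Python. It is only a sketch of the behaviour as I understand it from Anurag's comment. Every number and name in it (the recrawl intervals, the idle threshold, the function names) is a made-up placeholder, not anything Google has published.

```python
# Toy model of the two indexing modes described in the quoted comment.
# All constants are hypothetical placeholders, not Google Scholar's real settings.
EARLY_RECRAWL_DAYS = 7       # assumed recrawl interval in early-version mode
ARCHIVAL_RECRAWL_DAYS = 180  # assumed recrawl interval in archival mode
STAGNANT_AFTER_DAYS = 365    # assumed idle time before a record turns archival


def indexing_mode(days_since_last_change: int) -> str:
    """Return the mode a record would be in under this toy model."""
    if days_since_last_change > STAGNANT_AFTER_DAYS:
        return "archival"
    return "early-version"


def worst_case_update_delay(days_since_last_change: int) -> int:
    """Worst-case number of days until the next recrawl sees a new version."""
    if indexing_mode(days_since_last_change) == "archival":
        return ARCHIVAL_RECRAWL_DAYS
    return EARLY_RECRAWL_DAYS


if __name__ == "__main__":
    for idle_days in (30, 400):
        print(idle_days, indexing_mode(idle_days), worst_case_update_delay(idle_days))
```

In this model, revising the preprint resets `days_since_last_change` to zero, which is why frequent updates would keep the record in early-version mode.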

