Like so many others, I scrape data from Google Scholar as a part of my lit review process, so that I can have a structured data set for meta-analysis of the literature.
I noticed that for several topics of interest, the number of articles per year seems to increase until 2017 and then drops off sharply.
Is it really safe to assume that fewer articles were published in 2018?
Or could it be that the data through 2017 is relatively "complete", whereas journals and authors for 2018 are still in the process of being added to Google's index, so the 2018 total is under-reported?
Has anyone encountered this?
Answer
Google Scholar has its strong points (e.g. indexing of grey literature that is not available in any regular scholarly database), but data quality is not one of them. Of course, this is not because Google lacks the ability to create a high-quality database; rather, it is because publishers refuse to grant it permission to create a high-quality database that it distributes for free. Google Scholar's index is built by a web spider, so its completeness depends on what is available from public websites (Google strictly respects websites' permissions; it makes no attempt to index anything that a website asks it not to index with a robots.txt entry). I would not be surprised if some publishers restrict Google's permission to index details of some of their most recent publications.
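To illustrate how a robots.txt entry can hide recent material from a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `publisher.example` URLs, and the `ExampleBot` user agent are all hypothetical; this does not describe any real publisher's policy or how Google's spider is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher site (an assumption
# for illustration only, not a real policy).
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/2018/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler honoring ROBOTS_TXT may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://publisher.example/articles/2017/paper-1",
        "https://publisher.example/articles/2018/paper-1",
    ):
        print(url, is_allowed(url))
```

A crawler that respects these rules would simply never see the 2018 pages, so they would be missing from its index even though the articles exist.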
From that perspective, if there is a sharp dropoff for a given topic during or after 2017 (it's unclear which you mean from the way you worded the question), I would not consider that evidence of anything. That is, it is not necessarily evidence that people suddenly stopped publishing on that topic; it is only evidence that Google's index contains fewer records on that topic for those years, for whatever reason. I know that I've seen quite a few articles that have charts like that and make claims like that, but I don't consider such claims reliable. (And when I peer-review articles that make such claims, I tell the authors so.)
To make any concrete, serious claim about changes in the publishing patterns of topics, you would need a more rigorous and systematic database source (such as Web of Knowledge, Scopus, etc.) and at least a two-year lag to make sure that the data is complete.
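As a rough sketch of the lag idea, the following Python snippet counts records per publication year and drops any year that falls inside the lag window. The function name `yearly_counts`, the `lag_years` parameter, and the sample years are assumptions made for illustration, and reading "two-year lag" as "keep only years at least two years before the current one" is one possible interpretation.

```python
from collections import Counter
from datetime import date


def yearly_counts(years, lag_years=2, today=None):
    """Count records per year, excluding years inside the lag window.

    `years` is an iterable of publication years (ints). Years later than
    (current year - lag_years) are treated as possibly incomplete and dropped.
    """
    current_year = (today or date.today()).year
    cutoff = current_year - lag_years  # last year treated as complete
    counts = Counter(y for y in years if y <= cutoff)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up publication years, not real data.
    sample = [2015, 2016, 2016, 2017, 2017, 2017, 2018]
    print(yearly_counts(sample, today=date(2019, 6, 1)))
```

With the analysis date fixed in 2019, the 2018 record is excluded from the counts, so an apparent dropoff in that year never enters the trend.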