bibliometrics - How many people read an individual journal article?

Tuesday, 25 July 2017

General background

Some time ago, I was reading a blog post that discussed how many people read journal articles. I think such an estimate is important when trying to assess the impact of research on society. However, whereas internet sites readily track usage, such information seems more difficult to come by for the readership of a particular journal article.

Initial Ideas

Articles vary: Obviously journal articles vary in many ways and, just as with citation counts, readership is likely to be highly skewed, perhaps following something like a power law. Beyond academic impact, articles that are freely available on the internet are presumably read more.

Time since publication: The number of reads increases over time, but the rate of readership presumably varies over time (perhaps a spike on initial release, and then gradual decline as relevance dissipates).

Definitions of reading vary: Read counts would also increase or decrease based on how reading is defined. At the low end is a glance at an abstract. At the high end is carefully reading the entire article. I'd be happy with a working definition that involved reading at least two pages.

Initial Data

PlosOne article statistics: As a very rough guide, these suggest that the mean number of views per article is around 800 per year.

Journal of Vision: this article reports some download statistics: "In the most recent accounting in July, 2008, the top five articles were each downloaded between 1,993 and 3,478 times."

Subscription counts: Some journals list these.

Initial Guess

I find it useful to have a ballpark estimate of these things. My own initial guess, based on minimal data, is that readership is between 50 and 1000 times the citation count for the article. Linking the estimate to the citation count makes it easier to estimate for a given article and should incorporate effects like time and journal prestige.
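As a minimal sketch of this rule of thumb, the following Python snippet turns a citation count into a rough readership range. The function name `readership_range` and its parameters are hypothetical, and the multipliers 50 and 1000 are just the guessed bounds from the paragraph above, not measured values.

```python
def readership_range(citations, low_factor=50, high_factor=1000):
    """Return a (low, high) guess for the number of readers of an article.

    Assumption: readership is 50 to 1000 times the citation count,
    which is the initial guess stated above, not an empirical result.
    """
    if citations < 0:
        raise ValueError("citations must be non-negative")
    return citations * low_factor, citations * high_factor


low, high = readership_range(10)
print(f"An article with 10 citations: roughly {low} to {high} readers")
```

The width of the range (a factor of 20) shows how loose the guess is; any better data would mainly serve to narrow the two multipliers.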

Question

What is a good estimate of how many people read a given journal article?

What data and sources of information justify this estimate?

Is there any established literature that can inform such an estimate?

Answer

Sounds like a Fermi Problem :)

A question I asked myself recently, prompted by the many plagiarism cases involving top politicians in Germany in the humanities, is whether more articles and texts are published in the humanities than scholars can actually read completely. The amount of copied text in a single PhD thesis, as shown by German plagiarism-detection communities like Vroniplag or Guttenplag, is shocking to me. Often 50% of the text is not correctly marked as a citation. Even the supervisors at the local universities appear never to have read some of these theses completely. I really hope this is not representative, but I fear it might be the tip of the iceberg in the humanities (in Germany).

Personally, I come from and work in a STEM field, and I did a very specialized thesis. Often fewer than a dozen groups worldwide work on such a narrow topic (a matter of scientific competition and finding a niche, as well as time, expertise and lab hardware). So there will be articles in peer-reviewed journals that are not really interesting to more than 20-50 researchers, plus probably a similar number of industry researchers worldwide (competition between companies is not that different from competition between research groups, due to economic constraints). Without modern search engines, most non-scholars and private individuals would have a hard time finding such articles. This is another point for your estimate. The reader count for Nature or Science versus very specialized journals varies a lot, so I don't think any single average is very helpful or interesting. If you know your specialized field, studying a few journals should quickly show you how many scholars really have an interest in it.

Your PlosOne link is interesting. I can back this up a bit to give you at least a rough order of magnitude for the reader count of top journals, specialized journals, and so on. I think it's quite normal not to read articles completely (even if you cite them), but I do take a close look at the articles I download, mostly because I use many keywords and Google operators to filter for exactly what I'm looking for. This also varies a lot between scholars and students. I'm often shocked at how students use search engines, whether out of laziness or ignorance of search operators; using them well can save a great deal of reading time. Therefore, I think a reader count extrapolated from citation counts might be more representative and reliable than one based on site views or downloads, because scholars, private individuals and laymen often download articles that turn out not to contain what they were looking for, owing to poor search engine use. Growing redundancy and plagiarism are a further factor here.

Some possible heuristics:

Compare the number of published articles per month with the number of website/interface visitors per month on download platforms like PlosOne, arXiv or Nature.

arXiv has around 6,000 published articles per month and 100,000 unique visits, with 12.4 million downloads by academic institutions and 50 million overall. Against 12 x 6,000 = 72,000 articles in 2011, that gives around 170 downloads or abstract views per article (I used the 12.4 million figure here). Of course, this ignores the articles published in earlier years, which also account for some of those downloads, so the average read count of a single arXiv article is probably lower than 170 and closer to the 20-50 mark I explained above. But here you have, IMO, reasonable and fairly objective lower and upper bounds for a scientific article that other scholars are really interested in: 50-170.

Nature has 900,000 unique visits per month and around 200 articles per month, so you can see why having an article published in Nature is probably worth more than 10 articles on arXiv, PlosOne or in many other specialized journals in a particular branch, even if they are peer reviewed ;)

Look up the bibliographies of some PhD theses in your field at your local university. In STEM the number of cited articles is often in the range of 50-200 (even here, what a single PhD student will or has to read varies a lot). Of course you do not cite every article you read, but the ratio of read to cited articles shouldn't be higher than 2 (or your search engine use is, IMHO, suboptimal). Assuming the PhD student publishes 3-5 articles during the PhD (a reasonable number in STEM, or 1 Nature article :) ) and multiplying 3-5 by 20-50 (the average read count by institutional scholars), you get roughly 60-250, close to the 50-200 articles in a PhD thesis bibliography. Pure chance?! It looks like a strange calculation, but there is a link between how much article input an average scholar needs and how much output he creates (that's why I multiply the two values), and it supports my experience and the analysis above that 10-100 readers is a reasonable order of magnitude for people who are really interested in a single average article. To me it doesn't look like pure chance, but that's the main problem with Fermi questions and answers :)
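The back-of-the-envelope arithmetic in these heuristics can be checked with a short Python sketch. All inputs are the figures quoted above; treating the arXiv download totals as yearly numbers for 2011 is my reading of the text, and the constant and function names are made up for illustration.

```python
# Figures as quoted in the heuristics above. Assumption: the arXiv
# download totals are yearly figures for 2011.
ARXIV_ARTICLES_PER_MONTH = 6_000
ARXIV_INSTITUTIONAL_DOWNLOADS = 12_400_000
ARXIV_OVERALL_DOWNLOADS = 50_000_000
NATURE_VISITS_PER_MONTH = 900_000
NATURE_ARTICLES_PER_MONTH = 200


def per_article(total, articles):
    """Average count (downloads or visits) per article."""
    return total / articles


arxiv_articles_per_year = 12 * ARXIV_ARTICLES_PER_MONTH
arxiv_institutional = per_article(ARXIV_INSTITUTIONAL_DOWNLOADS, arxiv_articles_per_year)
arxiv_overall = per_article(ARXIV_OVERALL_DOWNLOADS, arxiv_articles_per_year)
nature_visits_per_article = per_article(NATURE_VISITS_PER_MONTH, NATURE_ARTICLES_PER_MONTH)

# PhD cross-check: (articles published) x (readers per article),
# taking the lower and upper ends of both ranges.
papers_per_phd = (3, 5)
readers_per_article = (20, 50)
bibliography_estimate = (
    papers_per_phd[0] * readers_per_article[0],
    papers_per_phd[1] * readers_per_article[1],
)

print(f"arXiv, institutional downloads per article: {arxiv_institutional:.0f}")
print(f"arXiv, overall downloads per article: {arxiv_overall:.0f}")
print(f"Nature, unique visits per article: {nature_visits_per_article:.0f}")
print(f"PhD cross-check range: {bibliography_estimate[0]}-{bibliography_estimate[1]}")
```

Note that the Nature number counts site visits, not reads of a particular article, so it is only an upper-end indication and not directly comparable to the arXiv download figures.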

PS: Note that this analysis is focused on STEM. I believe the average read count is much lower in the humanities, where side effects like different languages and plagiarism seem to play a bigger role and make a really objective guesstimate harder.