I am researching the processing times of papers published in journals in my field. I have noticed that the metrics the journals advertise (e.g. the Elsevier journal insights) match neither my own experience nor the dates on recently published papers, so I want to run my own survey. (My guess is that they include papers that are rejected immediately by the editor without being sent out for review, so the average looks quite favourable. I am more interested in the average time for papers that are actually accepted.)
I plan to cover all papers published in the last 12 months in 10-20 journals from different publishers (e.g. Elsevier, T&F, Wiley), which will amount to hundreds of papers. Basically, for each paper I will take the dates on which it was submitted, accepted, and published online, and calculate the averages per journal.
Is there a way to automatically extract this information?
Answer
Have you checked that this data is actually made available for your preferred journals? IME not all of them make their submitted/accepted/first-online dates easily accessible, though it has improved a bit recently.
If it's there, your best bet is probably to screen-scrape the HTML. Some journals provide nice clean XML to play with, but these are usually new online-only titles rather than legacy ones from traditional publishers.
Elsevier use a simple HTML element (class="articleDates") which contains the core dates:
Received 23 March 2015, Revised 15 May 2015, Accepted 18 May 2015, Available online 9 June 2015
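A string like the one above can be split into labelled dates with a regular expression. This is only a minimal sketch: the function name `parse_article_dates` is made up for illustration, and it assumes English month names written out in full and the four labels shown in the example. If a paper lists several "Revised" dates, only the last one is kept.

```python
import re
from datetime import datetime

# Labels as they appear in the example string above; other journals may differ.
DATE_PATTERN = re.compile(
    r"(Received|Revised|Accepted|Available online)\s+(\d{1,2} \w+ \d{4})"
)

def parse_article_dates(text):
    """Map each label found in the text to a datetime.date (hypothetical helper)."""
    dates = {}
    for label, value in DATE_PATTERN.findall(text):
        # %B expects a full month name ("March"); assumes an English locale.
        dates[label] = datetime.strptime(value, "%d %B %Y").date()
    return dates

sample = ("Received 23 March 2015, Revised 15 May 2015, "
          "Accepted 18 May 2015, Available online 9 June 2015")
dates = parse_article_dates(sample)

# Intervals in days for this one paper.
days_to_accept = (dates["Accepted"] - dates["Received"]).days
days_to_online = (dates["Available online"] - dates["Accepted"]).days
print(days_to_accept, days_to_online)  # 56 22
```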
Taylor & Francis have similar information to Elsevier: the element you'd need is again "articleDates", but it unfortunately has a lot of line breaks in it for no good reason!
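The stray line breaks are easy to get rid of once the element's text has been pulled out. Here is a rough sketch using only the standard library. The class `ClassTextExtractor`, the function `extract_dates_text`, and the sample markup are all invented for illustration; the real pages will look different, and a proper HTML library may be more robust.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}

class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first-level element(s) carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while we are inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def extract_dates_text(html, class_name="articleDates"):
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    # str.split() with no arguments collapses every run of whitespace,
    # including the line breaks, into single spaces.
    return " ".join(" ".join(parser.chunks).split())

# Made-up markup, only meant to mimic the line-break problem.
sample_html = """
<div class="articleDates">
  Received 3 Feb 2015,

  Accepted
     20 Apr 2015
</div>
"""
cleaned = extract_dates_text(sample_html)
print(cleaned)  # Received 3 Feb 2015, Accepted 20 Apr 2015
```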
Finally, Wiley don't seem to expose submitted/accepted dates (at least not for all journals); "publicationHistoryDetails" just gives first-online, which isn't much help.
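For the journals where the dates can be collected, the per-journal averages the question asks about are then straightforward. This is a small sketch with invented journal names and dates; `average_days_to_accept` is a hypothetical helper, and you may prefer the median if a few very slow papers skew the mean.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Invented records: (journal, received, accepted).
records = [
    ("Journal A", date(2015, 3, 23), date(2015, 5, 18)),
    ("Journal A", date(2015, 1, 10), date(2015, 3, 1)),
    ("Journal B", date(2015, 2, 1), date(2015, 2, 21)),
]

def average_days_to_accept(rows):
    """Return the mean number of days from received to accepted, per journal."""
    by_journal = defaultdict(list)
    for journal, received, accepted in rows:
        by_journal[journal].append((accepted - received).days)
    return {journal: mean(days) for journal, days in by_journal.items()}

averages = average_days_to_accept(records)
print(averages)  # {'Journal A': 53, 'Journal B': 20}
```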