Following on from the last post about publication lag times at cell biology journals, I went ahead and crunched the numbers for all journals in PubMed for one year (2013). Before we dive into the numbers, a couple of points about this kind of information.
- Some journals “reset the clock” on the received date with manuscripts that are resubmitted. This makes comparisons difficult.
- The length of publication lag is not necessarily a reflection of the way the journal operates. As this comment points out, manuscripts are out of the journal's hands (with the reviewers) for a substantial fraction of the time.
- The dataset is incomplete because the deposition of this information is not mandatory. About 1/3 of papers have the date information deposited (see below).
- Publication lag times go hand-in-hand with peer review. Moving to preprints and post-publication review would eradicate these delays.
Thanks for all the feedback on my last post, particularly those that highlighted the points above.
To see how all this was done, check out the Methods bit below, where you can download the full summary. I ended up with a list of publication lag times for 428500 papers published in 2013 (see left). To make a bit more sense of this, I split them by journal and then found the publication lag time stats for each. This had to be done per journal since PLoS ONE alone makes up 45560 of the records.
To try and visualise what these publication lag times look like for all journals, I made a histogram of the Median lag times for all journals using a 10 d bin width. It takes on average ~100 d to go from Received to Accepted and a further ~120 d to go from Accepted to Published. The whole process on average takes 239 days.
To get a feel for the variability in these numbers I plotted out the ranked Median times for each journal and overlaid Q25 and Q75 (dots). The IQR for some of the slower journals was >150 d. So the papers that they publish can have very different fates.
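The per-journal statistics above boil down to grouping lags by journal and taking percentiles. Below is a minimal Ruby sketch of that step. It assumes the cleaned data is an array of `[journal, lag_in_days]` pairs. The 5-article minimum comes from the Methods, and the 10 d bin width from the histogram. Everything else is an assumption: the helper names (`percentile`, `journal_stats`, `median_histogram`) are hypothetical, and the linear-interpolation quartile rule may differ slightly from what IgorPro, R or Excel use.

```ruby
# Per-journal lag statistics and a histogram of per-journal medians.
# Assumes cleaned records of the form [journal, lag_in_days].

MIN_ARTICLES = 5 # journals with fewer valid articles get no stats
BIN_WIDTH = 10   # histogram bin width in days

# Percentile of an already-sorted array (p in 0..100), interpolating linearly
# between neighbouring ranks. This quartile convention is an assumption.
def percentile(sorted, p)
  return nil if sorted.empty?
  rank = (p / 100.0) * (sorted.length - 1)
  lo = rank.floor
  hi = rank.ceil
  sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo)
end

# Group lags by journal and compute n, median, Q25 and Q75 for each.
def journal_stats(records)
  records.group_by(&:first).each_with_object({}) do |(journal, rows), stats|
    lags = rows.map(&:last).sort
    next if lags.length < MIN_ARTICLES
    stats[journal] = {
      n: lags.length,
      median: percentile(lags, 50),
      q25: percentile(lags, 25),
      q75: percentile(lags, 75)
    }
  end
end

# Count journals per BIN_WIDTH-day bin of their median (key = bin start).
def median_histogram(stats)
  stats.values
       .map { |s| (s[:median] / BIN_WIDTH).floor * BIN_WIDTH }
       .tally
       .sort
       .to_h
end

records = [["J A", 90], ["J A", 100], ["J A", 110], ["J A", 120], ["J A", 130],
           ["J B", 200], ["J B", 210]]
stats = journal_stats(records)
p stats["J A"]            # median 110.0, q25 100.0, q75 120.0
p median_histogram(stats) # J B is excluded: only 2 articles
```

Ranking journals by `:median` and plotting `:q25`/`:q75` alongside gives the kind of ranked plot described above.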
Is the publication lag time longer at higher tier journals? To look at this, I used the Rec-Acc time and the 2013 Journal Impact Factor (JIF), which, although widely derided and flawed, does correlate loosely with journal prestige. I have fewer journals in this dataset, because the lookup of JIFs didn't find every journal in my starting set, either because the journal doesn't have one or because there were minor differences between the PubMed name and the Thomson Reuters name. The median of the median Rec-Acc times for each bin is shown. So on average, journals with a JIF <1 will take 1 month longer to accept your paper than journals with a JIF ranging from 1-10. Above this, the time rises again, to ~2 months longer at journals with a JIF over 10. Why? Perhaps at the lower end, the trouble is finding reviewers, whereas at the higher end, multiple rounds of review might become a problem.
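A rough Ruby sketch of the "median of medians" binning follows. It assumes each journal is a hash holding its JIF and its median Rec-Acc time. The JIF is `nil` when the lookup failed, and those journals are dropped, as described above. The bin edges in `JIF_EDGES` are hypothetical: the text only discusses the <1, 1-10 and >10 ranges, and the figure's actual bins are not stated.

```ruby
# Median of per-journal median Rec-Acc times, grouped by JIF bin.
# JIF_EDGES is a hypothetical choice of bin boundaries.
JIF_EDGES = [0, 1, 2, 5, 10, Float::INFINITY].freeze

def median(values)
  s = values.sort
  n = s.length
  return nil if n.zero?
  n.odd? ? s[n / 2].to_f : (s[n / 2 - 1] + s[n / 2]) / 2.0
end

# Return the [lower, upper) bin that contains jif (nil if jif < 0).
def jif_bin(jif)
  JIF_EDGES.each_cons(2).find { |lo, hi| jif >= lo && jif < hi }
end

# journals: array of { jif:, rec_acc_median: }; journals without a JIF are skipped.
def median_by_jif_bin(journals)
  journals.reject { |j| j[:jif].nil? }
          .group_by { |j| jif_bin(j[:jif]) }
          .transform_values { |js| median(js.map { |j| j[:rec_acc_median] }) }
end

journals = [
  { jif: 0.5, rec_acc_median: 130 },
  { jif: 0.8, rec_acc_median: 110 },
  { jif: 3.2, rec_acc_median: 95 },
  { jif: nil, rec_acc_median: 60 }, # no JIF found: excluded
  { jif: 14.0, rec_acc_median: 160 }
]
median_by_jif_bin(journals).each { |(lo, hi), m| puts "JIF #{lo}-#{hi}: #{m} d" }
```

Taking the median of the per-journal medians (rather than pooling all papers) stops a few very large journals from dominating a bin.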
The executive summary is below. These are the times (in days) for delays at all journals in PubMed for 2013.
- Median time from ovulation to birth of a human being is 268 days.
- Mark Beaumont cycled around the world (29,446 km) in 194 days.
- Ellen MacArthur circumnavigated the globe single-handed in 72 days.
On the whole it seems that publishing in Cell Biology is quite slow compared to the whole of PubMed. Why this is the case is a tricky question. Is it because cell biologists submit papers too early and they need more revision? Are they more dogged in sending back rejected manuscripts? Is it because as a community we review too harshly and/or ask too much of the authors? Do Editors allow too many rounds of revision or not give clear guidance to expedite the time from Received-to-Accepted? It’s probably a combination of all of these factors and we’re all to blame.
Finally, this amusing tweet showing the transparency of EMBO J publication timelines raises the question: would these authors have been better off just sending the paper somewhere else?
Methods: I searched PubMed using
`journal article[pt] AND ("2013/01/01"[PDAT] : "2013/12/31"[PDAT])`. This gave a huge XML file (~16 GB), which Nokogiri balked at, so I divided the query into subranges of those dates (1.4 GB) and ran the script on all the XML files. This gave 1425643 records. I removed records that did not have a received date or that had a value greater than 12 in the month field (leaving 428513 records). 13 of these records did not have a journal name, which left 428500 records from 3301 journals. As before, I filtered out negative values (papers accepted before they were received) and a couple of outliers (e.g. 6000 days!). With a bit of code it was quite straightforward to extract simple statistics for each of the journals.
You can download the data here to look up the information for a journal of your choice (WordPress only allows xls, not txt/csv). The fields show the journal name and the number of valid articles. Then, for Acc-Pub, Rec-Acc and Rec-Pub, the number, median, lower quartile and upper quartile times in days are given. I set a limit of 5 or more articles for calculation of the stats. Blank entries are where there was no valid data.
Note that there are some differences with the table in my last post. This is because for that analysis I used a bigger date range and then filtered the year based on the published field. Here my search started out by specifying PDAT, which is slightly different.
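The cleaning steps above can be sketched in Ruby roughly as follows. This is not the code used for the post (the processing was done in IgorPro). It assumes each CSV row has been turned into a hash of date-part strings, possibly empty, with numeric months. The field names and the 2000-day outlier cutoff (`MAX_LAG_DAYS`) are hypothetical; the post only says that a couple of outliers, such as 6000 days, were removed.

```ruby
require 'date'

# Hypothetical outlier cutoff; the post only mentions removing e.g. 6000 days.
MAX_LAG_DAYS = 2000

# Build a Date from year/month/day strings (numeric months assumed), or nil if unusable.
def build_date(y, m, d)
  return nil if y.to_s.empty? || m.to_s.empty?
  month = m.to_i
  return nil unless (1..12).cover?(month) # drops impossible months, e.g. "> 12"
  day = d.to_s.empty? ? 1 : d.to_i        # missing day: assume the 1st (approximate)
  Date.valid_date?(y.to_i, month, day) ? Date.new(y.to_i, month, day) : nil
end

# Return the Rec-Acc lag in days, or nil if the record should be discarded.
def rec_acc_lag(rec)
  return nil if rec[:journal].to_s.empty? # no journal name
  received = build_date(rec[:rec_y], rec[:rec_m], rec[:rec_d])
  accepted = build_date(rec[:acc_y], rec[:acc_m], rec[:acc_d])
  return nil if received.nil? || accepted.nil?
  lag = (accepted - received).to_i
  # accepted before received, or an implausibly long outlier
  return nil if lag.negative? || lag > MAX_LAG_DAYS
  lag
end

rec = { journal: "Traffic", rec_y: "2013", rec_m: "1", rec_d: "10",
        acc_y: "2013", acc_m: "3", acc_d: "1" }
p rec_acc_lag(rec) # 50 days
```

The same pattern applies to the Acc-Pub and Rec-Pub lags by swapping in the published date parts.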
The data are OK, but the publication date needs to be taken with a pinch of salt. For many records it was missing a month or day, so the date used for some records is approximate. In retrospect, using the Entrez date or one of the other required fields would probably have been better. I liked the idea of the publication date as this is when the paper finally appears in print, which still represents a significant delay at some journals. The Received-to-Accepted dates are valid though.
My interest in publication lag times continues. Previous posts have looked at how long it takes my lab to publish our work, how often trainees publish and I also looked at very long lag times at Oncogene. I recently read a blog post on automated calculation of publication lag times for Bioinformatics journals. I thought it would be great to do this for Cell Biology journals too. Hopefully people will find it useful and can use this list when thinking about where to send their paper.
What is publication lag time?
If you are reading this, you probably know how science publication works. Feel free to skip. Otherwise, it goes something like this. After writing up your work for publication, you submit it to a journal. Assuming that this journal will eventually publish the paper (there is usually a period of submitting, getting rejected, resubmitting to a different journal etc.), they receive the paper on a certain date. They send it out to review, they collate the reviews and send back a decision, you (almost always) revise your paper further and then send it back. This can happen several times. At some point it gets accepted on a certain date. The journal then prepares the paper for publication in a scheduled issue on a specific date (they can also immediately post papers online without formatting). All of these steps add significant delays. It typically takes 9 months to publish a paper in the biomedical sciences. In 2015 this sounds very silly, when world-wide dissemination of information is as simple as a few clicks on a trackpad. The bigger problem is that we rely on papers as a currency to get jobs or funding and so these delays can be more than just a frustration, they can affect your ability to actually do more science.
The good news is that it is very straightforward to parse the received, accepted and published dates from PubMed. So we can easily calculate the publication lags for cell biology journals. If you don’t work in cell biology, just follow the instructions below to make your own list.
The bad news is that the deposition of the date information in PubMed depends on the journal. The extra bad news is that three of the major cell biology journals do not deposit their data: J Cell Biol, Mol Biol Cell and J Cell Sci. My original plan was to compare these three journals with Traffic, Nat Cell Biol and Dev Cell. Instead, I extended the list to include other journals which take non-cell biology papers (and deposit their data).
A summary of the last ten years
Three sets of box plots here show the publication lags for eight journals that take cell biology papers. The journals are Cell, Cell Stem Cell, Current Biology, Developmental Cell, EMBO Journal, Nature Cell Biology, Nature Methods and Traffic (see note at the end about eLife). They are shown in alphabetical order. The box plots show the median and the IQR; whiskers show the 10th and 90th percentiles. The three plots show the time from Received-to-Published (Rec-Pub), and then a breakdown of this time into Received-to-Accepted (Rec-Acc) and Accepted-to-Published (Acc-Pub). The colours are just to make it easier to tell the journals apart and don't have any significance.
You can see from these plots that the journals differ widely in the time it takes to publish a paper there. Current Biology is very fast, whereas Cell Stem Cell is relatively slow. The time it takes the journals to move papers from acceptance to publication is pretty constant, apart from Traffic, where it takes an average of ~3 months to get something into print. Remember that the paper is often online for this period, so this is not necessarily a bad thing. I was not surprised that Current Biology was the fastest. At this journal, a presubmission inquiry is required and the referees are often lined up in advance. The staff are keen to publish rapidly, hence the name, Current Biology. I was amazed at Nature Cell Biology having such a short time from Received-to-Accepted. The Received-to-Accepted delay comes from multiple rounds of revision and from doing extra experimental work. Anecdotally, it seems that the review at Nature Cell Biol should be just as lengthy as at Dev Cell or EMBO J. I wonder if the received date is accurate… it is possible to massage this date by first rejecting the paper but allowing a resubmission, and then using the resubmission date as the received date [Edit: see below]. One way to legitimately limit this delay is to only allow a certain time for revisions and only allow one round of corrections. This is what happens at J Cell Biol; unfortunately we don't have the data to see how effective this is.
How has the lag time changed over the last ten years?
Have the slow journals always been slow? When did they become slow? Again, three plots are shown side-by-side, depicting the Rec-Pub, Rec-Acc and Acc-Pub times. Now the intensity of red or blue shows the data for each year (2014 is the most intense colour). Again, you can see that the dataset is incomplete: date information is missing for Traffic for many years, for example.
Interestingly, the publication lag has been pretty constant for some journals but not others. Cell Stem Cell and Dev Cell (but not the mothership – Cell) have seen increases as have Nature Cell Biology and Nature Methods. On the whole Acc-Pub times are stable, except for Nature Methods which is the only journal in the list to see an increase over the time period. This just leaves us with the task of drawing up a ranked list of the fastest to the slowest journal. Then we can see which of these journals is likely to delay dissemination of our work the most.
The Median times (in days) for 2013 are below. The journals are ranked in order of fastest to slowest for Received-to-Publication. I had to use 2013 because EMBO J is missing data for 2014.
|Journal|Rec-Pub|Rec-Acc|Acc-Pub|
|---|---|---|---|
|Nature Cell Biol|237|180|59|
|Cell Stem Cell|284|205|66|
You’ll see that only Cell Stem Cell is over the threshold where it would be faster to conceive and give birth to a human being than to publish a paper there (on average). If the additional time wasted in submitting your manuscript to other journals is factored in, it is likely that most papers are at least on a par with the median gestation time.
If you are wondering why eLife is missing… as a new journal it didn't have ten years' worth of data to analyse. It did have a reasonably complete set for 2013 (but Rec-Acc only). The median time was 89 days, beating Current Biology by 10.5 days.
Please check out Neil Saunders’ post on how to do this. I did a PubMed search for
`(journal1[ta] OR journal2[ta] OR ...) AND journal article[pt]` to make sure I didn't get any reviews or letters etc. I limited the search from 2003 onwards to make sure I had 10 years of data for the journals that deposited it. I downloaded the file as XML and used Ruby/Nokogiri to parse the file to CSV. Installing Nokogiri is reasonably straightforward, but the documentation is pretty impenetrable. The Ruby script I used was from Neil's post (step 3) with a few lines added:
```ruby
#!/usr/bin/ruby
require 'nokogiri'

# Parse a PubMed XML file (path given as the first argument) and print one CSV
# line per article: journal, PMID, then the received, accepted and issue
# (print) dates as year, month, day.
f = File.open(ARGV.first)
doc = Nokogiri::XML(f)
f.close

doc.xpath("//PubmedArticle").each do |a|
  r = ["", "", "", "", "", "", "", "", "", "", ""]
  r[0] = a.xpath("MedlineCitation/Article/Journal/ISOAbbreviation").text
  r[1] = a.xpath("MedlineCitation/PMID").text
  # received date (article history)
  r[2] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='received']/Year").text
  r[3] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='received']/Month").text
  r[4] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='received']/Day").text
  # accepted date (article history)
  r[5] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='accepted']/Year").text
  r[6] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='accepted']/Month").text
  r[7] = a.xpath("PubmedData/History/PubMedPubDate[@PubStatus='accepted']/Day").text
  # issue date; XPath is case-sensitive, so the element must be PubDate
  r[8] = a.xpath("MedlineCitation/Article/Journal/JournalIssue/PubDate/Year").text
  r[9] = a.xpath("MedlineCitation/Article/Journal/JournalIssue/PubDate/Month").text
  r[10] = a.xpath("MedlineCitation/Article/Journal/JournalIssue/PubDate/Day").text
  puts r.join(",")
end
```
and then executed as described. The CSV could then be imported into IgorPro and processed. Neil's post describes a workflow for R, or you could use Excel or whatever at this point. As he notes, quite a few records are missing the date information and some of it is wrong, i.e. published before it was accepted. These need to be cleaned up. The other problem is that the month is sometimes an integer and sometimes a three-letter code. He uses lubridate in R to get around this; a loop-replace in Igor is easy to construct, and even Excel can handle this with an IF statement, e.g.
`=IF(LEN(G2)=3,MONTH(1&LEFT(G2,3)),G2)` if the month is in G2. Good luck!
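If you would rather stay in Ruby, the same month clean-up might look like the sketch below. `normalize_month` is a hypothetical helper, not part of Neil's script. It relies on Ruby's standard `Date::ABBR_MONTHNAMES` array, which is `[nil, "Jan", ..., "Dec"]`, so a month's index in that array is its number.

```ruby
require 'date'

# Ruby counterpart of the Excel IF formula: turn a month field that may be
# numeric ("8", "08") or a three-letter code ("Aug") into an integer 1-12.
# Returns nil for empty or unrecognised values.
def normalize_month(value)
  s = value.to_s.strip
  return nil if s.empty?
  if s.match?(/\A\d{1,2}\z/)
    n = s.to_i
    return (1..12).cover?(n) ? n : nil
  end
  # Date::ABBR_MONTHNAMES is [nil, "Jan", "Feb", ..., "Dec"], so the index is the month number
  Date::ABBR_MONTHNAMES.index(s[0, 3].capitalize)
end

%w[8 08 Aug dec].each { |m| puts "#{m} -> #{normalize_month(m)}" }
```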
Edit 9/3/15 @ 17:17 several people (including Deborah Sweet and Bernd Pulverer from Cell Press/Cell Stem Cell and EMBO, respectively) have confirmed via Twitter that some journals use the date of resubmission as the submitted date. Cell Stem Cell and EMBO journals use the real dates. There is no way to tell whether a journal does this or not (from the deposited data). Stuart Cantrill from Nature Chemistry pointed out that his journal does declare that they sometimes reset the clock. I'm not sure about other journals. My own feeling is that – for full transparency – journals should 1) record the actual dates of submission, acceptance and publication, 2) deposit them in PubMed and add them to the paper. As pointed out by Jim Woodgett, scientists want the actual dates on their paper, partly because they are the real dates, but also to claim priority in certain cases. There is a conflict here, because journals might appear inefficient if they have long publication lag times. I think this should be an incentive for Editors to simplify revisions by giving clear guidance and limiting successive revision cycles. (This Edit was corrected 10/3/15 @ 11:04).
The post title is taken from “Waiting to Happen” by Super Furry Animals from the “Something 4 The Weekend” single.
The transition for scientific journals from print to online has been slow and painful. And it is not yet complete. This week I got an RSS alert to a “new” paper in Oncogene. When I downloaded it, something was familiar… very familiar… I’d read it almost a year ago! Sure enough, the AOP (ahead of print or advance online publication) date for this paper was September 2013 and here it was in the August 2014 issue being “published”.
I wondered why a journal would do this. It is possible that delaying actual publication would artificially boost the Impact Factor of a journal because there is a delay before citations roll in and citations also peak after two years. So if a journal delays actual publication, then the Impact Factor assessment window captures a “hotter” period when papers are more likely to generate more citations*. Richard Sever (@cshperspectives) jumped in to point out a less nefarious explanation – the journal obviously has a backlog of papers but is not allowed to just print more papers to catch up, due to page budgets.
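For reference, the two-year Journal Impact Factor for year $Y$ is essentially the ratio below. Here $C_Y(y)$ is the number of citations made in year $Y$ to items the journal published in year $y$, and $N_y$ is the number of citable items it published in year $y$. The symbols are my own shorthand, not Thomson Reuters' notation.

$$
\mathrm{JIF}_Y = \frac{C_Y(Y-1) + C_Y(Y-2)}{N_{Y-1} + N_{Y-2}}
$$

Only citations that fall inside this two-year window count. Suppose a paper's publication year is taken from its print issue, which is the assumption behind the argument above. In that case, holding the paper back from print moves its window later, closer to the period when citations peak.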
There followed a long discussion about this… which you’re welcome to read. I was away giving a talk and missed all the fun, but if I may summarise on behalf of everybody: isn’t it silly that we still have pages – actual pages, made of paper – and this is restricting publication.
I wondered how Oncogene got to this position. I retrieved the data for AOP and actual publication for the last five years of papers at Oncogene, excluding reviews, from PubMed, using `oncogene[ta] NOT review[pt]` as a search term. The field DP has the date published (the “issue date” that the paper appears in print) and PHST has several interesting dates including [aheadofprint]. These could be parsed and imported into IgorPro as 1D waves. The lag time from AOP to print could then be calculated. I got 2916 papers from the search and was able to get data for 2441 papers.
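The AOP-to-print calculation can be sketched in Ruby as follows, as an alternative to the IgorPro route. The sketch assumes MEDLINE-format text, where the DP line looks like `DP  - 2014 Aug 7` and the history line looks like `PHST- 2013/09/02 00:00 [aheadofprint]`. Older exports may omit the time, and the regex below accepts both forms. It also assumes records are separated by blank lines. The method names are hypothetical. A missing month or day in DP defaults to 1, so some lags are approximate.

```ruby
require 'date'

MONTHS = %w[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec].freeze

# DP values look like "2014 Aug 7", "2014 Aug" or "2014". A missing month or
# day defaults to 1, so some dates are only approximate.
def parse_dp(dp)
  year, mon, day = dp.to_s.split
  return nil unless year&.match?(/\A\d{4}\z/)
  m = mon ? MONTHS.index(mon[0, 3]) : nil
  d = day.to_i.positive? ? day.to_i : 1
  Date.new(year.to_i, m ? m + 1 : 1, d)
rescue ArgumentError # e.g. an impossible day such as "2014 Feb 30"
  nil
end

# Take the date from the PHST line tagged [aheadofprint], if there is one.
def parse_aop(record)
  line = record.lines.find { |l| l.start_with?("PHST") && l.include?("[aheadofprint]") }
  ymd = line && line[%r{\d{4}/\d{2}/\d{2}}]
  ymd && Date.strptime(ymd, "%Y/%m/%d")
end

# Days from advance online publication to the print issue, or nil if either
# date is missing.
def aop_to_print_lag(record)
  aop = parse_aop(record)
  printed = parse_dp(record[/^DP\s+- (.+)$/, 1])
  return nil if aop.nil? || printed.nil?
  (printed - aop).to_i
end

# Assumes records in a MEDLINE-format export are separated by blank lines.
def lags_from_export(text)
  text.split(/\n\s*\n/).map { |rec| aop_to_print_lag(rec) }.compact
end

# Hypothetical record, for illustration only.
sample = <<~MEDLINE
  PMID- 00000000
  DP  - 2014 Aug 7
  PHST- 2013/09/02 00:00 [aheadofprint]
MEDLINE
p lags_from_export(sample) # [339]
```

Records without an [aheadofprint] date simply drop out, which is one reason fewer papers have a lag than were returned by the search.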
You can see for this journal that the lag time has been stable at around 300 days (~10 months) for issues published since 2013. So a paper AOP in Feb 2012 had to wait over 10 months to make it into print. This followed a linear period of lag time growth from mid-2010.
I have no links to Oncogene and don’t particularly want to single them out. I’m sure similar lags are happening at other print journals. Actually, my only interaction with Oncogene was that they sent this paper of ours out to review in 2011 (it got two not-negative-but-admittedly-not-glowing reviews) and then they rejected it because they didn’t like the cell line we used. I always thought this was a bizarre decision: why couldn’t they just decide that before sending it to review and wasting our time? Now, I wonder whether they were not keen to add to their increasing backlog of papers at their journal? Whatever the reason, it has put me off submitting other papers there.
I know that there are good arguments for continuing print versions of journals, but from a scientist’s perspective the first publication is publication. Any subsequent versions are simply redundant and confusing.
*Edit: Alexis Verger (@Alexis_Verger) pointed me to a paper which describes that, for neuroscience journals, the lag time has increased over time. Moreover, the authors suggest that this is for the purpose of maximising Journal Impact Factor.
The post title comes from the double A-side Fools Gold/What The World Is Waiting For by The Stone Roses.
How long does it take to publish a paper?
The answer is – in our experience, at least – about 9 months.
That’s right, it takes about the same amount of time to have a baby as it does to publish a scientific paper. Discussing how we can make the publication process quicker is for another day. Right now, let’s get into the numbers.
The graphic shows the time taken from submission-to-publication for papers on which I am an author. I'm missing data for two papers (one from 1999 and one from 2002), and the Biol Open paper is published online but not yet “in print”. Apart from these, the information is mostly complete. If you want to calculate this for your own papers, my advice would be to keep a spreadsheet of submission and decision dates as you go along… and archive your emails.
In the last analysis, a few people pointed out ways that the graphic could be improved, and I’ve now implemented these changes.
The graphic shows that the journey to publication is in four eras:
- Pre-time (before 0 on the x-axis): this is the time from first submission to the first journal. A dark time which involves rejection.
- Submission at the final journal (starting at time 0). Again, the orange periods are when the manuscript is with the journal and the green, when it is with us. Needless to say, this green time is mainly spent doing experimental work (compare green periods for reviews and for papers).
- Acceptance! This is where the orange bar stops. The manuscript is then readied for publication (blank area).
- Published online. A purple period that ends with final publication in print.
Note that: i) the delays are more-or-less negated by preprinting, provided deposition is before the first submission (grey line, for the Biol Open paper), ii) these delay diagrams do not take into account the original drafting/rewriting cycle before the first submission – nor the time taken to do the work!
So… how long does it take to publish a paper?
In the top right graph: the time from first submission to being published online is 250 days on average (median). This is shown by the blue bar. If we throw in the average time it takes to go from online to print (15 days) this gives 265 days. The average time for human gestation is 266 days. So it takes about the same amount of time to have a baby as it does to publish a paper! By contrast, reviews take only 121 days, equivalent to four lunar cycles (118 days).
My 2005 paper at Nature holds the record for the most protracted publication: 399 days from submission to publication. The fastest publication is the most recent: our Biol Open paper was online 49 days after submission (it was also online 1 day before submission as a preprint).
In the bottom right graph: I added together the total time each paper was either with the journal, or with us, and plotted the average. The time from acceptance-to-publication online is shown stacked onto the “time with journal” column. You can see from this graphic that the lion’s share of the delay comes from revisions that we must do in order for a paper to be published. Multiple revisions and submissions also push these numbers up compared to the totals for reviews.
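If you keep a spreadsheet of submission and decision dates, the "with journal" versus "with us" split can be computed with something like the Ruby sketch below. The event-log format is an assumption: a chronological list of `[date, holder]` pairs, where the holder has the manuscript from that date until the next event. The `:production` holder covers acceptance to online publication, and the last entry only marks the end. The dates are made up for illustration.

```ruby
require 'date'

# Sum how long a manuscript spent with each party. events is a chronological
# list of [date, holder] pairs: holder has the manuscript from that date until
# the next event. The final entry only marks the end point.
def time_split(events)
  totals = Hash.new(0)
  events.each_cons(2) do |(start, holder), (finish, _next_holder)|
    totals[holder] += (finish - start).to_i
  end
  totals
end

# Hypothetical timeline for one paper at its final journal.
paper = [
  [Date.new(2014, 1, 6),  :journal],    # submitted
  [Date.new(2014, 2, 20), :us],         # decision: revise
  [Date.new(2014, 5, 15), :journal],    # resubmitted
  [Date.new(2014, 6, 10), :production], # accepted; journal prepares the paper
  [Date.new(2014, 7, 1),  :end]         # published online
]
time_split(paper).each { |holder, days| puts "#{holder}: #{days} d" }
# journal: 71 d, us: 84 d, production: 21 d
```

Averaging these per-paper totals across papers would give the kind of summary shown in the bottom right graph, with `:production` stacked onto the journal's share.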
How representative are these numbers?
This is a small dataset at many different journals and so it is difficult to conclude much. With this analysis, I was hoping to identify ‘slow journals’ that we should avoid and also to think about our publication strategy (as much as a crap shoot can have a strategy). The whole process is stochastic and I don’t see any reason to change the way that we navigate the system. Having said this, I can’t see us doing any more methods/book chapters, as they are just so slow.
Just over half of our papers have some “pre-time”, i.e. they got rejected from at least one other journal before finding a home. A colleague of mine likes to say:
“if your paper is accepted at the first journal you send it to, you sent it to the wrong place”
One thing for sure is that publication takes a long time. And I don’t think our experience is uncommon. The pace of scientific publishing has been described as glacial by Leslie Vosshall and I don’t disagree with this. I think the 9 months figure is probably representative for most areas of biology. I know that other scientists in my field, who have more tenacity for rejections and for slugging it out at high impact journals, have much longer times from 1st submission to acceptance. In my opinion, wasting even more time chasing publication is crazy, counter-productive and demotivating for the people in the lab.
The irony in all this is that, even though we are working at the absolute bleeding edge of science with all of this technology at our disposal, our methods for reporting science are badly out of date. And with that I’ll push the “publish” button and this will be online…
The title of this post comes from ‘Some Things Last A Long Time’ by Daniel Johnston from his LP ‘1990’.
How long does it take to publish a paper?
I posted the picture below on Twitter to show how long it takes for us to publish a paper.
The answer is 235 days. This is the median time from submission at the first journal to publication online or in print. The data are from our last ten papers.
The infographic proved popular with 40 retweets and 22 favourites. It was pointed out to me that a few things would improve this visualisation:
1. Showing the names of the journals
2. Showing when the 1st submission was relative to the 1st submission at the journal that finally accepted the paper
3. What about reviews and other types of publication?
I am working on updating the graph to show all of these things… watch this space.
My point was really to show (perhaps to non-scientists) how long the process of publishing a paper can be. There is other information that can be gleaned from this, e.g. what proportion of time is at the journal’s side and how much is at our end?
The people who are eager to see which journals perform badly (slowly) will be disappointed: this is a very small subset of papers from one lab. I'd be interested in scraping the information on journal tardiness on a larger scale and synthesising this so that it can inform journal choice. Recently, though, major publishers have taken steps to make this information less accessible, so don't hold your breath.
The title of this post is from So Long by Cian Ciarán from the LP ‘Outside In’