bioRxiv, the preprint server for biology, recently turned two years old. This seems a good point to look at how bioRxiv has developed over this time and to discuss any concerns sceptical people may have about using the service.
Firstly, thanks to Richard Sever (@cshperspectives) for posting the data below. The first plot shows the number of new preprints deposited and the number that were revised, per month since bioRxiv opened in Nov 2013. About 200 new preprints are now being deposited per month, and this number looks set to keep increasing. The cumulative article count (of new preprints) shows that, as of the end of last month, more than 2,500 preprints had been deposited at bioRxiv.
What is uptake like across biology? To look at this, the number of articles in different subject categories can be totted up. Evolutionary Biology, Bioinformatics and Genomics/Genetics are the front-running disciplines. Obviously, these counts should be corrected for the size of each field, but it’s clear that some large disciplines have not adopted preprinting in the same way. Cell biology, my own field, has some catching up to do. It’s likely that this reflects cultures within different fields. For example, genomics has a rich history of data deposition, sharing and openness. Other fields, less so…
So what are we waiting for?
I’d recommend that people wondering about preprinting go and read Stephen Curry’s post “just do it”. Anyone who remains sceptical should keep reading…
Do I really want to deposit my best work on bioRxiv?
I’ve picked six preprints that were deposited in 2015. This selection demonstrates how important work is appearing first at bioRxiv and is being downloaded thousands of times before the papers appear in the pages of scientific journals.
- Accelerating scientific publishing in biology. A preprint about preprinting from Ron Vale, subsequently published in PNAS.
- Analysis of protein-coding genetic variation in 60,706 humans. A preprint summarising a huge effort from the Exome Aggregation Consortium (ExAC). 12,366 views, 4,534 downloads.
- TP53 copy number expansion correlates with the evolution of increased body size and an enhanced DNA damage response in elephants. This preprint was all over the news, e.g. Science.
- Sampling the conformational space of the catalytic subunit of human γ-secretase. CryoEM is the hottest technique in biology right now. Sjors Scheres’ group have been at the forefront of this revolution. This paper is now out in eLife.
- The genome of the tardigrade Hypsibius dujardini. The recent controversy over horizontal gene transfer in tardigrades played out rapidly thanks to preprinting.
- CRISPR with independent transgenes is a safe and robust alternative to autonomous gene drives in basic research. This preprint concerning biosafety of CRISPR/Cas technology could be accessed immediately thanks to preprinting.
But many journals consider preprints to be previous publications!
Wrong. It is true that some journals have yet to change their policy, but the majority – including Nature, Cell and Science – are happy to consider manuscripts that have been preprinted. There are many examples of biology preprints that went on to be published in Nature (ancient genomes) and Science (hotspots in birds). If you are worried about whether the journal you want to submit your work to will allow preprinting, check this page or the SHERPA/RoMEO resource first. The journal “information to authors” page should have a statement about this, but you can always ask the editor.
I’m going to get scooped
Preprints establish priority. It isn’t possible to be scooped if you deposit a preprint that is time-stamped, showing that you were the first. The alternative is to send it to a journal, where no record will exist that you submitted it if the paper is rejected, or sometimes even if they end up publishing it (see discussion here). Personally, I feel that the fear of scooping in science is overblown. In fields so hot that papers are coming out really fast, the fear of scooping is high, but if the work is on bioRxiv or elsewhere, everyone sees it and who was first is clear to all. Think of it this way: depositing a preprint at bioRxiv is just the same as giving a talk at a meeting. Preprints mean that there is a verifiable record available to everyone.
Preprints look ugly, I don’t want people to see my paper like that.
The depositor can format their preprint however they like! Check out Christophe Leterrier’s beautifully formatted preprint, or this one from Dennis Eckmeier. Both authors made their templates available so you can follow their example (1 and 2).
Yes, but does -insert name of famous scientist- deposit preprints?
Lots of high profile scientists have already used bioRxiv. David Bartel, Ewan Birney, George Church, Ray Deshaies, Jennifer Doudna, Steve Henikoff, Rudy Jaenisch, Sophien Kamoun, Eric Karsenti, Maria Leptin, Rong Li, Andrew Murray, Pam Silver, Bruce Stillman, Leslie Vosshall and many more. Some sceptical people may find this argument compelling.
I know how publishing works now and I don’t want to disrupt the status quo
It’s paradoxical how science is all about pushing the frontiers, yet when it comes to publishing, scientists are incredibly conservative. Physics and Mathematics have been using preprinting as part of the standard route to publication for decades, so adoption by biology is nothing unusual; in fact, we will simply be catching up. One vision for the future of scientific publishing is that we will deposit preprints and then journals will search out the best work from the server to highlight in their pages. The journals that will do this are called “overlay journals”. Sounds crazy? It’s already happening in Mathematics. Terry Tao, a Fields Medal-winning mathematician, recently deposited a solution to the Erdős discrepancy problem on arXiv (he actually posted the work on his blog first). This was then “published” in Discrete Analysis, an overlay journal. Read about this here.
Disclaimer: other preprint services are available. F1000Research, PeerJ Preprints and of course arXiv itself, which has a quantitative biology section. My lab have deposited work at bioRxiv (1, 2 and 3) and I am an affiliate for the service, which means I check preprints before they go online.
Edit 14/12/15 07:13: put the scientists in alphabetical order and added a part about scooping.
The post title comes from the term “white label”, which is used for promotional vinyl copies of records ahead of their official release.
I was talking to a speaker visiting our department recently. While discussing his postdoc work from years ago, he told me about the identification of the sperm factor that causes calcium oscillations in the egg at fertilisation. It was an interesting tale because the group who eventually identified the factor – now widely accepted as PLCzeta – had earlier misidentified the factor, naming it oscillin.
As you can see, there was intense interest in the first paper that quickly petered out, presumably when people found out that oscillin was a contaminant and not the real factor. The second paper, on the other hand, has attracted a large number of citations and continues to do so 12 years later, a sign of a classic paper. However, its initial spike in citations was not as high as that of the Nature paper.
The impact factor of Nature is much higher than that of Development. I’ve often wondered whether this is due to a sociological phenomenon: people like to cite Cell/Nature/Science papers rather than those in other journals, and this bumps up the impact factor. Before you comment, yes, I know there are other reasons, but IFs change little from year to year and I wonder whether journal hierarchy explains their persistence. Anyway, these papers struck me as a good test of the idea… Here we have essentially the same discovery, reported by the same authors. The only differences are the journal and the fact that one paper came out six years after the other. Normally it is not possible to test whether the journal influences citations, because a paper cannot be erased and republished somewhere else. The plot suggests that Nature papers attract many more citations than those in Development, presumably because of the exposure that publishing there brings. From the graph, it’s not difficult to see that even if a paper turns out to be wrong, it can still boost the IF of the journal during the window of assessment. Another reason not to trust journal impact factors.
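For anyone unsure what the “window of assessment” is, here is a sketch of the standard two-year Journal Impact Factor for a year Y. The notation is mine, chosen only for illustration: C_Y(y) is the number of citations received in year Y by anything the journal published in year y, and N_y is the number of citable items (articles and reviews) it published in year y.

```latex
\mathrm{IF}_Y = \frac{C_Y(Y-1) + C_Y(Y-2)}{N_{Y-1} + N_{Y-2}}
```

Only citations made in the two years after a paper’s publication year ever enter this calculation. An early burst of citations, as seen for the oscillin paper, therefore counts in full, while citations that keep accruing a decade later, as for the Development paper, count for nothing.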
I can’t think of any way to look at this more systematically to see if this phenomenon holds true. I just thought it was interesting, so I’ll leave it here.
The post title is taken from Half Right by Elliott Smith from the posthumous album New Moon. Bootlegs have the title as Not Half Right, which would also be appropriate.