bioRxiv, the preprint server for biology, recently turned 2 years old. This seems a good point to take a look at how bioRxiv has developed over this time and to discuss any concerns sceptical people may have about using the service.
Firstly, thanks to Richard Sever (@cshperspectives) for posting the data below. The first plot shows the number of new preprints deposited and the number that were revised, per month since bioRxiv opened in Nov 2013. There are now about 200 preprints being deposited per month and this number will continue to increase. The cumulative article count (of new preprints) shows that, as of the end of last month, there are >2500 preprints deposited at bioRxiv.
What is take-up like across biology? To look at this, the number of articles in different subject categories can be totted up. Evolutionary Biology, Bioinformatics and Genomics/Genetics are the front-running disciplines. Obviously, article counts should be corrected for the size of these fields, but it’s clear that some large disciplines have not adopted preprinting in the same way. Cell biology, my own field, has some catching up to do. It’s likely that this reflects cultures within different fields. For example, genomics has a rich history of data deposition, sharing and openness. Other fields, less so…
So what are we waiting for?
I’d recommend that people wondering about preprinting go and read Stephen Curry’s post “just do it”. Any people who remain sceptical should keep reading…
Do I really want to deposit my best work on bioRxiv?
I’ve picked six preprints that were deposited in 2015. This selection demonstrates how important work is appearing first at bioRxiv and is being downloaded thousands of times before the papers appear in the pages of scientific journals.
- Accelerating scientific publishing in biology. A preprint about preprinting from Ron Vale, subsequently published in PNAS.
- Analysis of protein-coding genetic variation in 60,706 humans. A preprint summarising a huge effort from the Exome Aggregation Consortium (ExAC). 12,366 views, 4,534 downloads.
- TP53 copy number expansion correlates with the evolution of increased body size and an enhanced DNA damage response in elephants. This preprint was all over the news, e.g. Science.
- Sampling the conformational space of the catalytic subunit of human γ-secretase. CryoEM is the hottest technique in biology right now. Sjors Scheres’ group have been at the forefront of this revolution. This paper is now out in eLife.
- The genome of the tardigrade Hypsibius dujardini. The recent controversy over horizontal gene transfer in tardigrades played out rapid-fire thanks to preprinting.
- CRISPR with independent transgenes is a safe and robust alternative to autonomous gene drives in basic research. This preprint concerning biosafety of CRISPR/Cas technology could be accessed immediately thanks to preprinting.
But many journals consider preprints to be previous publications!
Wrong. It is true that some journals have yet to change their policy, but the majority – including Nature, Cell and Science – are happy to consider manuscripts that have been preprinted. There are many examples of biology preprints that went on to be published in Nature (ancient genomes) and Science (hotspots in birds). If you are worried about whether the journal you want to submit your work to will allow preprinting, check this page first or the SHERPA/RoMEO resource. The journal “information to authors” page should have a statement about this, but you can always ask the Editor.
I’m going to get scooped
Preprints establish priority. It isn’t possible to be scooped if you deposit a preprint that is time-stamped showing that you were the first. The alternative is to send it to a journal where no record will exist that you submitted it if the paper is rejected, or sometimes even if they end up publishing it (see discussion here). Personally, I feel that the fear of scooping in science is overblown. Even in fields that are so hot that papers are coming out really fast and the fear of scooping is high, everyone sees the work if it’s on bioRxiv or elsewhere, so who was first is clear to all. Think of it this way: depositing a preprint at bioRxiv is just the same as giving a talk at a meeting. Preprints mean that there is a verifiable record available to everyone.
Preprints look ugly, I don’t want people to see my paper like that.
The depositor can format their preprint however they like! Check out Christophe Leterrier’s beautifully formatted preprint, or this one from Dennis Eckmeier. Both authors made their templates available so you can follow their example (1 and 2).
Yes but does -insert name of famous scientist- deposit preprints?
Lots of high-profile scientists have already used bioRxiv: David Bartel, Ewan Birney, George Church, Ray Deshaies, Jennifer Doudna, Steve Henikoff, Rudy Jaenisch, Sophien Kamoun, Eric Karsenti, Maria Leptin, Rong Li, Andrew Murray, Pam Silver, Bruce Stillman, Leslie Vosshall and many more. Some sceptical people may find this argument compelling.
I know how publishing works now and I don’t want to disrupt the status quo
It’s paradoxical how science is all about pushing the frontiers, yet when it comes to publishing, scientists are incredibly conservative. Physics and Mathematics have been using preprinting as part of the standard route to publication for decades, so adoption by biology is nothing unusual; we will simply be catching up. One vision for the future of scientific publishing is that we will deposit preprints and then journals will search out the best work from the server to highlight in their pages. The journals that will do this are called “overlay journals”. Sounds crazy? It’s already happening in Mathematics. Terry Tao, a Fields Medal-winning mathematician, recently deposited a solution to the Erdős discrepancy problem on arXiv (he actually put it on his blog first). This was then “published” in Discrete Analysis, an overlay journal. Read about this here.
Disclaimer: other preprint services are available: F1000 Research, PeerJ Preprints and of course arXiv itself, which has a quantitative biology section. My lab have deposited work at bioRxiv (1, 2 and 3) and I am an affiliate for the service, which means I check preprints before they go online.
Edit 14/12/15 07:13 put the scientists in alphabetical order. Added a part about scooping.
The post title comes from the term “white label” which is used for promotional vinyl copies of records ahead of their official release.
Our most recent manuscript was almost ready for submission. We were planning to send it to an open access journal. It was then that I had the thought: how many papers in the reference list are freely available?
It somehow didn’t make much sense to point readers towards papers that they might not be able to access. So, I wondered if there was a quick way to determine how many papers in my reference list were open access. I asked on twitter and got a number of suggestions:
- Search crossref to find out if the journal is in DOAJ (@epentz)
- How Open Is It? from Cottage Labs will check a list of DOIs (up to 20) for openness (@emanuil_tolev)
- Open access DOI Resolver will perform a similar task (@neurocraig)
I actually used a fourth method (from @biochemistries and @invisiblecomma) which was to use HubMed, although in the end a similar solution can be reached by searching PubMed itself. Whereas the other strategies will work for a range of academic texts, everything in my reference list was from PubMed. So this solution worked well for me. I pulled out the list of Accessions (PMIDs) for my reference list. This was because some papers were old and I did not have their DOIs. The quickest way to do this was to make a new EndNote style that only contained the field Accession and get it to generate a new bibliography from my manuscript. I appended [uid] OR after each one and searched with that term.
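For anyone who would rather script the search term than build it by hand, here is a minimal sketch in Python of the same idea. The function name and the PMIDs are made up for illustration (they are not the references from my manuscript), and it joins the terms with OR rather than appending OR after every one, so there is no trailing OR to trim off.

```python
def build_uid_query(pmids):
    """Join PMIDs into a single search term, tagging each one with [uid]."""
    # Drop blank entries and stray whitespace from the exported list.
    cleaned = [str(p).strip() for p in pmids if str(p).strip()]
    # Joining with " OR " avoids leaving a dangling OR at the end.
    return " OR ".join(f"{pmid}[uid]" for pmid in cleaned)


# Placeholder PMIDs, for illustration only.
example_pmids = ["10000001", "10000002", "10000003"]
print(build_uid_query(example_pmids))
```

The printed string can be pasted straight into the search box. It is still worth checking each hit by hand, since the free-to-read flag is only as good as the database record.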
My paper had 44 references. Of these, 35 were freely available to read. I was actually surprised by how many were available. So, 9 papers were not free to read. As advised, I checked each one to really make sure that the HubMed result was accurate, and it was.
Please note that I’d written the paper without giving this a thought and citing papers as I normally do: the best demonstration of something, the first paper to show something, using primary papers as far as possible.
Seven of the nine I couldn’t compromise on. They’re classic papers from the 80s and 90s that are still paywalled but are unique in what they describe. However, two papers were reviews in closed access journals. Now these I could do something about! Especially as I prefer to cite the primary literature anyway. Plus, most reviews are pretty unoriginal in what they cover and an alternative open access version that is fairly recent can easily be found. I’ll probably run this check for future manuscripts and see what it throws up.
It’s often said that papers are our currency in science. The valuation of this currency comes from citations. Funnily enough, we the authors are in a position to actually do something about this. I don’t think any of us should compromise the science in our manuscripts. However, I think we could all probably pay a bit more attention to the citations that we dish out when writing a paper, whether this is simply to make sure that what we cite is widely accessible, or to make sure that credit goes to the right people.
The post title is taken from “To Open Closed Doors” by D.R.I. from the Dirty Rotten LP