White label: the growth of bioRxiv

bioRxiv, the preprint server for biology, recently turned 2 years old. This seems a good point to take a look at how bioRxiv has developed over this time and to discuss any concerns sceptical people may have about using the service.

Firstly, thanks to Richard Sever (@cshperspectives) for posting the data below. The first plot shows the number of new preprints deposited and the number that were revised, per month since bioRxiv opened in Nov 2013. There are now about 200 preprints being deposited per month and this number will continue to increase. The cumulative article count (of new preprints) shows that, as of the end of last month, there are >2500 preprints deposited at bioRxiv.


What is uptake like across biology? To look at this, the number of articles in different subject categories can be totted up. Evolutionary Biology, Bioinformatics and Genomics/Genetics are the front-running disciplines. Obviously the counts should be corrected for the size of each field, but it’s clear that some large disciplines have not adopted preprinting in the same way. Cell biology, my own field, has some catching up to do. It’s likely that this reflects cultures within different fields. For example, genomics has a rich history of data deposition, sharing and openness. Other fields, less so…

So what are we waiting for?

I’d recommend that people wondering about preprinting go and read Stephen Curry’s post “just do it”. Anyone who remains sceptical should keep reading…

Do I really want to deposit my best work on bioRxiv?

I’ve picked six preprints that were deposited in 2015. This selection demonstrates how important work is appearing first at bioRxiv and is being downloaded thousands of times before the papers appear in the pages of scientific journals.

  1. Accelerating scientific publishing in biology. A preprint about preprinting from Ron Vale, subsequently published in PNAS.
  2. Analysis of protein-coding genetic variation in 60,706 humans. A preprint summarising a huge effort from the Exome Aggregation Consortium (ExAC). 12,366 views, 4,534 downloads.
  3. TP53 copy number expansion correlates with the evolution of increased body size and an enhanced DNA damage response in elephants. This preprint was all over the news, e.g. Science.
  4. Sampling the conformational space of the catalytic subunit of human γ-secretase. CryoEM is the hottest technique in biology right now. Sjors Scheres’ group have been at the forefront of this revolution. This paper is now out in eLife.
  5. The genome of the tardigrade Hypsibius dujardini. The recent controversy over horizontal gene transfer in tardigrades played out at rapid-fire pace thanks to preprinting.
  6. CRISPR with independent transgenes is a safe and robust alternative to autonomous gene drives in basic research. This preprint concerning biosafety of CRISPR/Cas technology could be accessed immediately thanks to preprinting.

But many journals consider preprints to be previous publications!

Wrong. It is true that some journals have yet to change their policy, but the majority – including Nature, Cell and Science – are happy to consider manuscripts that have been preprinted. There are many examples of biology preprints that went on to be published in Nature (ancient genomes) and Science (hotspots in birds). If you are worried about whether the journal you want to submit your work to will allow preprinting, check this page first or the SHERPA/RoMEO resource. The journal “information to authors” page should have a statement about this, but you can always ask the Editor.

I’m going to get scooped

Preprints establish priority. It isn’t possible to be scooped if you deposit a preprint that is time-stamped showing that you were the first. The alternative is to send it to a journal, where no record will exist that you submitted it if the paper is rejected, or sometimes even if they end up publishing it (see discussion here). Personally, I feel that the fear of scooping in science is overblown. Even in fields that are so hot that papers are coming out really fast and the fear of scooping is high, everyone sees the work if it’s on bioRxiv or elsewhere, so who was first is clear to all. Think of it this way: depositing a preprint at bioRxiv is just the same as giving a talk at a meeting. Preprints mean that there is a verifiable record available to everyone.

Preprints look ugly, I don’t want people to see my paper like that.

The depositor can format their preprint however they like! Check out Christophe Leterrier’s beautifully formatted preprint, or this one from Dennis Eckmeier. Both authors made their templates available so you can follow their example (1 and 2).

Yes but does -insert name of famous scientist- deposit preprints?

Lots of high-profile scientists have already used bioRxiv, including David Bartel, Ewan Birney, George Church, Ray Deshaies, Jennifer Doudna, Steve Henikoff, Rudy Jaenisch, Sophien Kamoun, Eric Karsenti, Maria Leptin, Rong Li, Andrew Murray, Pam Silver, Bruce Stillman, Leslie Vosshall and many more. Some sceptical people may find this argument compelling.

I know how publishing works now and I don’t want to disrupt the status quo

It’s paradoxical how science is all about pushing the frontiers, yet when it comes to publishing, scientists are incredibly conservative. Physics and Mathematics have been using preprinting as part of the standard route to publication for decades, so adoption by biology is nothing unusual; we will simply be catching up. One vision for the future of scientific publishing is that we will deposit preprints and then journals will search out the best work from the server to highlight in their pages. The journals that will do this are called “overlay journals”. Sounds crazy? It’s already happening in Mathematics. Terry Tao, a Fields medal-winning mathematician, recently deposited a solution to the Erdős discrepancy problem on arXiv (he actually put it on his blog first). This was then “published” in Discrete Analysis, an overlay journal. Read about this here.

Disclaimer: other preprint services are available: F1000 Research, PeerJ Preprints and of course arXiv itself, which has a quantitative biology section. My lab have deposited work at bioRxiv (1, 2 and 3) and I am an affiliate for the service, which means I check preprints before they go online.

Edit 14/12/15 07:13 put the scientists in alphabetical order. Added a part about scooping.

The post title comes from the term “white label” which is used for promotional vinyl copies of records ahead of their official release.


10 responses

  1. I just want to point out that while you say that Cell journals allow preprints, their actual policy is far less clear. The exact wording is attached below:

    “Manuscripts are considered with the understanding that no part of the work has been published previously in print or electronic format and the paper is not under consideration by another publication or electronic medium. If you have questions about whether posting a manuscript or data that you plan to submit to this journal on an openly available preprint server or poster repository would affect consideration, we encourage you to contact an editor so that we may provide more specific guidance. In many cases, posting will be possible.”

    In fact, I was unable to find a preprint that has gone on to be published in Cell, and have found only the following preprints in other smaller Cell Press journals (I am excluding the open-access Cell Reports, which actually has quite a few preprints published!):
    Cell Host and Microbe:
    1) http://biorxiv.org/content/early/2014/12/02/012070

    1) http://biorxiv.org/content/early/2015/07/10/021071
    2) http://biorxiv.org/content/early/2014/05/06/004804
    3) http://biorxiv.org/content/early/2015/03/26/017137
    4) http://biorxiv.org/content/early/2014/10/30/010751

    It’s possible that I have missed some, but the end result is very suggestive that the Cell Press journals as a whole (with the notable exceptions above) are not nearly as receptive to preprinting as your piece suggests! I think this should be a call to arms for Cell Press journal editors to reconsider their editorial policies, given the increasing success of preprinting in biology.

    I just wanted to clear that up in case anyone tries to preprint and submit to Cell and is stymied!

    1. Thanks for the comment. I have to confess that I was fully aware of all that when I wrote the post. Hence the reference to examples of preprints that were ultimately published in Nature and Science, but not Cell. I agree that any journal that even appears to be anti-preprint is soon going to find themselves on the wrong side of history. I’m sure no Cell Press title wants that.

  2. Thanks for writing this post. I fully agree with and support the idea that all science/biology papers should first appear as preprints. I hope this day comes soon.

    My own experience with bioRxiv has been awesome, and I am also a bioRxiv affiliate. I heard that at an EMBL conference (that I did not attend) one of the papers that people talked about and were most excited about was my bioRxiv preprint. My preprint was viewed and downloaded many thousands of times from bioRxiv before it was published in a Cell journal.

    Those of us who read preprints and feel enthusiastic for helping with their further adoption should remember that preprints are citable.

    1. Thanks for the reminder that preprints are citable. In my reading of the literature (which is only a tiny sample), I think the number of citations to bioRxiv preprints is very low compared to arXiv preprints. It might reflect the type of papers preprinted at each server… People who say that they are not citable are forgetting that it has been accepted for many years to cite PhD theses, for example.
      Finally, I think this is an interesting point. Preprints are citable and help us to establish priority. Yet the journals require no pre-publication before we submit. This seems to be a conflict that we as a community need to solve.

      1. I think that “publication” has come to mean “formal peer review”, not simply “making something public”. In that sense the word publication is perhaps becoming a bit of a misnomer. If one accepts this misnomer, journals simply mean that a submitted paper should not have been formally peer reviewed in another journal.

        I personally consider a preprint as strong a claim to priority as a Nature or a PNAS article. A preprint is a traceable, time-stamped, permanent record, guaranteed by the same mechanisms that guarantee the authenticity of journal papers. In fact the record of authenticity of preprints is better, considering the recent scandal of paper swaps in elite journals leaving no traceable record.

  3. A minor problem with SHERPA/ROMEO: it’s not clear how accurate all their data is. I generally contact the editor of a journal directly. This almost caught us out, as we were preparing to submit a paper to one of the American Physiological Society journals and, even though SHERPA/ROMEO says it is acceptable to have an online preprint (http://www.sherpa.ac.uk/romeo/issn/0002-9513/), I contacted the editor. He was very clear that a preprint on bioRxiv was absolutely considered a prior publication and would preclude its consideration. I submitted an update to the SHERPA/ROMEO record about a month ago, but it remains unchanged, so if there is any doubt, contact the editor!

    1. I’d agree with this. I’ve found errors in there too and have asked for an update.
      I see an additional problem here too in that, even if the journal is “preprint-friendly”, the person that handles your paper may not be. Journals are not bound by any duty to consider a manuscript, as we all know! In this transition phase, there are bound to be a few cases where preprinted manuscripts are turned away.

  4. […] about this progress previously (here and here). The most popular posts are those on publishing: preprints, impact factors and publication lag times, rather than my science, but that’s OK. There is […]

  5. […] is rapidly increasing, particularly in the fields of genomics and bioinformatics (see post here). SocARxiv for the social sciences is even more recent (July 2016), so we still have to wait to see […]

  6. 1 year later – I checked all (7430?) articles on biorxiv for where they are published (https://github.com/MWSchmid/crawlBiorxiv). Raw data with each article link and journal is there as well. Cell still does not have too many articles (Cell and Molecular Cell have 2 each – Cell Reports has 17). Anyway – thanks for the article.

