When it comes to measuring the impact of our science, citations are pretty much all we have. Not only that, but they only say one thing – yeah – with no context. How can we enrich citation data?
Much has been written about how and why and whether or not we should use metrics for research assessment. If we accept that metrics are here to stay in research assessment (of journals, Universities, departments and of individuals), I think we should be figuring out better ways to look at the available information.
Citations to published articles are the key metric under discussion. This is because they are linked to research outputs (papers), have some relation to “impact”, and can be easily computed; a number of metrics have been developed to draw out information from the data (H-index, IF etc.). However, citations have many known problems; for example, they are heavily influenced by the size of the field. What I want to highlight here is what a data-poor resource they are, and to think of ways we could enrich the dataset with minimal modification to our existing databases.
1. We need a way to distinguish a yeah from a no
The biggest weakness of using citations as a measure of research impact is that a citation is a citation. It just says +1. We have no idea if +1 means “the paper stinks” or “the work is amazing!”. It’s incredible that we can rate shoelaces on Amazon or eBay but we haven’t figured out a way to do this for scientific papers. Here’s a suggestion:
- A neutral citation is +1
- A positive citation is +2
- A negative citation is -1
A neutral citation would be stating a fact and adding reference to support it, e.g. DNA is a double helix (Watson & Crick, 1953).
A positive citation would be something like: in agreement with Bloggs et al. (2010), we also find x.
A negative citation might be: we have tested the model proposed by Smith & Jones (1977) and find that it does not hold.
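To make the arithmetic concrete, here is a minimal sketch of how a signed citation score could be tallied. It assumes each citation has already been labelled as neutral, positive or negative (how that labelling happens is not addressed here), and the names `CITATION_WEIGHTS` and `citation_score` are made up for illustration.

```python
# Weights taken from the suggestion above; the labels are assumed to exist already.
CITATION_WEIGHTS = {"neutral": 1, "positive": 2, "negative": -1}


def citation_score(sentiments):
    """Sum the weighted value of every citation a paper has received."""
    return sum(CITATION_WEIGHTS[label] for label in sentiments)


# A hypothetical paper cited four times: once neutrally, once positively, twice negatively.
received = ["neutral", "positive", "negative", "negative"]
print(len(received))             # raw citation count: 4
print(citation_score(received))  # weighted score: 1
```

Under today's counting this paper scores 4; with the weights above it scores 1, which is the distinction between a yeah and a no that the raw count hides.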
One further idea (described here) is to add more context to citations using keywords such as “replicating”, “using” or “consistent with”. This would also help with searching the scientific literature.
2. Multiple citations in one article
Because a citation is currently just a +1, there is no way to tell whether the citing paper mentioned the cited paper in passing or was entirely focussed on it.
Another way to think about this is that there are multiple reasons to cite a paper: maybe the method or reagent is being used, maybe they are talking about Figure 2 showing X or Figure 5 showing Y. What if a paper is talking about all of these things? In other words, the paper was very useful. Shouldn’t we record that interest?
Suggestion: A simple way to do this is to count the number of mentions in the text of the paper rather than just whether the paper appears in the reference list.
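As a rough sketch of what counting mentions could look like, the snippet below counts author-year mentions in the body text of a citing paper. It assumes in-text citations are written in author-year style, as in the examples earlier in this post; numbered reference styles would need a different approach. The function name `count_mentions` and the sample text are invented for illustration.

```python
import re


def count_mentions(text, author_label, year):
    """Count in-text mentions such as 'Bloggs et al. (2010)' or '(Bloggs et al., 2010)'."""
    # Allow only whitespace, a comma or an opening bracket between the authors and the year.
    pattern = re.escape(author_label) + r"[\s,(]*" + re.escape(str(year))
    return len(re.findall(pattern, text))


body = (
    "We used the reagent described by Bloggs et al. (2010). "
    "Consistent with Figure 2 of that study (Bloggs et al., 2010), we find x. "
    "The double helix is well established (Watson & Crick, 1953)."
)

print(count_mentions(body, "Bloggs et al.", 2010))   # mentioned twice
print(count_mentions(body, "Watson & Crick", 1953))  # mentioned once
```

Both papers would appear once in the reference list, but the mention count records that the citing paper leaned on one of them more heavily.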
3. Division of a citation unit for fair credit to each author
Calculations such as the H-index make no allowance for the position of the author in the author list (used in biological sciences and some other fields to denote contribution to the paper). It doesn’t make sense that the 25th author on a 50 author paper receives 100% of the citation credit, the same as the first or last author. Similarly, the first author on a two author paper is credited in exactly the same way as a middle author on a multi-author paper. The difference in contribution is clear, but the citation credit does not reflect it. On top of this, each citation to the 50 author paper hands out 50 units of credit in total, 25 times the two units handed out for the two author paper! This needs to be equalised. The citation unit, c, could be divided to achieve fair credit for authors. At the moment c=1, but it could take other values (multiples or negative values) as described above. Here’s a suggestion:
- First (and co-first) and last (and co-last) authors share 0.5c equally between them, i.e. 0.5c divided by the number of first and last authors.
- The remainder, 0.5c, is divided equally between all authors, including the first and last authors.
For a two author paper, the first author and the last author each get (0.5c/2+0.5c/2)=0.5c.
For a ten author paper with one first author and one last author, first and last author each get (0.5c/2+0.5c/10)=0.3c and the 5th author gets (0c+0.5c/10)=0.05c.
Note that the sum for all authors will equal c. So this is equalised for all papers. These citation credits would then be the basis for H-index and other calculations for individuals.
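The split can be sketched in a few lines of code. This is only an illustration under the assumptions stated in the worked examples: half of c is shared equally among the first and last (including co-first and co-last) authors, and the other half is shared equally among everyone. The function name `author_credits` and its parameters are hypothetical.

```python
def author_credits(n_authors, senior_positions, c=1.0):
    """Return the credit for each author position (0-indexed) for one citation worth c.

    senior_positions lists the positions of first/co-first and last/co-last authors.
    """
    senior = set(senior_positions)
    senior_share = 0.5 * c / len(senior)  # half of c, split among first and last authors
    common_share = 0.5 * c / n_authors    # other half of c, split among all authors
    return [
        common_share + (senior_share if position in senior else 0.0)
        for position in range(n_authors)
    ]


two_author = author_credits(2, [0, 1])
ten_author = author_credits(10, [0, 9])

print([round(x, 3) for x in two_author])
print([round(x, 3) for x in ten_author])
print(round(sum(ten_author), 3))  # total credit handed out per citation
```

This reproduces the two worked examples, and the total always comes back to c regardless of how many authors are on the paper.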
Most simply, the denominator would be the number of authors, or – if we can figure out a numerical credit system – each author could be weighted according to their contribution.
4. Citations to reviews should be downgraded
A citation to a review is not equal to a citation to a research paper, for several reasons. First, reviews are cited at a higher rate, because they are a handy catchall citation, particularly for the Introduction section in papers. This isn’t fair and robs credit from the people who did the work that actually demonstrated what is being discussed. Second, the achievement of publishing a review is nothing in comparison to publishing a paper. Publishing a review involves 1) being asked, 2) writing it, 3) light peer review and some editing and that’s it! Publishing a research paper involves much more effort: having the idea, getting the money, hiring the people, training the people, getting a result – and we are only at the first panel in Fig 1A. Not to mention the people-hours and arduous peer review process. It’s not fair that citations to reviews are treated as equal to papers when it comes to research assessment.
Suggestion: a citation to a review should be worth a fraction (maybe 1/10th) of a citation to a research paper.
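As a small illustration of the downgrade, the sketch below scales a citation count by publication type. The 1/10 weight is just the tentative figure suggested above, and the names `TYPE_WEIGHTS` and `weighted_citations` are made up; it also assumes the database already knows which items are reviews.

```python
from fractions import Fraction

# Tentative weights: a citation to a review counts for a tenth of one to a research paper.
TYPE_WEIGHTS = {"research": Fraction(1), "review": Fraction(1, 10)}


def weighted_citations(citation_count, publication_type):
    """Scale a raw citation count by the weight for its publication type."""
    return citation_count * TYPE_WEIGHTS[publication_type]


# A hypothetical review cited 200 times versus a research paper cited 30 times.
print(weighted_citations(200, "review"))    # 20
print(weighted_citations(30, "research"))   # 30
```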
In addition, there are too many reviews written at the moment. I think this is not because they are particularly useful. Very few actually contribute a new view or new synthesis of an area, most are just a summary of the area. Journals like them because they drive up their citation metrics. Authors like them because it is nice to be invited to write something – it means people are interested in what you have to say… If citations to reviews were downgraded, there would be less incentive to publish them and we would have more space for all those real papers that are getting rejected at journals that claim that space is a limitation for publication.
5. Self-citations should be eliminated
If we are going to do all of the above, then self-citation would pretty soon become a problem. Excessive self-citation would be difficult to police, and not many scientists would go for a -1 citation to their own work. So, the simplest thing to do is to eliminate self-citation. Author identification is crucial here. At the moment this doesn’t work well. In ISI and Scopus, whatever algorithm they use keeps missing some papers of mine (and my name is not very common at all). I know people who have been grouped with other people that they have published one or two papers with. For authors with ambiguous names, this is a real problem. ORCID is a good solution and maybe having an ORCID (or similar) should be a requirement for publication in the future.
Suggestion: the company or body that collates citation information needs to accurately assign authors and make sure that research papers are properly segregated from reviews and other publication types.
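With reliable author identifiers, removing self-citations becomes a simple set comparison. The sketch below assumes every author on both the citing and the cited paper has an ORCID, and it treats a citation as a self-citation if the two papers share at least one author; that definition is my assumption for illustration, and stricter or looser ones are possible. The function names and the identifiers are placeholders, not real ORCID iDs.

```python
def is_self_citation(citing_orcids, cited_orcids):
    """True if the citing and cited papers share at least one author identifier."""
    return not set(citing_orcids).isdisjoint(cited_orcids)


def external_citation_count(cited_orcids, citing_papers):
    """Count citing papers that share no author with the cited paper."""
    return sum(
        1 for citing_orcids in citing_papers
        if not is_self_citation(citing_orcids, cited_orcids)
    )


# Placeholder identifiers for the authors of the cited paper.
cited = {"0000-0000-0000-0001", "0000-0000-0000-0002"}

# Author lists of three hypothetical citing papers.
citing = [
    {"0000-0000-0000-0001", "0000-0000-0000-0003"},  # shares an author: self-citation
    {"0000-0000-0000-0004"},                         # independent
    {"0000-0000-0000-0005", "0000-0000-0000-0002"},  # shares an author: self-citation
]

print(external_citation_count(cited, citing))  # 1
```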
These were five things I thought of to enrich citation data to improve research assessment. Do you have any other ideas?
The post title is taken from ‘”Yeah” Is What We Had’ by Grandaddy from their album Sumday.