bibliometrics - Should self-citations be excluded when calculating the h-index?

Saturday, 6 April 2019


The Hirsch index, or h-index, is a widely used citation statistic that, arguably, accurately reflects the impact of a scientist. It takes into account the number of publications as well as how often those papers are cited: an author's h-index is the largest number h such that h of their papers have each been cited at least h times. For example, an author with 4 publications, each with at least 4 citations, has an h-index of 4. Another author with 200 publications, each cited only once, has an h-index of just 1, simply because none of the papers is cited more than once. A possible confounding factor in this index is self-citation. If the latter fictional author had cited all of their previous work in each of their last 100 papers, their h-index would skyrocket to 100.
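To make the arithmetic concrete, here is a minimal Python sketch of the standard h-index calculation applied to the examples above. The function name `h_index` and the way the self-citation scenario is modelled are my own assumptions for illustration: papers are numbered 1 to 200 in publication order, each has one outside citation, and each of the last 100 papers cites every earlier paper.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Author with 4 papers, each cited 4 times.
four_papers = [4, 4, 4, 4]

# Author with 200 papers, each cited once.
two_hundred_papers = [1] * 200

# Assumed scenario: papers 101..200 each cite every earlier paper by the
# same author. Paper i then receives 200 - max(i, 100) self-citations on
# top of its single outside citation.
with_self_citations = [1 + (200 - max(i, 100)) for i in range(1, 201)]

print(h_index(four_papers))          # 4
print(h_index(two_hundred_papers))   # 1
print(h_index(with_self_citations))  # 100
```

Under these assumptions the first 100 papers end up with 101 citations each, which is enough to lift the h-index from 1 to 100 without a single additional outside citation.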

Google Scholar conveniently provides the h-index, and my institution uses Google Scholar to calculate the h-index for every researcher. However, Google Scholar includes self-citations, while I have heard colleagues from other institutions say that an h-index should not include self-citations, for the reasons illustrated above.

Interestingly, the widely used journal impact factor (JIF) from Thomson Reuters does include self-citations (Shema, 2012).

My question is: should the h-index include or exclude self-citations? Has a consensus been reached on this topic? If there is no consensus, shouldn't the h-index be accompanied by a label stating which of the two methods was used to calculate it?

Reference
- Shema, Sci Am blogs, 2012

Answer

There's no firm consensus on whether to include self-citations. (For example, the original paper by Hirsch discusses how one could correct for self-citations but doesn't include this as part of the definition of the h-index.) The reason is that it doesn't matter: the h-index is a crude tool, and if your decisions make delicate enough use of it that the outcome may change depending on whether self-citations are included, then you are using it wrong.

For example, you mention a hypothetical case of someone whose high h-index comes primarily from self-citations. In a case like this, someone on the hiring/tenure committee should ask "Gee, why does this candidate have such a high h-index when the rest of the file gives little or no evidence that their work is influential or important?" Then a few minutes of investigation will reveal the truth.

There's nothing special about self-citations here. I know of an eccentric researcher in mathematics who gets a lot of citations from followers of his who publish in marginal venues. The total number looks impressive, but if you look at where the citations are coming from, you find only rather weird-looking papers published in places you've never heard of. To keep from being misled by cases like this, you have to do some due diligence when you see a surprising number, and if you're doing that already then skewed h-indices from self-citation are not such a great threat. (In practice the skew is generally pretty small, too.)

The net effect is that if the hiring or tenure committee is just paying attention to numbers like the h-index, without any perspective or further investigation, then that's a major problem with their methodology. If they do notice oddities but feel compelled to give credit for a high h-index anyway, then that's an even worse problem.

In practice, different websites for computing h-indices can give substantially different values, depending on which sources they count citations from. If you care about specifying a well-defined number, then you need to state exactly how the h-index was computed (which goes far beyond just whether self-citations are included).
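As a rough illustration of how much the stated method matters, the following Python sketch computes an h-index both with and without self-citations from the same records. Everything here is hypothetical: the record layout (`authors`, `cited_by`), the helper names, and the toy data are invented for the example. It also assumes that a self-citation is any citing paper that shares at least one author with the cited paper, which is only one of several possible definitions.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def count_citations(papers, exclude_self=False):
    """Count citations per paper, optionally dropping self-citations.

    Assumption: a citation is a self-citation when the citing paper
    shares at least one author with the cited paper.
    """
    counts = []
    for paper in papers:
        authors = set(paper["authors"])
        kept = 0
        for citing_authors in paper["cited_by"]:
            is_self = bool(authors & set(citing_authors))
            if exclude_self and is_self:
                continue
            kept += 1
        counts.append(kept)
    return counts


# Hypothetical toy records: each entry in "cited_by" is the author list
# of one citing paper.
papers = [
    {"authors": ["A"], "cited_by": [["A"], ["B"], ["C"]]},
    {"authors": ["A", "D"], "cited_by": [["D"], ["A"]]},
    {"authors": ["A"], "cited_by": [["B"], ["A", "E"]]},
]

h_with = h_index(count_citations(papers))
h_without = h_index(count_citations(papers, exclude_self=True))
print(f"h-index (self-citations included): {h_with}")     # 2
print(f"h-index (self-citations excluded): {h_without}")  # 1
```

Even in this tiny example the two conventions disagree, and a different citation source would change the `cited_by` lists themselves, so a reported value is only well defined alongside a note on the database and counting rule used.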
