{"id":1997,"date":"2013-09-06T15:07:56","date_gmt":"2013-09-06T03:07:56","guid":{"rendered":"http:\/\/sites.massey.ac.nz\/library\/?p=1997"},"modified":"2017-01-06T12:41:14","modified_gmt":"2017-01-06T00:41:14","slug":"reflections-on-the-h-index","status":"publish","type":"post","link":"https:\/\/sites.massey.ac.nz\/library\/2013\/09\/06\/reflections-on-the-h-index\/","title":{"rendered":"Reflections on the H-Index"},"content":{"rendered":"<p>You can tell that an idea has become really well-known when senior management hears about it, and a few years ago I began receiving phone calls from heads of department along the lines of \u201cI\u2019m carrying out staff evaluations and suddenly people are telling me about their h-index. What on earth is an h-index?\u201d I don\u2019t recall exactly what I replied, but it probably wasn\u2019t with the formal definition of the h-index which comes from a 2005 paper by Jorge Hirsch, <a href=\"http:\/\/arxiv.org\/pdf\/physics\/0508025.pdf\">An index to quantify an individual&#8217;s scientific research output<\/a>, which is that \u201ca scientist has index h if h of his or her Np papers have at least h citations each and the other (Np \u2013 h) papers have \u2264h citations each.\u201d The <a href=\"http:\/\/en.wikipedia.org\/wiki\/H-index\">Wikipedia entry<\/a> puts it rather more elegantly &#8211; \u201ca scholar with an index of h has published h papers each of which has been cited in other papers at least h times,\u201d so that\u2019s probably more like what I said.<\/p>\n<p>At the time it might have seemed like one of those passing fads, but the h-index has stuck around and I think that\u2019s because, along with that other perennial favourite the Journal Impact Factor, it\u2019s fairly easy to understand and it does make a certain amount of sense. 
Unless you are totally disinclined to assign a number to a quality judgement (and there is a good case to be made for that) then this one is at least based on some solid observable data and, in its pure form, doesn\u2019t rely on any algorithms or weightings that would make it less transparent. Added to that it seems to be, within the sciences at least, a fairly robust phenomenon that can be measured at much the same level in both Scopus and Web of Science. To illustrate this, and demonstrate how to calculate an h-index, I\u2019m going to take the example of the Dutch bird migration expert Theunis Piersma. If you look him up on Scopus (using an author search) and go to his author record here\u2019s what you find &#8211;<\/p>\n<p><a href=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-1998\" src=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi1-300x133.jpg\" alt=\"hi1\" width=\"600\" height=\"266\" srcset=\"https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi1-300x133.jpg 300w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi1-100x44.jpg 100w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi1.jpg 477w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>(Those of you fortunate enough to have a Massey login can <a href=\"http:\/\/ezproxy.massey.ac.nz\/login?url=http:\/\/www.scopus.com\/authid\/detail.url?authorId=7005941801\">click here<\/a> to work along.) What this means is that Piersma is the author of 323 articles in the database and 43 of them have been cited 43 or more times by other articles in the database. If we click on the number 323 we can actually see these records and then sort them by the number of times that they have been cited. 
You will notice that the top paper has received 312 citations, the second 297 and so on.<\/p>\n<p><a href=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi23.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2022\" src=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi23-300x297.jpg\" alt=\"hi2\" width=\"300\" height=\"297\" srcset=\"https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi23-300x297.jpg 300w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi23-150x150.jpg 150w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi23-100x99.jpg 100w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi23.jpg 412w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Now if we scroll down, and turn the page a couple of times, we come to the point where the number on the left (the position on the list) is equal to the number on the right (the number of citations) or, failing an exact match, the last position at which the citation count is still at least as large as the position &#8211; that number is the h-index. 
(Tada!)<\/p>\n<p><a href=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3a.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2024\" src=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3a-300x54.jpg\" alt=\"hi3a\" width=\"300\" height=\"54\" srcset=\"https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3a-300x54.jpg 300w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3a-100x18.jpg 100w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3a.jpg 521w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><a href=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3b.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2025\" src=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3b-300x127.jpg\" alt=\"hi3b\" width=\"300\" height=\"127\" srcset=\"https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3b-300x127.jpg 300w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3b-100x42.jpg 100w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi3b.jpg 330w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>You\u2019ll notice, by the way, that this number is actually 48 rather than 43 as indicated a couple of screens back. This just means that the Scopus author records can be a little out-of-date and Piersma\u2019s h-index is probably increasing quite rapidly. 
I\u2019m not on a campaign to find flaws in Scopus, or at least not today, and I think it\u2019s a terrific database and the best for this particular purpose.<\/p>\n<p>Anyway, you\u2019ll recall that I said that in the sciences the h-index is a relatively robust figure, and the best way to illustrate this is by looking Piersma up in Web of Science where we find his h-index is, wait for it &#8211;<\/p>\n<p><a href=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi4.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-2001\" src=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi4-300x215.jpg\" alt=\"hi4\" width=\"300\" height=\"215\" srcset=\"https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi4-300x215.jpg 300w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi4-100x71.jpg 100w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi4.jpg 339w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>(Tada Tada!) <strong>48<\/strong>. So, even though some of the details vary the nett result is the same, and it is not at all unusual for Scopus and WoS to track really closely in this way. Both databases cover all the significant journals in this area, the titles that the authors will have published in or been cited in, and this really is a \u201cbig data\u201d figure. Even if the figures were not quite this close, we can be absolutely confident that Piersma is a well-published author whose work is regularly mentioned in the work of other authors, so in this instance the h-index is an indicator of a real phenomenon, no doubt about it.<\/p>\n<p>But of course there is always doubt. 
Hirsch concluded his article by observing that \u201cthis index may provide a useful yardstick with which to compare, in an unbiased way, different individuals competing for the same resource when an important evaluation criterion is scientific achievement,\u201d and this is exactly where its significance lies and why it continues to be controversial. \u201cDifferent individuals competing for the same resource\u201d is another way of talking about applicants for jobs or grants, tenure or promotion, and it is now very common practice to have the Hirsch figure available to members of committees needing to compare individuals. Even if the figure isn\u2019t on the table, you can be pretty sure that someone will have quietly looked it up. However much we dislike it, the h-index is not going to go away and we need to understand both its strengths and its weaknesses.<\/p>\n<p>The first point to note is that Jorge Hirsch is a physicist and that his work was based on an examination of physicists and talked only about scientists. Scientists typically write large numbers of short journal articles (with substantial lists of references) and relatively few books. Even their conference papers are likely to end up as journal articles and it is this behaviour that generates the large amounts of data, concentrated in large databases, that can make the h-index a relatively well-calibrated tool for use in certain disciplines. However, even within the sciences typical publishing patterns vary widely and h-indices in engineering, for example, or veterinary science will be much lower than in physics, while medical scientists will generally expect to have higher h-indices than any of these groups. 
Hirsch himself recognised that his index was very discipline-specific and made no claims to have created a one-size-fits-all yardstick.<\/p>\n<p>On top of that, once we get outside of the sciences the whole thing begins to fall apart, or at least to lose its apparent simplicity and transparency. Here\u2019s an example from my own field, library science. (Physicists, stop smirking!) In a 2008 article <a href=\"Testing the Calculation of a Realistic h-index\">Testing the Calculation of a Realistic h-index<\/a> P\u00e9ter J\u00e1cso of the University of Hawai\u2019i looked up the h-index of a leading information scientist, F.W. Lancaster, on Web of Science and got the figure 13, which struck him as being surprisingly low for an eminent and prolific scholar. The explanation turned out to be that the figure could only be generated from citations of documents published in journals indexed by Web of Science, and the citations to those documents must also come from documents published in journals indexed by Web of Science, whereas Lancaster\u2019s most highly cited works were books. Now, even when you can find citations of these books (in journals indexed by Web of Science) these do not contribute to the WoS h-index because it is generated only from WoS documents, the so-called \u201cmaster documents\u201d in the lists we looked at above, although our example was from Scopus. In other words for the purposes of the h-index, WoS was a closed universe &#8211; which just happened to be almost identical to the whole universe of a physicist like Hirsch. What J\u00e1cso then did was to spend considerable time calculating Lancaster\u2019s \u201crealistic h-index\u201d which he put at 26 &#8211; and of the 26 documents that contributed to this score only 10 had \u201cmaster records\u201d in WoS and most of the others were books. 
What is worrying about this, then, is not just that the original figure was \u201clow\u201d but that it was, as J\u00e1cso pointed out, unrealistic and not based on Lancaster\u2019s actual work &#8211; another scholar in the field who had been much less important could conceivably have reached a comparable figure on the basis of journal articles alone. It is not good enough just to say that \u201cin some fields the figure will be lower\u201d without bearing in mind that it may also carry much less meaning in terms of its stated purpose of ranking and comparing scholars.<\/p>\n<p>We would all be fortunate if a P\u00e9ter J\u00e1cso descended from the clouds and created for us a credible \u201crealistic h-index\u201d, particularly if it was double the size of the database-generated one, but that probably isn\u2019t going to happen. No one has as much interest in your score as you have, and do-it-yourself efforts to ramp it up will inevitably suffer in terms of credibility. However, to get some idea of what your \u201crealistic h-index\u201d might look like, Google Scholar can be quite helpful, particularly if you have \u201cclaimed\u201d your publications. 
Here is <a href=\"http:\/\/scholar.google.co.nz\/citations?user=ABDPZjQAAAAJ&amp;hl=en\">Theunis Piersma on Google Scholar<\/a><\/p>\n<p><a href=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi51.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2027\" src=\"http:\/\/sites.massey.ac.nz\/wp-content\/uploads\/sites\/19\/2013\/09\/hi51.jpg\" alt=\"hi5\" width=\"267\" height=\"136\" srcset=\"https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi51.jpg 267w, https:\/\/sites.massey.ac.nz\/library\/wp-content\/uploads\/sites\/19\/2013\/09\/hi51-100x50.jpg 100w\" sizes=\"auto, (max-width: 267px) 100vw, 267px\" \/><\/a><\/p>\n<p>When we compare it to the result we found in Scopus we find that the same highly-cited articles are still at the top but that the citation counts are higher and the h-index has risen from 48 to 63 as a consequence. Why is this? At least part of the problem is just plain unreliability, Google Scholar\u2019s inability to avoid errors and in particular duplication of items. Here is J\u00e1cso on the subject &#8211;<\/p>\n<p>\u201cThe extent and volume of inflated hit counts and citation counts cannot be fully determined, nor can they be traced and corroborated systematically, but even my chance encounters with absurd hit and citation counts, followed up by test searches for obviously implausible and often clearly nonsense hit counts and citation counts indicate the severity of the problems.\u201d<\/p>\n<p>Another issue is coverage. Google Scholar finds more citations because it looks in more places. 
When I looked at the works that had cited one of Piersma\u2019s articles I didn\u2019t find any duplication and the difference seemed largely to be explained by citation in theses, which seems harmless enough although it doesn\u2019t really add much to our understanding of the importance of the work &#8211; if it is cited in a lot of theses a work of this sort will already have been cited in a lot of articles. However, I know from looking at citations of one of my own works that I can probably discount about a third of them as not appearing in serious peer-reviewed documents &#8211; one of them is in a piece of student work, another is in a PowerPoint and so on. Citations can also be traced back to working papers in repositories like the Social Science Research Network, the peer-review status of which is undetermined, and then there is the rapidly growing phenomenon of junk academic publishing. Leaving aside the issue of peer review, the fact that articles can appear in the list of references of an article without appearing in the text creates obvious distortions and the potential for manipulation. Being a search engine, Google Scholar takes what it finds in those parts of the Internet that its spider covers and no value judgment is implied, whereas inclusion of an article in a database carries an understood seal of approval that carries across to those documents it has cited. (Right guys?)<\/p>\n<p>So what can we do with our big fat Google Scholar h-index? The sad truth is probably not a lot. I was at a meeting recently where a staff member of MoBIE (the Ministry of Business, Innovation, and Employment, the main distributor of state research funding in New Zealand) said that \u201cwe just laugh when someone tells us their GS h-index,\u201d and given the concerns raised above that is probably fair enough. 
On the other hand, if you\u2019re into thinking about bibliometrics and your Hirsch score then it\u2019s a pretty good idea to create a profile, claim your work and set up an alert so you are notified if one of your publications is cited. It may yield useful information about its impact and the fact that it has shown up in a thesis may be of interest. This implies more of a portfolio approach to your research rather than a purely metric one, and that might be the subject of a future LOL post.<\/p>\n<p>There are a number of other problems with the h-index that Hirsch touched upon in 2005, but that probably merit highlighting. As already noted, some fields do much better than others simply because of the number of people working in them &#8211; the more potential citers, the more citations there are likely to be. This is well-recognised, and the so-called Crown Indicator seeks to create norms for individual disciplines, but the same effect could potentially operate within fields of research as well, and work that is highly specialised and even groundbreaking could attract less measurable attention than more mainstream activity. Another thing about the h-index is that it measures and rewards a certain type of behaviour, namely the practice of publishing a large number of separate documents. 
That is pretty standard practice in the sciences and undoubtedly those with significant scores have got them by doing some pretty significant work &#8211; it would be difficult to reach a score of 43 without getting out of bed fairly early each day &#8211; but as Hirsch observed \u201cthe converse is not necessarily always true &#8230; for an author with relatively low h that has a few seminal papers with extraordinarily high citation counts, the h-index will not fully reflect that scientist\u2019s accomplishments.\u201d The <a href=\"http:\/\/en.wikipedia.org\/wiki\/G-index\">g-index<\/a> is another measure that attempts to account for this by taking into account large numbers of citations of individual papers, but the truth is that sometimes real importance or value may simply not be assessable by the use of a metric.<\/p>\n<p>The final caveat that Hirsch raised was the question of authorship, one which bibliometrics as it currently stands has no simple answer to. At present anyone who is named in the list of authors gets the full value of the citations no matter what their role, so that each one of the nearly 3,000 authors of one 2008 paper on the Hadron Collider gets the full value of the 563 citations it has so far received. Actually as a measure the h-index is relatively well equipped to deal with this particular case. If any of the authors has only this one paper to their credit then they will have a citations per article score of 563 (wow!) but their h-index will still only be 1. (Think this through &#8211; if you get it then you understand the h-index.) 
On the authorship question as a whole the Hirsch score is probably reasonably robust at the higher end &#8211; nobody gets to be co-author of a large number of well-cited papers without being a smart cookie and a good contributor &#8211; but one can imagine the case where a statistician attached to a productive research institute could rack up a pretty good number by being a competent Johnny-on-the-spot.<\/p>\n<p>The other thing the h-index doesn\u2019t, of itself, take into account is time. It is sometimes called a \u201clifetime achievement measure\u201d and, while you don\u2019t have to wait forever to reach a respectable number, it does not accommodate early stage researchers, some of whom tend to dislike it immoderately and reach for their altmetric scores instead. I guess that looking at the scores of older colleagues might be a bit like knowing exactly how high the mountain you are climbing is, and in the first few years when there are only a few publications with which to hook citations progress probably seems unbearably slow. In a similar vein, it doesn\u2019t measure current or recent activity &#8211; unlike us, the h-index never goes into decline and can continue long after research activity has slowed down or ceased. It is possible, and important if the figure is being used for promotion or grant applications, to take a reading of the h-index of an author\u2019s publications over the past eight to ten years to determine their current status, although there is a risk that if this is shortened to, say, the last five years then the figure may be skewed in favour of those who published more work five years ago as opposed to those who published in the past two or three years &#8211; citations, like cheese, take time to mature.<\/p>\n<p>So, those are just a few thoughts on the Hirsch Index. Until something else comes along that is not only more robust but also as easy to understand, it will be with us, just like the Impact Factor. 
The point about numbers in a case like this is that they are indicators, not things, and any sort of metric requires closer examination to extend it into a valid depiction of reality. If it is used to say \u201cDr X has a track record of publishing papers that are well cited, so if we give her this grant\/job\/leave then there is a good chance she will continue to do the same\u201d then that is fair enough. On the other hand, though, if someone does not have this sort of track record because of their age, or from being new to scholarship, then this is not the place to look for evidence. I apologise for the length of this post, if anyone is still reading, but felt that there was no obvious point to cut it in two and that it was important to convey, to the best of my ability, the whole picture. If I\u2019ve got any of it badly wrong please let me know, nicely if possible, and for those of you at Massey I\u2019m always happy to answer questions and share ideas, as are my other colleagues in the Library.<\/p>\n<p>Bruce White<br \/>\neResearch Librarian<br \/>\n<a href=\"http:\/\/sites.massey.ac.nz\/library\/category\/eresearch-2\/\">eResearch on Library Out Loud<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>You can tell that an idea has become really well-known when senior management hears about it, and a few years ago I began receiving phone calls from heads of department along the lines of \u201cI\u2019m carrying out staff evaluations and suddenly people are telling me about their h-index. 
What on earth is an h-index?\u201d I [&hellip;]<\/p>\n","protected":false},"author":32,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[140],"class_list":["post-1997","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-bibliometrics"],"_links":{"self":[{"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/posts\/1997","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/users\/32"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/comments?post=1997"}],"version-history":[{"count":25,"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/posts\/1997\/revisions"}],"predecessor-version":[{"id":4556,"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/posts\/1997\/revisions\/4556"}],"wp:attachment":[{"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/media?parent=1997"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/categories?post=1997"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.massey.ac.nz\/library\/wp-json\/wp\/v2\/tags?post=1997"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}