REF 2021: Tackling Bias in the Research Excellence Framework

Dr Julie R. Williamson, University of Glasgow 

The UK Research Excellence Framework (REF) evaluates research quality at higher education institutions, with a significant impact on the allocation of competitive funding. With £2 billion of funding at stake based on performance, universities collectively spent over £210 million preparing for the previous exercise, REF 2014. With so much money on the line, researchers across the UK are experiencing aggressive top-down pressure to produce the highest-ranked outputs, so-called 4* papers. While the REF exercise is generally perceived to promote research quality, there are undeniable issues with the assessment exercise that the recent independent review [3] fails miserably to capture.

From the perspective of an early career researcher in computing science, I see two key issues that warrant further reflection ahead of REF 2021:

  • Bias in the peer review process in the context of computing science
  • Impacts of computing science culture and diversity on REF performance

Bias in Peer Review

The independent review of REF 2014 makes almost no attempt to tackle the well-documented bias that occurred within the computing science panel. Alan Dix's detailed analysis [1] identifies sub-topic, institution, and gender as major sources of bias. For example, to achieve a 4* rating, papers in theoretical computing science needed to be within the top 5% internationally (based on citations), while papers in applied computing, for example human-computer interaction, needed to be within the top 0.5%. This also affected ratings for outputs from women (who are better represented in applied areas), leaving men 33% more likely than women to achieve 4* ratings. Given the history of unequal participation by men, women, and minorities in computing science, it is crucial to address these sources of bias.

Bias innate to peer review will affect review panels across all REF subject areas. The word "bias" appears only once in the independent review, and then only within a quote from an external source. The report uses the euphemism "distortions", obscuring any serious discussion or action on the issue. The unwillingness of the independent review to tackle bias directly is unacceptable.

Given the well-documented "distortions" in the REF computing science panel [1], it is surprising that the review did not draw on research on best practices in peer review. Only a cursory section of the report is dedicated to reflection on peer review, and it focuses mainly on whether or not metrics should be included in the process. There is no discussion of ways to improve the peer review process itself, for example by adopting a double-blind approach. If scientific excellence is the primary criterion, there is strong evidence that double-blind review is the gold standard [4].

Computing Science Culture

The culture of computing science publishing and the origins of its different sub-areas visibly affected performance in the REF review. Many areas of computing science traditionally favour conference publication over journal publication. Conference publication has the benefit of faster review times and greater visibility, which is especially important for early career researchers aiming to establish themselves internationally. Given top-down advice that journal publications are intrinsically better than conference publications for REF, researchers must now choose between limiting the immediate impact of their work and underperforming in REF. While some conference venues are now transitioning to a journal model, many top conferences are still discouraged for REF submission.

Early career researchers might be especially affected by the REF 2021 rules [2]. For example, focusing on venues attractive for REF may limit international mobility if UK researchers develop publication track records that do not match what international employers value. Although some will argue that a balance can be struck, an early career researcher is unlikely to have a large enough team to submit a very diverse set of publications early in their career.

The range of topics in computing science is also problematic, with roots in subjects as widely varied as engineering, mathematics, psychology, and design. Although the REF independent review concludes that interdisciplinary work was not negatively biased in the review process [3], this is clearly untrue in the computing science panel. Sub-areas rooted in traditional disciplines like engineering and mathematics were rated more highly than those with interdisciplinary roots in the social sciences and humanities. Although the "long tail" of outputs in applied areas provides a partial explanation [1], it is clear that outputs from theoretical and applied computing are not being judged equally. That the REF review reports otherwise only further obscures bias in the review process.

Looking Forward

REF 2021 should address these issues of bias and diverse academic cultures. I suggest four basic changes that would promote the transparency, reflection, and rigour that many academics already champion in their own publishing practice:

Diversity in the Review Panel: While research excellence should be the first criterion for participation in a review panel, diversity should have a significant influence. The current guidance for promoting diversity is general to the point of uselessness: the characteristics of diversity are not defined, so how can a panel ensure it is diverse? I suggest the following characteristics guide the recruitment and selection of panel members: area of expertise, career stage, institution, gender, and ethnicity. Without diverse participation, implicit bias will persist in any review process the REF exercise implements.

Double-Blind Review: Research demonstrates the value of double-blind review in mitigating bias compared to single-blind review [4]. While not perfect, double-blind review is a considerable improvement over single-blind reviewing. Since panel members must already declare conflicts in a single-blind review, it should be straightforward to run REF as a double-blind exercise. Single-blind reviewing depends on academic integrity to limit bias, so it should not be a stretch to depend on academic integrity to maintain a double-blind process.

Calibration Across Sub-Topics: Use external metrics such as citations to calibrate across topic areas, and reflect openly on bias where it occurs; a minimal sketch of what such a calibration check might look like follows below. Without clear discussion and identification of bias, the same issues will keep cropping up in each review. Computing science is unlikely to be the only subject where sub-topics show signs of bias, and this should have been addressed directly by the independent review of REF.
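To make this concrete, here is a minimal sketch of one way such a calibration check could work, assuming a hypothetical dataset of outputs with a sub-topic label, a citation count, and the panel's star rating. The data values, field names, and thresholds are illustrative assumptions only, not part of any actual REF process.

```python
from collections import defaultdict

# Hypothetical outputs: (sub_topic, citation_count, panel_star_rating).
# All values are illustrative assumptions, not real REF data.
outputs = [
    ("theory", 200, 4), ("theory", 80, 4), ("theory", 40, 3),
    ("theory", 15, 2), ("theory", 5, 1),
    ("hci", 400, 4), ("hci", 150, 3), ("hci", 60, 3),
    ("hci", 20, 2), ("hci", 5, 1),
]

def percentile_rank(value, peers):
    """Percentile of `value` among `peers` (higher = more cited)."""
    return 100.0 * sum(1 for p in peers if p <= value) / len(peers)

# Compare each output against its own sub-topic, not the whole panel.
citations_by_topic = defaultdict(list)
for topic, cites, _ in outputs:
    citations_by_topic[topic].append(cites)

# Effective 4* threshold per sub-topic: the lowest within-field citation
# percentile that still received a 4* rating. Large gaps between
# sub-topics are a signal of uneven calibration worth discussing openly.
threshold = {}
for topic, cites, stars in outputs:
    if stars == 4:
        pct = percentile_rank(cites, citations_by_topic[topic])
        threshold[topic] = min(pct, threshold.get(topic, 100.0))

for topic, pct in threshold.items():
    print(f"{topic}: 4* ratings reached down to the {pct:.0f}th citation percentile")
```

Run over anonymised panel data (see Open Data below), this kind of simple check would surface the sort of gap Dix identified between theoretical and applied sub-topics without exposing individual reviewers or authors.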

Open Data: All data about the review process and ratings should be made publicly available in an anonymised form. This would allow stakeholders to understand, scrutinise, and learn from the review process, and would communicate transparency at all levels. As academics, we are often encouraged to do this in our own work, and I firmly believe it leads to better science. The institutions that review us would benefit from adopting our best practices.

References

[1] Alan Dix, blog series on the REF. http://alandix.com/blog/tag/ref/

[2] "Let the REF games begin", Times Higher Education. https://www.timeshighereducation.com/opinion/let-ref-games-begin

[3] Review of the Research Excellence Framework. https://www.gov.uk/government/publications/research-excellence-framework-review

[4] "Reviewer bias in single- versus double-blind peer review", PNAS. http://www.pnas.org/content/pnas/early/2017/11/13/1707323114.full.pdf

 

This blog post is part of the ACM Future of Computing Academy's "FCA Discussions" series. Posts in this series are intended to spur discussion and do not consist of final, formal recommendations. The FCA is a new organization, and our intention is for the discussions that emerge from these posts to inform the actions we ultimately take to address the underlying issues.