Investigating possible ethnicity and sex bias in clinical examiners: an analysis of data from the MRCP(UK) PACES and nPACES examinations

I C McManus; Andrew T Elder; Jane Dacre

doi:10.1186/1472-6920-13-103

BMC Med Educ. 2013 Jul 30:13:103. doi: 10.1186/1472-6920-13-103.

Authors

I C McManus¹, Andrew T Elder, Jane Dacre

Affiliation

¹ Academic Centre for Medical Education, Division of Medical Education, University College London, Gower Street, London WC1E 6BT, UK. i.mcmanus@ucl.ac.uk

Abstract

Background: Bias of clinical examiners against some types of candidate, based on characteristics such as sex or ethnicity, would represent a threat to the validity of an examination, since sex and ethnicity are 'construct-irrelevant' characteristics. In this paper we report a novel method for assessing sex and ethnic bias in over 2000 examiners who had taken part in the PACES and nPACES (new PACES) examinations of the MRCP(UK).

Method: PACES and nPACES are clinical skills examinations that have two examiners at each station who mark candidates independently. Differences between examiners cannot be due to differences in performance of a candidate because that is the same for the two examiners, and hence may result from bias or unreliability on the part of the examiners. By comparing each examiner against a 'basket' of all of their co-examiners, it is possible to identify examiners whose behaviour is anomalous. The method assessed hawkishness-doveishness, sex bias, ethnic bias and, as a control condition to assess the statistical method, 'even-number bias' (i.e. treating candidates with odd and even exam numbers differently). Significance levels were Bonferroni corrected because of the large number of examiners being considered.
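The paired-examiner logic described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `examiner_group_gap`, the tuple layout, and the use of a normal approximation for the p-value are all assumptions made for illustration. For one examiner, each encounter contributes the difference between that examiner's mark and the co-examiner's mark for the same candidate, so candidate performance cancels out; the mean difference is a hawkishness-doveishness index, and the gap in mean difference between two candidate groups is a bias index.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def examiner_group_gap(encounters, n_examiners, alpha=0.05):
    """Illustrative bias check for ONE examiner (hypothetical helper).

    encounters: list of (own_mark, co_examiner_mark, in_group) tuples, one per
    candidate seen jointly with any co-examiner (the 'basket').
    n_examiners: number of examiners tested, used for Bonferroni correction.
    """
    # Same candidate, same performance: the difference reflects the examiners only.
    diffs_in = [own - co for own, co, in_group in encounters if in_group]
    diffs_out = [own - co for own, co, in_group in encounters if not in_group]
    if len(diffs_in) < 2 or len(diffs_out) < 2:
        raise ValueError("need at least two encounters in each group")

    # Negative values: stricter than co-examiners (hawk); positive: more lenient (dove).
    hawkishness = mean(diffs_in + diffs_out)

    # Does the examiner's leniency relative to peers differ between the groups?
    gap = mean(diffs_in) - mean(diffs_out)
    se = sqrt(variance(diffs_in) / len(diffs_in)
              + variance(diffs_out) / len(diffs_out))
    if se == 0:
        p_value = 1.0 if gap == 0 else 0.0
    else:
        z = gap / se
        # Assumption: normal approximation, adequate only for many encounters.
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "hawkishness": hawkishness,
        "gap": gap,
        "p_value": p_value,
        # Bonferroni: divide the significance level by the number of examiners.
        "flagged": p_value < alpha / n_examiners,
    }
```

The same function can serve as the control condition by setting `in_group` from whether the candidate's exam number is even, a characteristic that cannot plausibly influence marking.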

Results: The results of 26 diets of PACES and six diets of nPACES were examined statistically to assess the extent of hawkishness, as well as sex bias and ethnicity bias in individual examiners. The control (even-number) condition suggested that about 5% of examiners were significant at an (uncorrected) 5% level, and that the method therefore worked as expected. As in a previous study (BMC Medical Education, 2006, 6:42), some examiners were hawkish or doveish relative to their peers. No examiners showed significant sex bias, and only a single examiner showed evidence consistent with ethnic bias. A re-analysis of the data considering only one examiner per station, as would be the case for many clinical examinations, showed that analysis with a single examiner runs a serious risk of false positive identifications, probably due to differences in case-mix and content-specificity.
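The reasoning behind the control condition can be made concrete with a short sketch. If exam-number parity has no real effect, the number of examiners significant at an uncorrected 5% level should behave like a binomial count with probability 0.05, so an observed proportion near 5% indicates the test is calibrated. The function name `control_check` and the examiner counts in the usage lines are hypothetical; the abstract reports only that there were over 2000 examiners and that about 5% were significant.

```python
from math import sqrt


def control_check(n_examiners, n_significant, alpha=0.05):
    """Compare the observed count of 'significant' examiners in a null
    (control) condition with what chance alone predicts (hypothetical helper)."""
    expected = n_examiners * alpha                      # binomial mean
    sd = sqrt(n_examiners * alpha * (1 - alpha))        # binomial standard deviation
    z = (n_significant - expected) / sd                 # distance from chance, in SDs
    return expected, sd, z


# Illustrative numbers only, not taken from the paper.
expected, sd, z = control_check(n_examiners=2000, n_significant=104)
print(f"expected by chance: {expected:.0f} (SD {sd:.1f}), z = {z:.2f}")
```

A z value close to zero is what a well-behaved method would produce; a large positive z in the control condition would suggest that the method generates false positives.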

Conclusions: In examinations where there are two independent examiners at a station, our method can assess the extent of bias against candidates with particular characteristics. The method would be far less sensitive in examinations with only a single examiner per station, as examiner variance would be confounded with candidate performance variance. The method, however, works well when there is more than one examiner at a station and, in the case of the current MRCP(UK) clinical examination, nPACES, found possible sex bias in no examiners and possible ethnic bias in only one.

MeSH terms

Certification / methods
Certification / standards
Clinical Competence / standards*
Educational Measurement / standards*
Ethnicity
Female
Humans
Male
Observer Variation
Racism*
Sexism*
United Kingdom