A primer on classical test theory and item response theory for assessments in medical education

André F De Champlain

doi:10.1111/j.1365-2923.2009.03425.x

Med Educ. 2010 Jan;44(1):109-17. doi: 10.1111/j.1365-2923.2009.03425.x.

Author

André F De Champlain¹

Affiliation

¹ National Board of Medical Examiners, Philadelphia, Pennsylvania 19104, USA. adechamplain@nbme.org

PMID: 20078762

Abstract

Context: A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations.

Objectives: This paper provides an overview of both CTT and IRT for practitioners involved in the development and scoring of medical education assessments.

Methods: The tenets of CTT and IRT are described first. The main uses of both models in test development and psychometric activities are then illustrated through several practical examples. Finally, general recommendations on the use of each model in practice are outlined.

Discussion: Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
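
Illustrative sketch (not taken from the paper): the abstract contrasts CTT and IRT without giving formulas, so the short Python example below shows, under textbook assumptions, the kind of quantities each model yields in item analysis. For CTT, the observed score is commonly written as X = T + E (true score plus error), and items are summarised by a difficulty index (proportion correct) and a discrimination index (item-total correlation). For IRT, the two-parameter logistic (2PL) model is used here as one common example. The function names, the toy response matrix and the parameter values are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def item_difficulty(item):
    """CTT difficulty index: proportion of candidates answering the item correctly."""
    return mean(item)


def point_biserial(item, totals):
    """CTT discrimination index: Pearson correlation between a 0/1 item and a score."""
    mi, mt = mean(item), mean(totals)
    cov = mean([(i - mi) * (t - mt) for i, t in zip(item, totals)])
    return cov / (pstdev(item) * pstdev(totals))


def irt_2pl(theta, a, b):
    """2PL item response function: probability of a correct response given
    proficiency theta, item discrimination a and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical data: 5 candidates (rows) by 3 dichotomously scored items (columns).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]

item0 = [row[0] for row in responses]
totals = [sum(row) for row in responses]  # observed total scores (X in X = T + E)

p0 = item_difficulty(item0)
# The total here includes the item itself, which inflates the correlation;
# a corrected version would correlate the item with the total of the other items.
r0 = point_biserial(item0, totals)

# A candidate whose proficiency equals the item difficulty has a 0.5 chance of success.
prob = irt_2pl(theta=0.0, a=1.0, b=0.0)
```

The CTT indices depend on the particular group of candidates tested, whereas the IRT function expresses the probability of success in terms of item parameters and an underlying proficiency. That modelling step is what carries the stronger assumptions noted in the abstract.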

Publication types

Review

MeSH terms

Computer-Assisted Instruction / methods
Education, Medical / methods*
Educational Measurement / methods*
Humans
Models, Educational*
Models, Statistical
Psychometrics