Educational Testing Validity and Reliability in Pharmacy and Medical Education Literature
Hoover, Matthew J., Jung, Rose, Jacobs, David M., Peeters, Michael J., American Journal of Pharmaceutical Education
Pharmacy educators use a wide variety of evaluation methods to ascertain whether students have achieved specific learning objectives. When developing a doctor of pharmacy (PharmD) curriculum and evaluating its effectiveness, educators must consider the standards for validity and reliability of educational testing. (1) Standardized tests, such as the Pharmacy College Admission Test and the North American Pharmacist Licensure Examination, serve as bookends that assess students' pharmacy-related knowledge and support inferences about competence for licensure. (2,3) Predictive validity evidence exists for student performance on these examinations. (4-7) Educational testing throughout a PharmD program should likewise provide valid and reliable assessment of students' abilities.
When reporting on evaluation methods used in health professions education research, it is essential to consider evidence for validity and reliability. The authors were not aware of any literature review assessing the extent to which validity and reliability evidence is reported for evaluation methods in the pharmacy education literature. The objectives of this study were to characterize the reporting of reliability and validity evidence for educational testing in pharmacy education journals and to compare it with reporting in the medical education literature.
We evaluated validity and reliability reporting in articles that focused on educational testing of learner knowledge, skills, or abilities. To describe the level of reliability and validity reporting in the pharmacy education literature, we reviewed articles published in pharmacy education journals and compared the findings with those for medical education articles. The pharmacy education journals reviewed were American Journal of Pharmaceutical Education (AJPE), Currents in Pharmacy Teaching and Learning, Pharmacy Education, Annals of Pharmacotherapy, and American Journal of Health-System Pharmacy. The medical education journals reviewed were Medical Education, Academic Medicine, Medical Teacher, Teaching and Learning in Medicine, and Journal of Graduate Medical Education. Using purposive sampling, we selected these journals because we judged them most likely to contain a good cross-section of educational testing.
For each journal, 2 reviewers independently reviewed the table of contents of every issue from January 2009 through December 2011 and identified articles that used educational testing. If an abstract suggested the use of educational testing, the reviewer examined the article's full text to determine eligibility. Examples of educational testing methods included multiple-choice questions, true-false questions, long-answer case notes, and performance-based assessments such as objective structured clinical examinations or clerkship outcomes assessments. Educational testing could take the form of examinations or other coursework. It could also take the form of periodic assessments outside coursework, such as end-of-year or preclinical practice experience examinations. Participants in the included studies were pharmacy and medical learners (students or residents). We included all articles published in the pharmacy and medical education journals listed above between January 2009 and December 2011, regardless of study design or country of origin. Articles were excluded if they assessed only learner attitudes or opinions.
For each included study, reviewers independently recorded whether reliability and validity evidence was reported, coded dichotomously (yes/no). We used the same definitions for sources of reliability and validity evidence as a prior medical education review. (8) Sources of reliability evidence included the following:

- test-retest reliability, such as a correlation coefficient between learners' scores on the same test taken twice over a period of time;
- internal consistency, such as a Cronbach alpha coefficient;
- inter-rater reliability, such as an intraclass correlation coefficient. …