Well-constructed objective structured clinical examinations (OSCEs) are a reliable and valid method of assessing health professional students' clinical and communication skills. (1-3) OSCEs consist of a series of stations that prompt students to perform specified tasks within a defined amount of time. (1-3) Student performance is most often evaluated at the conclusion of each station using a binary checklist and a global impression scale, completed by either the standardized patient who participated in the encounter or a faculty member who observed it. (1-3) The accuracy of an assessment method depends, in part, on its reliability, that is, its consistency in producing the same or similar results when used in the same or similar circumstances. (2-3)
At our institution, prior to this study, the standardized patient completed the evaluation tools that determined whether a student passed or failed an OSCE station. However, in anticipation of higher-stakes OSCEs, we considered having pharmacy faculty members evaluate student performance instead. A potentially limiting factor when using faculty evaluators for an OSCE is the number of faculty members required during examination administration. (1,3,4) To increase flexibility, overcome scheduling barriers, and reduce the number of faculty members who must be present during an OSCE, some schools and colleges have elected to have faculty members review and evaluate student performance at a later time, using a video recording of the encounter. Given the resources dedicated to developing and implementing OSCEs, ensuring the reliability of the examination is important regardless of when the assessment of student performance occurs.
To date, only 1 study examining the reliability of OSCEs has compared real-time and video-recorded observations. (5) Vivekananda-Schmidt and colleagues investigated inter-rater reliability between real-time and video-recorded OSCEs of 95 third-year medical students' shoulder and knee examinations using the intraclass correlation coefficient (ICC). (5) Real-time OSCE encounters were scored by physicians training in rheumatology who were present in the room with the student and standardized patient during the encounter. Later, a consultant rheumatologist observed a video recording of the encounter and independently scored each student's performance. No specific training was provided for the examiners, although the real-time examiners had previous experience administering and scoring this type of OSCE. Good inter-rater reliability was observed between real-time and video-recorded assessments on a binary checklist for the shoulder examination (ICC(2,4) = 0.55; 95% CI, 0.22 to 0.72) and the knee examination (ICC(2,1) = 0.58; 95% CI, 0.34 to 0.75). However, poor inter-rater reliability was observed on the global impression scale for the shoulder examination (ICC(2,1) = 0.36; 95% CI, -0.10 to 0.69) and knee examination (ICC(2,1) = 0.32; 95% CI, -0.05 to 0.61). This study suggested that scoring OSCE stations using video recordings instead of real-time observations may not be equivalent. However, the observed differences in reliability may have been attributable to the lack of rater training, differences in examiner expertise, differences in observation method (direct observation vs video monitor), or a combination of these factors.
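For readers unfamiliar with the statistic, the ICC(2,1) form reported above (two-way random effects, absolute agreement, single rater) can be computed directly from a subjects-by-raters table of scores. The following is a minimal sketch in Python using NumPy; the function name is our own, and the formula follows the standard Shrout-Fleiss mean-squares definition rather than anything specific to the cited study:

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random-effects, absolute-agreement, single-rater ICC.

    ratings: an (n subjects) x (k raters) array of scores.
    """
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand_mean = Y.mean()
    # Partition the total sum of squares into subject, rater, and residual parts.
    ss_total = ((Y - grand_mean) ** 2).sum()
    ss_subjects = k * ((Y.mean(axis=1) - grand_mean) ** 2).sum()
    ss_raters = n * ((Y.mean(axis=0) - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_raters
    # Mean squares for each source of variance.
    msr = ss_subjects / (n - 1)           # between subjects
    msc = ss_raters / (k - 1)             # between raters
    mse = ss_error / ((n - 1) * (k - 1))  # residual
    # Shrout-Fleiss ICC(2,1).
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

On the classic Shrout and Fleiss illustrative data set (6 subjects scored by 4 raters), this function yields ICC(2,1) of approximately 0.29, matching the published value; values near 1 indicate strong absolute agreement between raters, and values near 0 (or negative) indicate poor agreement, which is why the global impression scale ICCs above were interpreted as poor reliability.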
To determine whether the time and method of observation (eg, real-time or video-recorded) affects the reliability of OSCE scores, evaluating intra-rater reliability, not inter-rater reliability, is important; to our knowledge, however, no reliability studies have been published on this subject. The objective of this study was to estimate the intra-rater reliability of faculty evaluations of student OSCE performance in real-time and video-recorded observations. Our hypothesis was that the 2 observation methods would be similar enough that faculty members could use either method interchangeably during an OSCE. …