There are two main reasons employers conduct validation studies on pre-employment tests: 1) To assess whether the test is an effective tool for the desired situation (i.e., for the target job or group of jobs, how the test should be used in conjunction with other tests, what cutoffs may be appropriate, etc.), and 2) For defensibility--for both low-stakes (challenges brought by individual applicants) and high-stakes (challenges under Title VII and government audits) situations.
Validation strategies include both local techniques, which evaluate and document the connection between the test and the target position, and more global strategies, which examine how test validity generalizes across settings and positions (as with Validity Generalization, or "VG"). Both federal standards (the Uniform Guidelines (1)) and professional standards (the Joint Standards (2) and SIOP Principles (3)) permit techniques that investigate the local validity of a test as well as the more global techniques (e.g., transportability studies that attempt to connect existing validity evidence to the local situation).
With such choices available to practitioners, which techniques are likely to produce the most accurate and defensible results: the local validity techniques or the more global ones? What have the courts had to say about either technique in Title VII situations, where an employer is required to demonstrate validity to justify its testing practices? Answers to these questions and others are provided in this review. While a number of test validation techniques are available to practitioners (e.g., content validity, construct validity, etc.), this discussion is limited to only two: local criterion-related validity and VG. (4)
Overview of Local Criterion-Related Validity
A local criterion-related validity study is conducted by statistically correlating test scores with some measure of job performance (typically supervisor ratings or performance evaluation scores). Following conventional practice in the social sciences, validity can be claimed if the correlation between test scores and the job performance measure (i.e., the criterion) has a corresponding probability value less than .05, indicating that the correlation is a "beyond chance" occurrence. This type of validity study is typically conducted for tests that measure abstract traits (e.g., some types of cognitive ability, personality, etc.) that may not have obvious connections to the job (as contrasted with content validity, which seeks to demonstrate a more rational type of connection between the test and the job for traits that are more concrete in nature).
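The core computation behind a criterion-related study can be sketched in a few lines of Python. The scores and ratings below are invented for illustration, and the critical value is the standard two-tailed .05 value for 8 degrees of freedom from a t table:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: test scores and supervisor ratings for 10 incumbents.
scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
ratings = [2, 3, 2, 3, 4, 3, 4, 5, 4, 5]

r = pearson_r(scores, ratings)
n = len(scores)

# Significance test for a correlation: t = r * sqrt((n - 2) / (1 - r^2)),
# evaluated against the t distribution with n - 2 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r ** 2))

T_CRIT = 2.306  # two-tailed .05 critical t for df = 8 (from a t table)
print(f"r = {r:.3f}, t = {t:.2f}, significant: {t > T_CRIT}")
```

In practice a statistics package would report the exact p-value, but the logic is the same: the observed correlation is compared against what chance alone would plausibly produce for the given sample size.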
The steps necessary to conduct this type of validation study are straightforward. Under a predictive model, the researcher administers the test to applicants and then correlates test scores with some subsequent measure of job performance. Under a concurrent model, the test is given to current job incumbents and test scores are correlated with job performance measures gathered at roughly the same time. Under either model, high reliability for both the test and the job performance measures is key to ensuring that the results are accurate and stable.
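The role of reliability can be made concrete with the classic psychometric correction for attenuation, which estimates what the validity coefficient would be if the test and criterion were measured without error. The reliability values below are illustrative, not recommendations:

```python
import math

def correct_for_attenuation(r_obs, rel_test, rel_criterion):
    """Classic correction for attenuation: r / sqrt(rxx * ryy).
    Estimates the validity coefficient free of measurement error."""
    return r_obs / math.sqrt(rel_test * rel_criterion)

# Illustrative values: observed validity .30, test reliability .80,
# criterion (supervisor rating) reliability .60.
r_corrected = correct_for_attenuation(0.30, 0.80, 0.60)
print(f"{r_corrected:.2f}")
```

The point of the exercise is that unreliable measures on either side of the correlation mask true validity: here an observed coefficient of .30 corresponds to a substantially larger error-free estimate.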
Having an adequate sample size to maximize statistical power is also important when conducting a local study. Statistical power refers to the ability of a study to detect a statistically significant effect if one exists in the target population. Validity studies with large sample sizes (e.g., 300+ subjects) have high statistical power, and those with small samples have low statistical power. For example, assume that a researcher wanted to find out if a certain test had a validity coefficient of .25 or higher, and there were only 80 incumbents in the target position for whom test and job performance data were available. In this situation, they could be about 73% confident (i. …
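A figure in this range can be reproduced with a standard power approximation based on Fisher's z transformation of the correlation, assuming a one-tailed test at alpha = .05. This is a sketch of one common approximation, not the only way power is computed:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_for_r(r, n):
    """Approximate one-tailed power (alpha = .05) to detect a population
    correlation r with sample size n, using Fisher's z transformation."""
    z_r = math.atanh(r)            # Fisher z of the target correlation
    se = 1.0 / math.sqrt(n - 3)    # standard error of Fisher z
    z_crit = 1.645                 # one-tailed .05 critical value
    return normal_cdf(z_r / se - z_crit)

# 80 incumbents, target validity coefficient of .25:
print(f"{power_for_r(0.25, 80):.2f}")
```

With n = 80 and r = .25 the approximation yields power of roughly .72 to .73, meaning the study has only about a 73% chance of detecting the validity coefficient even when it truly exists in the population.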