Should Employers Rely on Local Validation Studies or Validity Generalization (VG) to Support the Use of Employment Tests in Title VII Situations?

By: Biddle, Daniel A. | Public Personnel Management, Winter 2010 | Article details

There are two main reasons employers conduct validation studies on pre-employment tests: 1) To assess whether the test is an effective tool for the desired situation (i.e., for the target job or group of jobs, how the test should be used in conjunction with other tests, what cutoffs may be appropriate, etc.), and 2) For defensibility--for both low-stakes (challenges brought by individual applicants) and high-stakes (challenges under Title VII and government audits) situations.

Choices for validation strategies include both local techniques that strive to evaluate and/or evidence connections between the test and the target position and more global strategies that evaluate how tests seem to generalize across various settings and positions (as with Validity Generalization, or "VG"). Both federal (the Uniform Guidelines (1)) and professional (the Joint Standards (2) and SIOP Principles (3)) standards permit techniques that investigate the local validity of the test as well as the more global techniques (e.g., through transportability studies that attempt to connect existing validity evidence with the local situation).

With such choices available to practitioners, which techniques are likely to produce the most accurate and defensible results: the local validity techniques or the more global ones? What have the courts had to say about either technique in Title VII situations where an employer is being required to demonstrate validity to justify its testing practices? Answers to these questions and others will be provided in this review. While a number of test validation techniques are available to practitioners (e.g., content validity, construct validity, etc.), this discussion will be limited to only two: local criterion-related validity and VG. (4)

Overview of Local Criterion-Related Validity

A local criterion-related validity study is conducted by statistically correlating test scores with some measure of job performance (typically supervisor ratings or performance evaluation scores). Following the conventional practices for the social sciences, validity can be claimed if the correlation between test scores and some job performance metric (i.e., the criterion) has a corresponding probability value that is less than .05, which indicates that the correlation is a "beyond chance" occurrence. This type of validity study is typically conducted for tests that measure abstract traits (e.g., some types of cognitive ability, personality, etc.) that may not have obvious connections to the job (as contrasted with content validity, which seeks to demonstrate a more rational type of connection between the test and the job, using traits that are more concrete in nature).

The steps necessary to conduct this type of validation study are very straightforward. Under a predictive model, the researcher administers the test to the applicants and then correlates test scores with some subsequent measure of job performance. Under a concurrent model, the test is given to current job incumbents and simultaneously correlated with job performance metrics of some type. Under either model, having high reliability for both the test and job performance metrics is key for making sure that the results will be accurate and reliable.
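As an illustration of the correlation step described above, the following is a minimal Python sketch rather than a prescribed procedure. The test scores and supervisor ratings are hypothetical, and the p-value relies on the Fisher z normal approximation (an assumption made here for simplicity), so it is only approximate in small samples.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z approximation).

    Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    upper_tail = 0.5 * (1.0 - math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * upper_tail


# Hypothetical data: test scores and supervisor ratings for 8 incumbents.
test_scores = [10, 12, 15, 18, 20, 22, 25, 28]
ratings = [2.1, 2.5, 2.4, 3.0, 3.2, 3.1, 3.8, 4.0]

r = pearson_r(test_scores, ratings)
p = approx_two_sided_p(r, len(test_scores))
print(f"r = {r:.3f}, approximate p = {p:.6f}, below .05: {p < 0.05}")
```

In practice a sample this small would be far too limited for a real validation study; it is used here only to keep the example short.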

Having an adequate sample size to maximize statistical power is also important when conducting a local study. Statistical power refers to the ability of the study to detect a statistically significant result if it exists in the target population. Validity studies that have large sample sizes (e.g., 300+ subjects) have high statistical power, and those with small samples have low statistical power. For example, assume that a researcher wanted to find out if a certain test had a validity coefficient of .25 or higher, and there were only 80 incumbents in the target position for whom test and job performance data were available. In this situation, the researcher could be about 73% confident (i.e., have 73% power) of finding such a coefficient (if it existed to be found). With twice the sample size (160 subjects), power is increased to about 94%, which provides the researcher an almost certain ability to find out whether the test was valid for the target position.
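The power figures in this example can be approximated with a short calculation. The sketch below uses the Fisher z normal approximation and assumes a one-tailed test at the .05 level; the article does not state which test was used, so both choices are assumptions made for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_power(rho, n, z_crit=1.645):
    """Approximate power to detect a population correlation rho with n subjects.

    z_crit=1.645 corresponds to a one-tailed test at alpha = .05
    (an assumption, not stated in the article).
    """
    expected_z = math.atanh(rho) * math.sqrt(n - 3)
    return normal_cdf(expected_z - z_crit)


for n in (80, 160):
    print(f"n = {n}: approximate power = {approx_power(0.25, n):.2f}")
```

Under these assumptions the approximation lands close to the 73% and 94% figures cited above; an exact calculation based on the t distribution would differ slightly.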

Overview of Validity Generalization

VG studies rely on a research technique called meta-analysis. Meta-analysis seeks to combine the results of several similar research studies to form general theories about relationships between similar variables across different situations. As early as 1977, Schmidt & Hunter (5) applied meta-analysis techniques to the field of personnel testing and framed the approach as VG. Prior to this time, meta-analysis in the personnel testing and psychological literature was very rare, (6) but it has since grown to widespread use in the academic field.

The purpose of conducting VG studies in the personnel field is to evaluate the effectiveness (i.e., validity) of a particular type of personnel test (e.g., personality, integrity, conscientiousness) and to describe what the findings mean in a broader sense. (7) Practically speaking, VG studies are conducted by compiling several related local criterion-related validity studies into an aggregate analysis to determine the overall effectiveness of the test(s) included in the study for the jobs and settings involved. VG studies also make use of various statistical corrections (e.g., for sampling error, range restriction, and criterion unreliability) designed to allow the researcher to forecast what the overall operational validity of the test(s) might be if it were not hampered by these suppressors.
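To make the idea of a statistical correction concrete, the sketch below shows two textbook formulas commonly associated with this kind of adjustment: the correction for attenuation due to criterion unreliability, and the Thorndike Case II correction for direct range restriction. These formulas and all input values are supplied here as illustrative assumptions; the article does not specify which formulas any given VG study uses.

```python
import math


def correct_for_criterion_unreliability(r, r_yy):
    """Disattenuate an observed validity r for criterion unreliability.

    r_yy is the reliability of the job performance measure (0 < r_yy <= 1).
    """
    return r / math.sqrt(r_yy)


def correct_for_range_restriction(r, u):
    """Thorndike Case II correction for direct range restriction.

    u is the ratio of the unrestricted (applicant) standard deviation of
    test scores to the restricted (incumbent) standard deviation.
    """
    return (r * u) / math.sqrt(1.0 + r * r * (u * u - 1.0))


# Hypothetical inputs: observed validity .20, criterion reliability .60,
# and an applicant SD that is 1.5 times the incumbent SD.
observed_r = 0.20
print(round(correct_for_criterion_unreliability(observed_r, 0.60), 3))
print(round(correct_for_range_restriction(observed_r, 1.5), 3))
```

Both corrections raise the estimate above the observed value, which is why corrected VG results describe a forecast of validity rather than a value actually observed in any one setting.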

Some researchers who conduct VG studies apply the "75 Percent Rule" to determine whether validity can be generalized outside of the VG study to other situations. Under the 75 Percent Rule, if at least 75 percent of the variance in the observed validities (in the VG study) is accounted for by the correctable statistical artifacts (i.e., sampling error, criterion unreliability, predictor unreliability, and range restriction on the predictor), then the true variance between validities is assumed to be zero because the uncorrected artifacts would likely account for the remaining 25 percent of variance. VG studies where at least 75 percent of the variance is explained by these correctable artifacts are said to generalize to other settings outside those included in the study.
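A simplified version of the 75 Percent Rule can be sketched as follows. This is a "bare-bones" illustration that considers sampling error only, using the commonly cited approximation for sampling error variance; a full VG analysis would also account for the other artifacts listed above. The function name and the validity coefficients and sample sizes are hypothetical.

```python
def percent_variance_from_sampling_error(validities, sample_sizes):
    """Return the percent of observed variance attributable to sampling error.

    Uses sample-size-weighted mean and variance of the observed validities,
    and approximates sampling error variance as (1 - mean_r^2)^2 / (mean_n - 1).
    Assumes the observed validities are not all identical.
    """
    total_n = sum(sample_sizes)
    mean_r = sum(n * r for r, n in zip(validities, sample_sizes)) / total_n
    observed_var = sum(
        n * (r - mean_r) ** 2 for r, n in zip(validities, sample_sizes)
    ) / total_n
    mean_n = total_n / len(sample_sizes)
    sampling_error_var = (1.0 - mean_r ** 2) ** 2 / (mean_n - 1.0)
    return 100.0 * sampling_error_var / observed_var


# Hypothetical set of five local studies.
validities = [0.10, 0.35, 0.22, 0.05, 0.30]
sample_sizes = [60, 120, 90, 75, 150]

pct = percent_variance_from_sampling_error(validities, sample_sizes)
print(f"{pct:.1f}% of observed variance attributed to sampling error")
print("Meets 75 Percent Rule (sampling error only):", pct >= 75.0)
```

Because only one artifact is modeled, a result below 75 percent here would not by itself show that the rule fails; it shows only how much of the spread sampling error alone could explain.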

Another more contemporary tool used in VG research is the credibility interval, which some researchers use to determine the extent to which validity can be generalized outside the VG study. The credibility interval is an estimate of the variability of individual correlations across studies and informs the researcher of the percentage of correlations in the study that are "not likely to be zero." For example, an 80% credibility interval whose lower bound is above zero indicates that at least 90% of the individual correlations in the VG study are estimated to be above zero. (8)

One of the major limitations of "corrected" VG studies (as will be discussed in more depth below) is that there is no guarantee that employers would find the level of validity promised by the result of a VG study if a study were performed in a new local setting. This is primarily because a host of situational factors exist in each new situation that may drastically affect the validity of a test. In addition, there are a number of limitations with typical VG studies that may further limit their relevance and reliability when evaluating test validity in new situations (see discussion below). However, VG studies offer useful insights into the strength of the relationship between the test and job performance in the studies included in the VG analysis and can be immensely useful in personnel research studies.
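The credibility interval described earlier in this section can be illustrated with a brief sketch. It assumes the distribution of validities is approximately normal and takes the mean and standard deviation of the corrected validities as given inputs; the values shown are hypothetical.

```python
from statistics import NormalDist


def credibility_interval(mean_rho, sd_rho, level=0.80):
    """Return (lower, upper) bounds of a credibility interval.

    mean_rho and sd_rho are the estimated mean and standard deviation of
    validities across studies; a normal distribution is assumed.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.28 for 80%
    return mean_rho - z * sd_rho, mean_rho + z * sd_rho


# Hypothetical VG result: mean validity .30 with SD .10.
lower, upper = credibility_interval(0.30, 0.10, level=0.80)
print(f"80% credibility interval: {lower:.3f} to {upper:.3f}")
print("Lower bound above zero:", lower > 0)
```

The lower bound of an 80% interval is the point below which 10% of the validities are expected to fall, which is why it is sometimes reported as the "90% credibility value."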

Federal and Professional Requirements Surrounding Validity Generalization

Because there is a high degree of overlap and agreement between the Uniform Guidelines and the professional standards regarding the basics involved in conducting and interpreting local criterion-related validity studies, those basics will not be reviewed here. Only the federal and professional standards relevant to VG are covered because the more recent version of the SIOP Principles (2003) provided more content on this topic than previous standards included.

Validity Generalization and the Uniform Guidelines

The Uniform Guidelines include two primary sections that describe the requirements for transporting validity evidence from either a VG study or a single validity study conducted elsewhere. Section 7B describes the requirements for transporting validity evidence from one (or more) studies to a new local situation, and requires that a job comparability study be conducted between both locations and that the original study include a fairness study. Sections 7C and 7D direct specific attention to "variables that are likely to affect validity significantly" (called "moderators" in the context of VG studies); if such variables exist, the user may not rely on the studies, but will be expected instead to conduct an internal validity study in the local situation.

Section 15E of the Uniform Guidelines provides additional guidance regarding transporting validity evidence from existing studies into new situations. Section 15E1(b) includes elements that pertain to the utility and effectiveness of the test and the mitigation of risk that is gained by using a test supported by local validity evidence. Section 15E1(c) cautions the researcher to ensure that extraneous variables are not operating in a way that negatively impacts test validity. Finally, Section 15E1(d) suggests evaluating
