The discussion in this appendix draws heavily upon Babbie (1995:121-9).
Validity raises the question of whether our data actually measure what we claim they do. Concepts themselves do not exist in the real world, nor do they have "real definitions" (Babbie 1995:116; see also Hempel 1952). They are, rather, constructs useful for categorizing objects or events, and for drawing out attributes we think they share. Thus our empirical measures can never be better than approximations, and the literature abounds with "measurements" that draw on something in addition to, or other than, what they claim to measure - or that are grounded in nothing at all.

As Babbie (1995:127-8) explains, we can assess the validity of a measure in several ways. Does it possess face validity - that is, does it plausibly correspond to the concept we have nominally defined? An index that excludes extortion while counting street crimes might return higher values for places we think are more corrupt, but it does not measure what we mean by "corruption." Does it possess criterion-related or predictive validity, in the sense of predicting changes in other variables that theory tells us should be linked to our concept? For example, corruption measures should statistically "predict" the credit ratings lenders give to various governments. Or, a measure might be related to other variables in ways that are consistent with what we know about those factors, even if it does not "predict" them - an attribute called construct validity. We might, for example, expect extensive corruption where institutions are of poor quality (Knack and Keefer 1995) and ethno-linguistic fragmentation is severe (Easterly and Levine 1996). A measure possessing content validity works well across diverse manifestations of a concept: corruption ratings ought to reflect the incidence of all the major varieties of corruption, not just one or a few. Finally, a measure might have reference-group validity - that is, be judged sound by people with extensive knowledge of whatever we wish to measure.
Reliability refers to the question of whether a particular measure returns consistent results. A corruption scale that rates Zimbabwe (say) as an 8 on a ten-point scale one year, a 2 the next, and a 5 the year after that is of little use: theory suggests that such wide variations are unlikely. No social-science measure will be completely reliable, but we can improve our results through careful construction of indices using good data, and by repeated testing.
Finally, precision refers to the fineness of the units in which a measure is expressed. In general, the more precision the better: we would have little use for a "yes/no" corruption variable. High-, medium-, and low-corruption categories would be better, and numerical rankings more precise yet. A related issue is level of measurement: some measures are