Judging the reliability of statistical results
Almost without exception, the object of a statistical study is to furnish a basis for generalization. In a case like that discussed in the preceding chapter, for example, no one would be likely to visit 20 farms scattered all over a county simply for the purpose of finding out what the yield of corn was on those particular farms. Instead, he might be studying the yield on those farms as a basis for determining what the average yield of corn was for all the farms in the county. Stated in statistical terms, he would be finding the average yield in a sample of farms, picked at random, with a view to judging approximately what the average yield was in the universe in which he was interested, that is, all the farms in the county.
Of course it would be possible to visit all the farmers in the county, find out exactly what yield each one obtained, and so get an average of all the yields in the whole county. But this process would not only be expensive but would also, in most cases, be a pure waste of time and energy. We need only take a large enough sample by a well-designed sampling method to satisfy ourselves to any desired degree of confidence concerning the actual average for all the farms of the county. In this case, 100 records may enable one to determine the average yield quite as accurately as is necessary. Obtaining records from all the several thousand farmers in the county might add nothing to the usefulness of the results.
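The point may be illustrated with a small simulation. What follows is only a sketch under stated assumptions: the universe of 3,000 farms, the typical yield of 40 bushels per acre, and the spread of 8 bushels are all invented for illustration and are not figures from the text, and the function names are hypothetical. The sketch draws many random samples of a given size and measures how far the sample averages scatter around one another.

```python
import random
import statistics

# Hypothetical universe: corn yields (bushels per acre) on 3,000 farms.
# These values are simulated for illustration; they are not data from the text.
rng = random.Random(0)
universe = [rng.gauss(40.0, 8.0) for _ in range(3000)]
universe_mean = statistics.mean(universe)


def sample_mean(population, n, rng):
    """Average of n records drawn at random, without replacement."""
    return statistics.mean(rng.sample(population, n))


def spread_of_sample_means(population, n, rng, trials=500):
    """Standard deviation of the sample average over repeated random samples."""
    means = [sample_mean(population, n, rng) for _ in range(trials)]
    return statistics.stdev(means)


# Compare how closely samples of different sizes pin down the average.
spreads = {n: spread_of_sample_means(universe, n, rng) for n in (20, 100, 1000)}

print(f"average of the whole universe: {universe_mean:.2f}")
for n, spread in spreads.items():
    print(f"{n:>5} records: sample averages vary by about {spread:.2f} bushels")
```

Under these assumptions the scatter of the sample averages shrinks roughly in proportion to the square root of the number of records, so each additional record adds less than the one before it. Whether 100 records are enough in a real county depends on how variable the yields actually are and on how accurate an answer is required.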
Before considering ways of finding out how many records would be needed in any given case, we might well discuss a little more fully what the process of statistical inference involves. Really, all that we do is to examine or measure a certain group of objects, and infer from the size or measurement of those objects, or from the way those objects behave, what will be the size of other objects of the same sort, or how other objects