# The Use of the Range and Mean Deviation in Interpreting the Standard Deviation

Rhiel, G. Steven, Akron Business and Economic Review

With the emergence of sophisticated management information systems (MIS), today's businessmen and women have more access to statistical data than ever before. Many of these individuals have difficulty interpreting and, consequently, using many of the statistics provided on computer printouts. The standard deviation(1) is one of these statistics. Although it is commonly used, it is difficult to interpret because of its mathematical complexity.

Lay people, in particular, have a difficult time understanding the standard deviation.(2) For example, the manager of a credit union who receives reports that contain the mean and standard deviation of loan amounts may have problems explaining these statistics to members of the credit union governing board. If in a particular report the mean and standard deviation of the loans were $5000 and $400, respectively, the average loan of $5000 could be easily understood by the board members, but the standard deviation of $400 may not be, even though it may be very important in analyzing the overall loan situation. One can imagine many similar situations in business where an intuitive understanding of the standard deviation may be useful.

One method to facilitate the interpretation of the standard deviation is to relate it to the range. For an infinite, normal population, three standard deviation units above and below the mean encompass 99.73 percent of the distribution, which results in the range equaling approximately six standard deviations. A similar technique for interpreting the standard deviation for a sample would be very useful, since the sample standard deviation is commonly found on business reports.
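The 99.73 percent figure can be checked directly from the normal distribution function. The following is a minimal sketch using only Python's standard library; the function name `normal_coverage` is illustrative and does not come from the article.

```python
import math


def normal_coverage(k: float) -> float:
    """Proportion of a normal population within k standard deviations of the mean."""
    # For a standard normal Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


coverage_3sd = normal_coverage(3.0)
print(f"Within 3 standard deviations: {coverage_3sd:.2%}")
```

This applies to the infinite normal population only; it says nothing yet about how the range of a finite sample relates to the sample standard deviation.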

According to McNemar [2], for samples from the normal distribution, the range is equal to five standard deviations when n (the sample size) is 50, six standard deviations when n is 200, and seven standard deviation units when n is 1000. Pearson and Stephens [4] provide 12 ratios for the normal distribution (mathematically derived) for sample sizes from three to 100. For a sample size of fifty, Pearson and Stephens' ratio is 4.4212 compared with McNemar's 5.000.

Information on the relationship (or ratio) of the range to standard deviation for samples from non-normal distributions is scarce. Baker [1] compared ratios from the normal distribution to ratios from the platykurtic-bimodal and skewed-bimodal distributions for sample sizes 64 and 100. He found that the ratios for the platykurtic-bimodal distribution are smaller than those for the normal distribution: 4.1349 to 4.8272 when n is 64 and 4.4864 to 5.1214 when n is 100. For samples from the skewed-bimodal distribution, the ratios differ minimally from those for the normal distribution. Additional research to determine the extent to which these ratios vary when the population shape becomes non-normal may be critical in establishing the range as an aid in explaining the standard deviation.

Another approach to interpreting or explaining the standard deviation (S) is to describe it in terms of the mean deviation (MD). The mean deviation is easily understood since it is the average of the absolute deviations about the mean or, more plainly stated, the average amount the observations vary from the mean. If the sample mean deviation is approximately equal to the sample standard deviation regardless of the sample size or population shape, this simpler measure would make the meaning of the standard deviation clearer to lay people.

McNemar [2] states that the relationship of the mean deviation to the standard deviation for the normal distribution is MD = .798S. However, no research is available to substantiate whether this is true for all sample sizes, whether from normal or non-normal distributions.
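For the normal population itself, the constant .798 can be derived in one line. The sketch below is a standard result stated for the population mean deviation and population standard deviation $\sigma$; it is not taken from the article, and it does not by itself settle how the sample ratio behaves for small samples.

```latex
% X ~ N(\mu, \sigma^2); population mean deviation about the mean
\mathrm{E}\,|X-\mu|
  = 2\int_{0}^{\infty} \frac{x}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/(2\sigma^{2})}\,dx
  = \frac{2}{\sigma\sqrt{2\pi}}\,\sigma^{2}
  = \sigma\sqrt{\frac{2}{\pi}}
  \approx 0.798\,\sigma
```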

The purpose of this study is to determine the relationship of the range (W) and the mean deviation (MD) to the standard deviation (S) for various sample sizes from various shaped distributions. These relationships are used to facilitate the interpretation or explanation of the standard deviation.

COMPUTER SIMULATION

The relationship of the sample range and sample mean deviation to the sample standard deviation is investigated by using computer simulation techniques. (Computer simulation is used because it is mathematically infeasible to derive these relationships theoretically.) Sampling distributions for the sample range (W), the sample mean deviation (MD), and the sample standard deviation (S) for various sample sizes from each of nine distributions are computer generated. One thousand simulations are used to build each sampling distribution.

These sampling distributions are generated for sample sizes two to 10 by increments of two, sample sizes 10 to 60 by increments of 10, sample sizes 60 to 100 by increments of 20, sample sizes 200 to 1000 by increments of 150, and for sample sizes 1500 and 2000. The ratios of the mean sample-range ($\bar{W}$) and mean sample-mean-deviation ($\overline{MD}$) to the mean sample-standard-deviation ($\bar{S}$) are determined for each of the sample sizes from the nine distributions.
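The simulation procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's program: the function name `simulate_ratios`, the seed, and the use of Python's built-in generator are all hypothetical, and the standard deviation is assumed to use the n - 1 divisor.

```python
import random
import statistics


def simulate_ratios(n, draw, reps=1000, seed=0):
    """Return (mean W / mean S, mean MD / mean S) over `reps` samples of size n.

    `draw` is a function that takes a random.Random instance and returns one
    observation from the population being sampled.
    """
    rng = random.Random(seed)
    sum_w = sum_md = sum_s = 0.0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        mean = statistics.fmean(sample)
        sum_w += max(sample) - min(sample)                 # sample range W
        sum_md += sum(abs(x - mean) for x in sample) / n   # sample mean deviation MD
        sum_s += statistics.stdev(sample)                  # assumption: n - 1 divisor
    # The 1/reps factors cancel, so ratios of sums equal ratios of means.
    return sum_w / sum_s, sum_md / sum_s


# Samples of size 50 from the standard normal distribution.
w_over_s, md_over_s = simulate_ratios(50, lambda rng: rng.gauss(0.0, 1.0))
print(f"W/S = {w_over_s:.3f}, MD/S = {md_over_s:.3f}")
```

Passing a different `draw` function (for example `lambda rng: rng.random()` for a uniform population) gives the corresponding ratios for another distribution shape.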

The algorithm for generating the random numbers used in the simulation is stored in the DEC computer library. This algorithm is taken from an article by Payne, Rabung, and Bogyo [3].
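A Lehmer (multiplicative congruential) generator produces each value from the previous one by multiplying and reducing modulo a prime. The sketch below shows the general technique only. The modulus 2^31 - 1 and multiplier 16807 are commonly used parameters assumed here for illustration; the article does not state which parameters or coding the DEC library routine used, and the class name `LehmerGenerator` is hypothetical.

```python
class LehmerGenerator:
    """Multiplicative congruential generator: x_next = (a * x) mod m."""

    MODULUS = 2**31 - 1   # assumed parameter (a Mersenne prime)
    MULTIPLIER = 16807    # assumed parameter

    def __init__(self, seed: int = 1):
        if not 0 < seed < self.MODULUS:
            raise ValueError("seed must be between 1 and MODULUS - 1")
        self.state = seed

    def next_int(self) -> int:
        """Advance the generator and return the new integer state."""
        self.state = (self.MULTIPLIER * self.state) % self.MODULUS
        return self.state

    def next_uniform(self) -> float:
        """Return a pseudo-random value strictly between 0 and 1."""
        return self.next_int() / self.MODULUS


gen = LehmerGenerator(seed=1)
first = gen.next_int()
second = gen.next_int()
print(first, second)
```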

DEFINITIONS

The remaining sections of this article contain several technical terms that are used to describe distribution shape. The following definitions clarify these terms.

Skewness - A distribution is either symmetrical or asymmetrical. Skewness is the property of asymmetry of a distribution. One measure of the degree of skewness is $\beta_1$. If $\beta_1 = 0$, the distribution is symmetrical. If $\beta_1 > 0$, the distribution is skewed to the right (i.e., the distribution has a large number of observations clustered toward the left with a long tail to the right). If $\beta_1 < 0$, the distribution is skewed to the left.

Kurtosis - Kurtosis is a measure of the peakedness or flatness of a distribution. One measure of the degree of kurtosis is $\beta_2$. If $\beta_2 = 3$, the distribution is neither peaked nor flat topped ($\beta_2 = 3$ for the normal distribution). If $\beta_2 > 3$, the distribution is peaked with long tails. If $\beta_2 < 3$, the distribution is flat with short tails.

Leptokurtic - Leptokurtic is a condition of kurtosis resulting in a peaked distribution ($\beta_2 > 3$).

Platykurtic - Platykurtic is a condition of kurtosis resulting in a flat distribution ($\beta_2 < 3$).
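These shape measures can be computed from a set of data using central moments. The sketch below assumes the skewness measure is the signed standardized third moment (since the definitions allow it to be negative) and the kurtosis measure is the standardized fourth moment; the function name `shape_measures` is illustrative and not from the article.

```python
def shape_measures(data):
    """Return (skewness, kurtosis) computed from central moments of `data`."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5   # 0 if symmetrical, > 0 if skewed right
    kurtosis = m4 / m2 ** 2     # 3 for the normal distribution
    return skewness, kurtosis


# A small symmetrical, flat data set: skewness is 0 and kurtosis is below 3.
skew_flat, kurt_flat = shape_measures([1, 2, 3, 4, 5])

# A data set with a long right tail: skewness is positive.
skew_right, kurt_right = shape_measures([1, 1, 1, 2, 10])
print(skew_flat, kurt_flat, skew_right > 0)
```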

THE DISTRIBUTIONS SAMPLED

The nine distributions utilized in this study are described in this section in terms of their mean ($\mu$), standard deviation ($\sigma$), skewness ($\beta_1$), and kurtosis ($\beta_2$). These distributions were chosen for this study in order to ensure a wide range of kurtosis, since kurtosis controls the distribution of the range (see Singh [6]).

1. Normal Distribution (N): Sample values from the normal distribution are generated using algorithms obtained from Pritsker [5] such that $\mu = 0.000$, $\sigma = 1.000$, $\beta_1 = 0.000$, and $\beta_2 = 3.000$.

2. Extremely Leptokurtic Distribution (EL): Sample values are generated from a Chi Square distribution with 2 degrees of freedom such that $\mu = 2.000$, $\sigma = 2.000$, $\beta_1 = 2.000$, and $\beta_2 = 9.000$.

3. Leptokurtic Distribution (L): Sample values are generated from a Chi Square distribution with 4 degrees of freedom such that $\mu = 4.000$, $\sigma = 2.828$, $\beta_1 = 1.410$, and $\beta_2 = 6.000$.

4. Slightly Leptokurtic Distribution (SL): Sample values are generated from a Chi Square distribution with 8 degrees of freedom such that $\mu = 8.000$, $\sigma = 4.000$, $\beta_1 = 1.000$, and $\beta_2 = 4.500$.

5. Slightly Platykurtic Distribution (SP): Sample values are generated from a slightly platykurtic distribution such that $\mu = 3.200$, $\sigma = 1.120$, $\beta_1 = 0.000$, and $\beta_2 = 2.7263$.

6. Platykurtic Distribution (P): Sample values are generated from a platykurtic distribution such that $\mu = 3.142$, $\sigma = 1.367$, $\beta_1 = 0.000$, and $\beta_2 = 2.194$.

7. Bimodal, Platykurtic Distribution (B): Sample values are generated from a bimodal distribution such that $\mu = 1.571$, $\sigma = 0.746$, $\beta_1 = 0.000$, and $\beta_2 = 1.932$.

8. Uniform, Platykurtic Distribution (UP): Sample values are generated from a uniform distribution such that $\mu = 0.500$, $\sigma = 0.289$, $\beta_1 = 0.000$, and $\beta_2 = 1.800$.

9. U-shaped, Platykurtic Distribution (US): Sample values are generated from a U-shaped distribution such that $\mu = 0.000$, $\sigma = 0.647$, $\beta_1 = 0.000$, and $\beta_2 = 1.545$.

INVESTIGATIONS

Investigation 1: The Relationship of W to S

The ratios of the mean sample-range ($\bar{W}$) to the mean sample-standard-deviation ($\bar{S}$), the $\bar{W}/\bar{S}$ ratios, for various sample sizes from each of the nine distributions are presented in Table 1. The $\bar{W}/\bar{S}$ ratios express how many mean standard deviations equal the mean range. For example, 4.517 standard deviations equal the range for a sample of size 50 from the normal distribution. The $\bar{W}/\bar{S}$ ratios for the normal distribution vary from 1.4312 for sample size two to 6.898 for sample size 2000. The $\bar{W}/\bar{S}$ ratios for the normal distribution in this study are in agreement to the second decimal place with those of Pearson and Stephens [4].

As the distributions become leptokurtic, the $\bar{W}/\bar{S}$ ratios decrease minimally from the $\bar{W}/\bar{S}$ ratios for the normal distribution for the smaller sample sizes and increase substantially for the larger sample sizes. The $\bar{W}/\bar{S}$ ratios consistently decrease as the distribution changes from normal to platykurtic. However, the decrease for the smaller sample sizes is not as extensive as the decrease for the larger sample sizes.

[Table 1 omitted]

Investigation 2: The Relationship of MD to S

Table 2 contains the ratio of the mean sample-mean-deviation to the mean sample-standard-deviation ($\overline{MD}/\bar{S}$) for the various sample sizes for the nine distributions. The $\overline{MD}/\bar{S}$ ratios can be interpreted as: the mean deviation is that proportion of the standard deviation; or, by moving the decimal point two places to the right, the mean deviation is that percent of the standard deviation. From Table 2, for a sample size of 750 from the normal distribution, the mean deviation is 79.7 percent of the standard deviation. If a sample of size 100 were drawn from a U-shaped distribution, the mean deviation would be 90 percent of the standard deviation.

The $\overline{MD}/\bar{S}$ values are consistently between .75 and .80 (excluding sample size 2) for the leptokurtic and normal distributions. With the platykurtic distributions, the $\overline{MD}/\bar{S}$ ratios range from .75 to .90 (excluding sample size 2). For samples of size 1000 or more from the normal distribution, the $\overline{MD}/\bar{S}$ ratios from this research are all .798, the same as McNemar's [2] ratio.

[Table 2 omitted]

CONCLUSIONS

Interpreting the sample standard deviation by stating that six standard deviations equal the range is inappropriate for most cases. The ratio of the range to standard deviation varies considerably depending on sample size and distribution shape. The instability of the W/S ratios makes them difficult to use to clarify the standard deviation when the distribution shape is not known.

One suggestion for using the range (W) to interpret the standard deviation (S) is to use the W/S ratios for samples from the normal distribution. Several of these are reported in McNemar's text [2]. However, McNemar's ratios appear to be erroneous and should be corrected as follows: the ratio of the range to standard deviation for samples from the normal distribution is approximately five when n = 100 (instead of n = 50), six when n = 500 (instead of n = 100), and seven when n = 2000 (instead of n = 1000). In addition, it may be worthwhile to include the following ratios: three when n = 10 and four when n = 30. If additional sample sizes are desired, they can be obtained from Table 1. The above values can be used for a general interpretation of the standard deviation.

For a specific interpretation of the standard deviation, let us use the example in the introduction concerning the credit union. If the mean and standard deviation of $5000 and $400, respectively, were calculated from a sample of 100 loans, the manager could explain to the governing board that if the distribution of loans is normal, five standard deviations equal the range of the loans (or the standard deviation is one-fifth of the range of the loans).(3)
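The arithmetic behind this explanation can be sketched as a small lookup. The table below holds only the approximate ratios for normal samples suggested in these conclusions; the names `APPROX_W_OVER_S` and `implied_range` are hypothetical, and the result is a rough guide that assumes the loans are normally distributed.

```python
# Approximate range-to-standard-deviation ratios for samples from the
# normal distribution, keyed by sample size (values from the conclusions).
APPROX_W_OVER_S = {10: 3, 30: 4, 100: 5, 500: 6, 2000: 7}


def implied_range(std_dev: float, n: int) -> float:
    """Approximate sample range implied by a standard deviation and sample size."""
    if n not in APPROX_W_OVER_S:
        raise KeyError(f"no approximate ratio listed for sample size {n}")
    return APPROX_W_OVER_S[n] * std_dev


# Credit union example: standard deviation of $400 from a sample of 100 loans.
loan_range = implied_range(400, 100)
print(f"Approximate range of loan amounts: ${loan_range:,.0f}")
```

For the credit union, a standard deviation of $400 from 100 loans corresponds to a range of roughly $2,000 between the smallest and largest loans.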

Using the range with ratios specific to the sample size to interpret the sample standard deviation is a vast improvement over assuming a fixed ratio of range to standard deviation of six to one, as is often done in practice.

The mean deviation should be extremely useful as an aid in interpreting (clarifying) the standard deviation. The mean deviation is 75 to 90 percent of the standard deviation (see Table 2). The stability of this relationship as the shape of the population varies should make the mean deviation very helpful in clarifying the standard deviation. An explanation similar to the following could be used to explain the standard deviation by using the mean deviation: if the mean deviation and the standard deviation are both calculated from a set of data, the mean deviation, which is the average distance of the observations from the mean, is 10 to 25 percent less than the standard deviation.
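The 75 to 90 percent rule translates a reported standard deviation into a plausible interval for the mean deviation. A minimal sketch follows; the function name `mean_deviation_bounds` is illustrative, and the $400 input is the standard deviation from the credit union example in the introduction.

```python
def mean_deviation_bounds(std_dev: float, low: float = 0.75, high: float = 0.90):
    """Return the (lower, upper) mean deviation implied by the MD/S range."""
    return low * std_dev, high * std_dev


# Standard deviation of $400 from the credit union example.
md_low, md_high = mean_deviation_bounds(400)
print(f"Mean deviation between ${md_low:.2f} and ${md_high:.2f}")
```

Whatever the shape of the loan distribution, within the shapes studied, the average distance of a loan from the mean would fall between about $300 and $360.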

Returning to the credit union example, the standard deviation of $400 could be interpreted by saying that the loan amounts vary from the mean of $5000 by slightly less than $400 on average. More specifically, if ten loans were used in determining these statistics and the distribution of loans was normal, the average amount the loans deviate from the mean loan is approximately .775 ($400) = $310.00 (calculated from MD = .775S).(4)

In summary, the ratio of the range to standard deviation for samples provides an improved method of interpreting the sample standard deviation over using the ratio of the range to standard deviation for the normal, infinite population. The author suggests using several W/S values for samples from the normal distribution for interpreting the standard deviation. The mean deviation is a measure that gives businessmen and women a means of comparing the standard deviation with a less abstract measure of dispersion, thus making it easier to understand.

(1) The standard deviation referred to in this study is the root-mean-square estimator of $\sigma$.

(2) The author has taught statistics to graduate and undergraduate business students for twelve years and has observed the difficulty people have in understanding and interpreting the standard deviation.

(3) This ratio is subject to sampling error, and, for any particular sample, the W/S ratio in the table may differ from that for the sample.

(4) The ratio of the mean deviation to standard deviation is subject to sampling error, and, for a particular sample, the MD/S value from the table may differ from that for the sample.

REFERENCES

[1] Baker, G. A. "Distribution of the Ratio of Sample Range to Sample Standard Deviation for Normal and Combinations of Normal Distributions." Annals of Mathematical Statistics, 17 (1946), 366-69.

[2] McNemar, Quinn. Psychological Statistics, 4th ed. New York: John Wiley and Sons, Inc., 1969.

[3] Payne, W. H., J. R. Rabung, and T. P. Bogyo. "Coding the Lehmer Pseudo Random Number Generator." Communications of the ACM, 12, 2 (February, 1969), 85-86.

[4] Pearson, E. S., and M. A. Stephens. "The Ratio of Range to Standard Deviation in the Same Normal Sample." Biometrika, 51, parts 3 and 4 (December, 1964), 484-87.

[5] Pritsker, A. A. B. The GASP IV Simulation Language. New York: John Wiley and Sons, Inc., 1974.

[6] Singh, C. "Moments of the Range of Samples from Nonnormal Populations." Journal of the American Statistical Association, 71, number 356 (December, 1976), 988-91.

G. STEVEN RHIEL is Associate Professor of Management Information Systems/Decision Sciences at Old Dominion University.
