Two basic kinds of descriptive measures in statistics are measures of location and measures of dispersion. A measure of location is a single value that represents a data set, while a measure of dispersion describes the extent to which the data vary.

 

Arithmetic Mean as a Measure of Location
Let X1, X2, X3, …, Xn be the values of n quantitative data*). The average or arithmetic mean of the data, denoted by μ, is defined as follows.
\mu = \frac{X_{1}+X_{2}+X_{3}+...+X_{n}}{n}.
The arithmetic mean indicates the “location” or “center” of the data. We can view the mean as a single value that represents a collection of data. There are other measures of location, such as the median, mode, quartiles, deciles, and percentiles, but they will be discussed in other posts.

 

Example 1
Consider the following two groups of hypothetical grades obtained by students on a math exam.
Group A: 80, 85, 90, 95, 100
Group B: 30, 35, 40, 45, 50

We may (intuitively) conclude that the data in Group A are the grades of “high-performing” students in mathematics and the data in Group B are those of “low-performing” ones. Assuming that the data are at the interval level of measurement, and are therefore quantitative, the arithmetic means of the data are:

{\mu}_A = \frac{80+85+90+95+100}{5} = 90

{\mu}_B = \frac{30+35+40+45+50}{5} = 40

 

The “high-performing” students (Group A) have the higher mean, that is, the higher “measure of location”. In general, comparing the means of the groups under study lets us judge which group is “better”, “faster”, “more satisfying”, and so on.
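
As a quick check of the arithmetic in Example 1, the mean formula can be sketched in Python. The function name `arithmetic_mean` and the variable names are illustrative choices, not part of the original example.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Grades from Example 1
group_a = [80, 85, 90, 95, 100]
group_b = [30, 35, 40, 45, 50]

mean_a = arithmetic_mean(group_a)
mean_b = arithmetic_mean(group_b)
print(mean_a, mean_b)  # 90.0 40.0
```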

Note:

Strictly speaking, the proper measure of location for Example 1 is the median, because the data are at the ordinal level of measurement. We should not have applied the arithmetic mean in this case; we used it only by assuming that the data were quantitative.
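
For comparison, here is a minimal sketch of the medians of the two groups in Example 1, using the `median` function from Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import median

# Grades from Example 1
group_a = [80, 85, 90, 95, 100]
group_b = [30, 35, 40, 45, 50]

# With an odd number of values, the median is the middle value of the sorted data.
median_a = median(group_a)
median_b = median(group_b)
print(median_a, median_b)  # 90 40
```

For these two groups the median happens to coincide with the mean, so the comparison between the groups is unchanged.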

 

Some Measures of Dispersion
Suppose that instead of the data provided above, the grades on the math exam are as follows.
Group A: 50, 60, 70, 80, 90
Group B: 70, 70, 70, 70, 70
It is easy to check that the arithmetic means are equal: both are 70. But the grades in Group B do not vary! The problem here is that the mean alone cannot “detect” variability. So, we need other measures that are capable of measuring the variability that exists in the data. These are usually called measures of dispersion. They include the range, variance, standard deviation, quartile deviation, and mean deviation. These measures require that the data under study are quantitative*).

To calculate the range, denoted by R, we simply subtract the smallest value (Xmin) from the largest value (Xmax): R = X_{max} - X_{min}. Thus, in Group A the range is R = 90 - 50 = 40, and in Group B the range is R = 70 - 70 = 0.

The range of the data in Group A is greater than that of Group B. This indicates that there is more variability in Group A than in Group B. Moreover, as the range of the data in Group B is zero, we can conclude that there is no variability in Group B’s data.
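
The following short Python sketch reproduces this comparison: equal means, different ranges. The helper name `data_range` is an illustrative choice.

```python
def data_range(values):
    """Range R = largest value minus smallest value."""
    return max(values) - min(values)


group_a = [50, 60, 70, 80, 90]
group_b = [70, 70, 70, 70, 70]

# The means are equal, so the mean alone cannot tell the groups apart ...
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
print(mean_a, mean_b)  # 70.0 70.0

# ... but the ranges differ.
range_a = data_range(group_a)
range_b = data_range(group_b)
print(range_a, range_b)  # 40 0
```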

 

Another measure of dispersion is variance. It is defined as follows.

\sigma^2 = \frac{\sum_{i=1}^{n} (X_{i} - \mu)^2}{n}.

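In Group A, the number of data values is n = 5 and the mean is \mu = 70. Taking X1 = 50, X2 = 60, X3 = 70, X4 = 80, and X5 = 90, we have:

\sigma^2 = \frac{(50-70)^2+(60-70)^2+(70-70)^2+(80-70)^2+(90-70)^2}{5} = \frac{400+100+0+100+400}{5} = \frac{1000}{5} = 200.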

Standard deviation is another measure of dispersion. It is simply the square root of variance, that is, \sigma = \sqrt{\sigma^2}.

 

So, the standard deviation of the data in Group A is \sigma = \sqrt{200} \approx 14.14. As you may verify, the variance and standard deviation of the data in Group B are both zero! The variance and standard deviation of the data in Group A are greater than those of Group B. This again indicates that there is more variability in Group A than in Group B.
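
A minimal Python sketch of the population variance and standard deviation formulas, applied to both groups. The function names are illustrative, and the data are treated as a whole population (divisor n).

```python
import math


def population_variance(values):
    """Mean of the squared deviations from the population mean (divisor n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


group_a = [50, 60, 70, 80, 90]
group_b = [70, 70, 70, 70, 70]

var_a, sd_a = population_variance(group_a), population_std(group_a)
var_b, sd_b = population_variance(group_b), population_std(group_b)
print(var_a, round(sd_a, 2))  # 200.0 14.14
print(var_b, sd_b)            # 0.0 0.0
```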

 

In the case of the grades on the math exam above, it has been assumed that each group constitutes a whole population (of students). In other words, the data have been treated as population data. The variance formula \sigma^2 = \frac{\sum_{i=1}^{n} (X_{i} - \mu)^{2}}{n} holds only when we are dealing with population data. For sample data, the variance is calculated as follows.
s^2 = \frac{\sum_{i=1}^{n} (X_{i} - \bar{X})^2}{n-1}, where \bar{X} = \frac{X_{1}+X_{2}+X_{3}+...+X_{n}}{n}.

The example below shows the case in which the sample variance formula must be used.

 

Example 2
The Cambridge Power and Light Company selected a random sample of 10 residential customers. Following are the amounts, to the nearest dollar, the customers were charged for electrical service last month: 54, 48, 58, 50, 25, 47, 75, 43, 60, 70. Compute the variance and the standard deviation.

Answer
The sample mean of the data is \bar{X} = \frac{54+48+58+ \cdots +70}{10} = 53.
Apply the sample variance formula:
s^2 = {\$}^2 \: \frac{(54-53)^2+(48-53)^2+(58-53)^2+ \cdots +(70-53)^2}{10-1} \approx {\$}^2 \: 200.22.

So, the variance is approximately $² 200.22, that is, 200.22 squared dollars.

 

Example 3

Referring to Example 2, calculate the standard deviation.

Answer

Standard deviation is the square root of the variance. In this case, s = \sqrt{s^2}.

s = \$  \sqrt{200.22} \approx \$ 14.15.
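
The calculations in Examples 2 and 3 can be checked with a short Python sketch of the sample variance formula (divisor n - 1). The function name `sample_variance` and the variable names are illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Monthly charges (in dollars) for the 10 sampled customers in Example 2
charges = [54, 48, 58, 50, 25, 47, 75, 43, 60, 70]

x_bar = sum(charges) / len(charges)
s2 = sample_variance(charges)
s = math.sqrt(s2)

print(x_bar)                      # 53.0
print(round(s2, 2), round(s, 2))  # 200.22 14.15
```

Python's standard `statistics` module offers the same calculations: `statistics.variance` and `statistics.stdev` use the n - 1 divisor for sample data, while `statistics.pvariance` and `statistics.pstdev` use the divisor n for population data.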

 

Note:

In Examples 2 and 3, we have used the arithmetic mean, variance, and standard deviation. They are proper measures since the data are quantitative.

 

PROBLEM
A researcher in Communication Science is going to compare the contents of two local newspapers. He has collected 10 editions of newspaper A and 6 editions of newspaper B. The editions of newspaper A show the following percentages of gossip content: 15, 18, 20, 16, 20, 16, 17, 13, 15, 20, and those of newspaper B show the following percentages: 25, 28, 25, 27, 31, 26.
a) Calculate the mean percentage of gossip content of each newspaper. Which newspaper contains more gossip content?
b) Calculate the range of the percentages for each newspaper.
c) Calculate the variances and standard deviations of the percentages. Which newspaper varies more in gossip content?

 

*) Quantitative data are data at the interval or ratio level of measurement.
