Suppose that we have collected some sample data and presented them in a frequency distribution table with equal class widths. In that case, we can apply the coding method to calculate the variance. With this method, the variance can be obtained without first calculating the mean. The method uses the following sample variance formula:
$$s^2 = \frac{c^2\left[n\sum_{i=1}^{k} f_i u_i^2 - \left(\sum_{i=1}^{k} f_i u_i\right)^2\right]}{n(n-1)}$$
where $s^2$ = sample variance, $u_i$ = the code of class $i$, $f_i$ = the frequency of class $i$, $k$ = the number of classes, $n$ = the sample size, and $c$ = the class width.
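The formula maps directly onto a few sums. Below is a minimal Python sketch that assumes the frequencies and codes arrive as two equal-length lists. The function name `coded_sample_variance` is hypothetical, and the three-class table in the usage line is made-up illustrative data, not the advertisement data.

```python
def coded_sample_variance(freqs, codes, c):
    """Sample variance of grouped data by the coding method.

    Implements s^2 = c^2 [n * sum(f u^2) - (sum(f u))^2] / (n (n - 1)).
    freqs: class frequencies f_1, ..., f_k
    codes: class codes u_1, ..., u_k (consecutive integers)
    c: the common class width
    """
    if len(freqs) != len(codes):
        raise ValueError("freqs and codes must have the same length")
    n = sum(freqs)  # sample size
    if n < 2:
        raise ValueError("need at least two observations")
    sum_fu = sum(f * u for f, u in zip(freqs, codes))        # sum of f_i * u_i
    sum_fu2 = sum(f * u * u for f, u in zip(freqs, codes))   # sum of f_i * u_i^2
    return c**2 * (n * sum_fu2 - sum_fu**2) / (n * (n - 1))


# Hypothetical three-class table with class width 5 and u_2 = 0
print(coded_sample_variance([2, 3, 1], [-1, 0, 1], 5))
```

On this toy table the result matches the ordinary grouped-data variance computed from class midpoints, which is the value the coding method is meant to reproduce.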
Suppose we have the following sample data on the durations (in seconds) of advertisements on a private television station in 2015.
Table 1
How can we determine the variance of these sample data without calculating the mean?
First, we add a column on the right for the class codes $u_i$.
Table 2
Then, assign 0 (zero) to one of the $u_i$'s. We arbitrarily choose $u_4 = 0$, which gives the following table.
Table 3
Assign consecutive integers to the remaining $u_i$'s so that $u_{i+1} = u_i + 1$; that is, $u_3 = -1$, $u_2 = -2$, $u_1 = -3$, $u_5 = 1$, and so on. This gives the following table.
Table 4
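The coding rule amounts to $u_i = i - j$ when class $j$ receives the zero. Here is a short Python sketch of that rule. The helper name `assign_codes` is hypothetical, and the six-class table in the usage line is an assumption for illustration only.

```python
def assign_codes(k, zero_class):
    """Return codes u_1, ..., u_k with u_{zero_class} = 0 and u_{i+1} = u_i + 1.

    Classes are numbered from 1, as in the text.
    """
    if not 1 <= zero_class <= k:
        raise ValueError("zero_class must be between 1 and k")
    return [i - zero_class for i in range(1, k + 1)]


# Hypothetical table with k = 6 classes, choosing u_4 = 0 as in the text
print(assign_codes(6, 4))  # [-3, -2, -1, 0, 1, 2]
```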
Now, insert three additional columns to the right of the $u_i$ column, for $u_i^2$, $f_i u_i$, and $f_i u_i^2$, respectively. Fill in each empty column with the result of the operation shown in its header. This gives:
Table 5
Calculate the totals of the $f_i$, $f_i u_i$, and $f_i u_i^2$ columns. This gives:
Table 6
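Building the extra columns and their totals is mechanical. This sketch takes the same list-based input as before. The name `coding_table` is illustrative, and the frequencies are made up rather than taken from Table 6.

```python
def coding_table(freqs, codes):
    """Build rows (f_i, u_i, u_i^2, f_i*u_i, f_i*u_i^2) plus the column totals."""
    rows = [(f, u, u * u, f * u, f * u * u) for f, u in zip(freqs, codes)]
    totals = {
        "sum_f": sum(r[0] for r in rows),    # n, the sample size
        "sum_fu": sum(r[3] for r in rows),   # sum of f_i * u_i
        "sum_fu2": sum(r[4] for r in rows),  # sum of f_i * u_i^2
    }
    return rows, totals


# Hypothetical three-class table with u_2 = 0
rows, totals = coding_table([2, 3, 1], [-1, 0, 1])
for row in rows:
    print(row)
print(totals)  # {'sum_f': 6, 'sum_fu': -1, 'sum_fu2': 3}
```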
From the table, we read off $n = \sum_{i=1}^{k} f_i$, $\sum_{i=1}^{k} f_i u_i$, and $\sum_{i=1}^{k} f_i u_i^2$.
The class width $c$ equals the difference between the lower class limits of any two consecutive classes. If $LCL_i$ denotes the lower class limit of class $i$, then $c = LCL_{i+1} - LCL_i$. Equivalently, the difference between two consecutive upper class limits, $c = UCL_{i+1} - UCL_i$, gives the same result. In this example, $c = 37 - 30 = 7$ or $c = 43 - 36 = 7$. The class width can also be found by subtracting a class's lower class boundary ($LCB$) from its upper class boundary ($UCB$), that is, $c = UCB - LCB$. In this example, the width of the first class is $c = 36.5 - 29.5 = 7$, and the width of the second class is $c = 43.5 - 36.5 = 7$. Because all the classes are assumed to have equal widths, every class gives the same result, regardless of which one is used for the calculation.
So, before applying this method, make sure that all classes have the same width! Applying it to a frequency distribution table with unequal class widths gives an incorrect result. This is a weakness of the coding method for calculating variance: it must not be used when any classes have unequal widths.
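The equal-width check can be automated before applying the coding method. The sketch below uses a hypothetical function `common_class_width` that compares consecutive lower-limit differences and consecutive upper-limit differences, and refuses to return a width if they disagree. The text quotes only the first two classes of the example (30 to 36 and 37 to 43), so the usage line uses just those.

```python
def common_class_width(lower_limits, upper_limits):
    """Return the common class width c, or raise if the classes differ in width.

    c is taken from differences of consecutive lower class limits; differences
    of consecutive upper class limits are checked as well.
    """
    if len(lower_limits) < 2 or len(lower_limits) != len(upper_limits):
        raise ValueError("need at least two classes with matching limits")
    widths = {b - a for a, b in zip(lower_limits, lower_limits[1:])}
    widths |= {b - a for a, b in zip(upper_limits, upper_limits[1:])}
    if len(widths) != 1:
        raise ValueError(f"unequal class widths {sorted(widths)}: "
                         "the coding method does not apply")
    return widths.pop()


# First two classes of the example: 30-36 and 37-43
print(common_class_width([30, 37], [36, 43]))  # 7
```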
Substitute the values that have been obtained above into the sample variance formula. It follows that:
Previously, we set $u_4 = 0$. The same variance would result if we assigned zero to a different class code, because shifting every code by the same constant does not change their spread. For instance, if we set $u_5 = 0$, we get the following table.
Table 7
Substituting the appropriate values into the sample variance formula, we have the following.
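The choice of zero code does not matter, and this follows algebraically. Replacing every $u_i$ by $u_i - 1$ turns $\sum f_i u_i$ into $\sum f_i u_i - n$ and $\sum f_i u_i^2$ into $\sum f_i u_i^2 - 2\sum f_i u_i + n$, which leaves $n\sum f_i u_i^2 - (\sum f_i u_i)^2$ unchanged. The Python sketch below checks this numerically with `fractions.Fraction` for exact arithmetic. The frequencies are hypothetical, not the values in the tables.

```python
from fractions import Fraction


def coded_sample_variance(freqs, codes, c):
    """Coding-method sample variance, computed exactly as a Fraction."""
    n = sum(freqs)
    sum_fu = sum(f * u for f, u in zip(freqs, codes))
    sum_fu2 = sum(f * u * u for f, u in zip(freqs, codes))
    return Fraction(c * c * (n * sum_fu2 - sum_fu**2), n * (n - 1))


freqs = [1, 4, 6, 9, 5, 3]  # hypothetical frequencies, k = 6 classes
c = 7                       # assumed class width
for zero_class in (4, 5):   # u_4 = 0, then u_5 = 0
    codes = [i - zero_class for i in range(1, len(freqs) + 1)]
    print(zero_class, coded_sample_variance(freqs, codes, c))
```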
If, instead of sample data, we have population data (i.e., data covering the whole population under study), we must use the following population variance formula:
$$\sigma^2 = \frac{c^2\left[N\sum_{i=1}^{k} f_i u_i^2 - \left(\sum_{i=1}^{k} f_i u_i\right)^2\right]}{N^2}$$
where $\sigma^2$ = the population variance, $u_i$ = the code of class $i$, $f_i$ = the frequency of class $i$, $k$ = the number of classes, $N$ = the population size, and $c$ = the class width.
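For population data the same sums are used, and only the denominator changes: $\sigma^2 = c^2\left[N\sum f_i u_i^2 - (\sum f_i u_i)^2\right]/N^2$. Here is a minimal sketch, again with a hypothetical function name and made-up frequencies.

```python
def coded_population_variance(freqs, codes, c):
    """Population variance of grouped data by the coding method.

    Implements sigma^2 = c^2 [N * sum(f u^2) - (sum(f u))^2] / N^2.
    """
    if len(freqs) != len(codes):
        raise ValueError("freqs and codes must have the same length")
    N = sum(freqs)  # population size
    sum_fu = sum(f * u for f, u in zip(freqs, codes))
    sum_fu2 = sum(f * u * u for f, u in zip(freqs, codes))
    return c**2 * (N * sum_fu2 - sum_fu**2) / N**2


# Hypothetical three-class table with class width 5 and u_2 = 0
print(coded_population_variance([2, 3, 1], [-1, 0, 1], 5))
```

For a given table, the population variance equals the sample variance multiplied by $(n-1)/n$. For this toy table that gives $\frac{425}{30}\cdot\frac{5}{6} = \frac{425}{36}$.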
Treating the data at the beginning of this post as population data, we calculate the variance as follows.
Table 8
(Table 8 is a copy of Table 6.)
Substituting the appropriate values into the population variance formula, we have the following.
$\sigma^2 \approx 139.63\ \text{sec}^2$.