In the article Introduction to The Measures in Statistics, it is stated that there are several measures of location, such as the mean, median, and mode. The article also warns against using the mean when the data being examined are not quantitative. (For quantitative data, please refer to Levels of Measurement.) But what if the data are at the ordinal level of measurement (qualitative)? This is where the median comes in as an alternative to the mean: it can be used whenever the data are at least at the ordinal scale.
The median is defined in several slightly different ways. Throughout this post, the median is the midpoint of the values after they have been ordered from the smallest to the largest. As an example, suppose that we want to find the median of some students’ grades in Math. The grades are shown in the table below.
To find the median, the first step is to sort the data from the smallest to the largest. This gives the following.
C, C, C+, B, B+, A–, A
The value in the middle, B, is the median. Note that 3 students got a grade lower than B and 3 students got a grade higher than B. The median splits the data into two groups with an equal number of values: half of the data have values less than or equal to the median, and the other half have values greater than or equal to the median.
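Because grades are ordinal, they cannot be averaged, but they can be ranked, and ranking is all the median needs. The sketch below illustrates this under stated assumptions: `GRADE_ORDER` is a hypothetical ranking that contains only the grades in this example (written with an ASCII hyphen, so “A–” becomes "A-"), and `median_ordinal` is an illustrative helper, not a library function.

```python
# Hypothetical ranking of the grades in this example, lowest to highest.
GRADE_ORDER = ["C", "C+", "B", "B+", "A-", "A"]


def median_ordinal(values, order):
    """Return the middle value of ordinal data ranked by `order`.

    For an even number of values, the two middle values are returned as a
    tuple, because ordinal labels cannot be averaged or interpolated.
    """
    rank = {label: i for i, label in enumerate(order)}
    ordered = sorted(values, key=rank.__getitem__)  # sort by rank, not alphabetically
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return ordered[mid - 1], ordered[mid]


# The seven grades from the example, in an arbitrary input order.
grades = ["B+", "C", "A", "C+", "B", "A-", "C"]
print(median_ordinal(grades, GRADE_ORDER))  # B
```

Sorting by rank rather than by the strings themselves matters here: plain alphabetical sorting would put "A" before "C" and give a meaningless middle value.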
What is the advantage of the median over the mean? Consider the following heights of 5 students, in centimeters: 158, 169, 1600, 170, 163. The value 1600 cm is extreme. It may be a typing error: 160 was accidentally recorded as 1600. The mean of the recorded data is 452 cm, whereas the mean of the correct data would be 164 cm, so a single extreme value pulls the mean far away from its true value. The median of the recorded data, on the other hand, is 169 cm, and the median of the correct data is 163 cm, a difference of only 6 cm. In fact, the median would stay at 169 cm however large the mistyped value was, because the median depends only on the order of the values, not on how far the largest or smallest values lie from the rest. In other words, the median is robust to extremely large or small values.
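The comparison can be reproduced with Python's standard `statistics` module. The sketch assumes, as the text does, that the intended value was 160 cm; the value 16000 in the last line is an arbitrary, hypothetical even larger typo.

```python
from statistics import mean, median

recorded = [158, 169, 1600, 170, 163]  # 160 cm mistyped as 1600 cm
correct = [158, 169, 160, 170, 163]    # the intended data

# The single bad value shifts the mean by 288 cm (452 vs 164) ...
print("mean:  ", mean(recorded), "vs", mean(correct))
# ... but shifts the median by only 6 cm (169 vs 163).
print("median:", median(recorded), "vs", median(correct))

# Any mistyped value of at least 169 cm leaves the median at 169 cm.
print(median([158, 169, 16000, 170, 163]))  # 169
```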
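What is the formula for determining the median? Suppose that we are examining n data values measured at least at the ordinal scale, sorted from the smallest to the largest as follows: X1, X2, X3, …, Xn (that is, X1 ≤ X2 ≤ X3 ≤ … ≤ Xn). The median (Q2) of the data is

$$Q_2 = X_k, \quad \text{where } k = \frac{n+1}{2}.$$

When n is odd, k is a whole number and the median is simply the middle value. When n is even, k ends in .5 and the median lies halfway between two data values, which can only be computed for numeric data.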
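The position rule Q2 = Xk with k = (n + 1)/2 translates almost line for line into code. This is a minimal sketch: `median_by_position` is a hypothetical helper name, and the interpolation step assumes the values are numeric, since a fractional position has no meaning for ordinal labels.

```python
def median_by_position(values):
    """Median as Q2 = X_k with k = (n + 1) / 2, interpolating when k is fractional."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    k = (n + 1) / 2      # 1-based position of the median
    lower = int(k)       # whole part of k
    frac = k - lower     # 0.0 for odd n, 0.5 for even n
    q2 = x[lower - 1]    # X_lower (1-based position -> 0-based index)
    if frac:
        # Linear interpolation between X_lower and X_(lower + 1).
        q2 += frac * (x[lower] - x[lower - 1])
    return q2


print(median_by_position([60, 78, 52, 44, 88, 76, 50]))      # 60
print(median_by_position([40, 15, 45, 45, 30, 60, 15, 20]))  # 35.0
```

The two calls use the data of Examples 1 and 2 below.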
Example 1
The following are the math quiz marks of 7 students: 60, 78, 52, 44, 88, 76, 50. Find the median.
Answer
Sort the data from smallest to largest. This gives 44, 50, 52, 60, 76, 78, 88. Let X1 = 44, X2 = 50, X3 = 52, X4 = 60, X5 = 76, X6 = 78, and X7 = 88. In this case, n = 7, so $k = \frac{7+1}{2} = 4$. As a result, the median is Q2 = X4 = 60.
Example 2
Below are sample data on the travel time (in minutes) from home to school of 8 students.
40, 15, 45, 45, 30, 60, 15, 20
Within how many minutes do half of the students reach school from home?
Answer
Sort the data from smallest to largest. This results in the following.
15, 15, 20, 30, 40, 45, 45, 60
Let X1 = X2 = 15, X3 = 20, X4 = 30, X5 = 40, X6 = X7 = 45, and X8 = 60. In this case, n = 8, so $k = \frac{8+1}{2} = 4.5$ and the median is X4.5. Since 4 < 4.5 < 5 and 4.5 lies exactly halfway between 4 and 5, we calculate the median by linear interpolation between X4 and X5:

$$Q_2 = X_{4.5} = X_4 + 0.5\,(X_5 - X_4) = 30 + 0.5\,(40 - 30) = 35$$

Consequently, half of the students reach school in 35 minutes or less.
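When n is even, k = (n + 1)/2 always falls exactly halfway between positions n/2 and n/2 + 1, so the interpolated value equals the average of the two middle values. Python's standard `statistics.median` uses that averaging rule for even-length data, so it offers a quick cross-check of Examples 1 and 2:

```python
from statistics import median

quiz_marks = [60, 78, 52, 44, 88, 76, 50]          # Example 1, n = 7 (odd)
travel_minutes = [40, 15, 45, 45, 30, 60, 15, 20]  # Example 2, n = 8 (even)

# Odd n: the middle value. Even n: the average of the two middle values,
# which matches interpolating halfway between X4 and X5.
print(median(quiz_marks))      # 60
print(median(travel_minutes))  # 35.0
```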
How do we compute the median when the data are presented in a frequency distribution table? The following formula applies.
$$Q_2 = L_2 + \left(\frac{\frac{n}{2} - (\Sigma f)_2}{f_M}\right) c \qquad (*)$$
where
L2 = the lower class boundary (LCB) of the median class
n = the sum of all frequencies (that is, the number of data values)
(Σf)2 = the number of data values less than L2
fM = the frequency of the median class
c = the width of the median class
(Regarding LCB and class width, please refer to The Anatomy of Frequency Distribution Tables.)
Note: The median class is the class in the frequency distribution table that contains the kth data value, where k = n/2.
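Finding the median class amounts to accumulating frequencies class by class until the running total reaches n/2, which is exactly what a column of data numbers does. The sketch below is illustrative: `find_median_class` is a hypothetical helper, and the frequencies `[3, 7, 10, 4]` are made-up values, not taken from any table in this post.

```python
def find_median_class(freqs):
    """Return (index, cumulative frequency below) for the class holding the (n/2)th value.

    `freqs` lists the class frequencies from the lowest class to the highest.
    """
    n = sum(freqs)
    if n <= 0:
        raise ValueError("the total frequency must be positive")
    k = n / 2
    below = 0  # number of values in all earlier classes
    for i, f in enumerate(freqs):
        # This class holds data numbers below + 1 through below + f.
        if below + f >= k:
            return i, below
        below += f


# Hypothetical table with n = 24, so k = 12.
print(find_median_class([3, 7, 10, 4]))  # (2, 10)
```

Here the third class (index 2) holds data numbers 11 to 20, so it contains the 12th value, and 10 values lie below it; these are the median class and the (Σf)2 of formula (*).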
Example 3
The frequency distribution table below shows the employees’ monthly expenditure on mobile phone telecommunication. Find the median.
Answer
The first step is to insert an additional column to the right of the column showing the frequencies, which has the heading “Number of Employees”. Name the new column “Data Numbers”. Since the first class contains 5 data values, its data numbers run from 1 to 5. The second class contains 13 data values, so its data numbers run from 6 to 18. Continuing this way, we obtain the following table.
(We have renamed the third column “Frequency”.)
In this example, the sum of all frequencies is n = 50, so n/2 = 50/2 = 25. The Data Numbers column shows that the 25th value lies in the third class, with class interval 176-210. This class is the median class. Its LCB is L2 = 176 − 0.5 = 175.5. The number of data values less than L2 is (Σf)2 = 5 + 13 = 18. The frequency of the median class is fM = 20, and its width is c = 210.5 − 175.5 = 35. Substituting these values into (*), we get:

$$Q_2 = 175.5 + \left(\frac{25 - 18}{20}\right) 35 = 175.5 + 12.25 = 187.75$$

So, since 187.75 thousand rupiah is IDR 187,750, the median of the employees’ monthly expenditure on mobile phone telecommunication is IDR 187,750.
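Formula (*) is a linear interpolation inside the median class. A minimal sketch, where `grouped_median` is a hypothetical helper whose parameters mirror L2, n, (Σf)2, fM, and c, applied to the values from Example 3 (the table's amounts appear to be in thousands of IDR):

```python
def grouped_median(lcb, n, cum_below, f_median, width):
    """Median from a frequency table: Q2 = L2 + ((n/2 - (Σf)2) / fM) * c."""
    # Move from the lower class boundary into the median class in proportion
    # to how many of its values are needed to reach the (n/2)th value,
    # assuming the values are spread evenly across the class.
    return lcb + (n / 2 - cum_below) * width / f_median


# Example 3: L2 = 175.5, n = 50, (Σf)2 = 18, fM = 20, c = 35.
q2 = grouped_median(lcb=175.5, n=50, cum_below=18, f_median=20, width=35)
print(q2)  # 187.75
```

Python's standard library also offers `statistics.median_grouped`, but it works from the raw data values and a class width rather than from a finished frequency table.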
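No less important is the meaning of Q2 = IDR 187,750. In theory, it means that half of the employees (that is, 25 employees) spend IDR 187,750 or less per month on mobile phone telecommunication. The qualifier “in theory” matters because formula (*) assumes that the values within the median class are spread evenly across that class.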