A boxplot presents quantitative data. It is built from five statistics, often called the five-number summary: the minimum value, the first quartile (Q1), the median, the third quartile (Q3), and the maximum value.

Suppose that we have data on the weights of 44 students (in kilograms) as shown in the table below.

The five statistics are:

The minimum value = 45 kg
The first quartile = Q1 = 55 kg
The median = 60 kg
The third quartile = Q3 = 65 kg
The maximum value = 73 kg
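
As a rough sketch of how these five statistics can be computed, the Python snippet below uses the standard library's statistics.quantiles. The list named weights is a made-up nine-value sample, not the 44-student table, and the "inclusive" quartile method is an assumption: this post does not say which quartile rule it uses, and other rules can give slightly different values for Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # method="inclusive" is an assumed quartile rule; other rules exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

# Hypothetical sample for illustration only (not the article's data).
weights = [45, 52, 55, 58, 60, 62, 65, 70, 73]

summary = five_number_summary(weights)
print(summary)
```

For this particular sample the quartiles fall exactly on data values, so no interpolation is involved.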

The corresponding boxplot of the data is presented below.

Figure 1

 

In Figure 1, the bottom of the box marks the first quartile (Q1) = 55 kg, and the top of the box marks the third quartile (Q3) = 65 kg. The horizontal line inside the box marks the median = 60 kg. A vertical line segment runs from the bottom of the box down to a short horizontal line segment, which marks the minimum value of the data, i.e. 45 kg. Likewise, a vertical line segment runs from the top of the box up to a short horizontal line segment, which marks the maximum value of the data, i.e. 73 kg.

 

Outliers in a Boxplot
The boxplot in Figure 1 is an example of a boxplot of data without any outliers. In this post, an outlier is a data value that is less than (Q1 – 1.5 IQR) or greater than (Q3 + 1.5 IQR), where IQR = (Q3 – Q1) is the interquartile range. In the example above, IQR = 65 kg – 55 kg = 10 kg. Therefore:
Q1 – 1.5 IQR = 55 kg – 1.5(10 kg) = 40 kg
Q3 + 1.5 IQR = 65 kg + 1.5(10 kg) = 80 kg
The minimum value, 45 kg, is not less than 40 kg, and the maximum value, 73 kg, is not greater than 80 kg. All the data therefore lie in the range of 40 kg to 80 kg, so the data have no outliers.
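
The same check can be written as a small Python sketch. The function name outlier_fences and the parameter k are illustrative choices, not part of any library; the numbers are the Q1, Q3, minimum, and maximum from the weights example.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (Q1 - k*IQR, Q3 + k*IQR), the limits beyond which a value is an outlier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Values from the weights example (in kg).
lower, upper = outlier_fences(55, 65)
print(lower, upper)  # 40.0 80.0

# If neither the minimum nor the maximum is beyond a fence, no value is.
minimum, maximum = 45, 73
has_outliers = minimum < lower or maximum > upper
print(has_outliers)  # False
```

Checking only the minimum and the maximum is enough here, because every other value lies between them.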

 

If a data set has one or more outliers, the boxplot is constructed slightly differently. The lower vertical line segment connects the bottom of the box with the short horizontal line segment representing the smallest value that is not an outlier. Similarly, the upper vertical line segment connects the top of the box with the short horizontal line segment representing the largest value that is not an outlier. For example, consider the following data.

The five statistics of the data are:
The minimum value = 149
The first quartile = Q1 = 160
The median = 162
The third quartile = Q3 = 166.75
The maximum value = 174

 

Before constructing the boxplot, we have to check whether the data contain any outliers.

The interquartile range of the data is IQR = Q3 – Q1 = 166.75 – 160 = 6.75. Furthermore:
Q1 – 1.5 IQR = 160 – 1.5(6.75) = 149.875
Q3 + 1.5 IQR = 166.75 + 1.5(6.75) = 176.875
Thus, any value less than 149.875 or greater than 176.875 is an outlier. The data have one outlier, 149, because 149 is less than 149.875. The smallest value that is not an outlier is 154. The largest value, 174, is not greater than 176.875, so it is not an outlier. Consequently, the boxplot of the data is as shown in Figure 2.

Figure 2

 

Note that in Figure 2, the vertical line segment below the box “stops” at 154, not at the smallest value, 149. This is because 154 is the smallest value that is not an outlier. The vertical line segment above the box “stops” at 174, the maximum value, which is not an outlier. The boxplot also shows a small circle below the short horizontal line segment at 154. The small circle marks the outlier, 149.
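
The rule for where the vertical line segments stop can be sketched in Python as follows. The function name whisker_ends is illustrative, and the list named sample is a made-up handful of values chosen to include 149, 154, and 174; it is not the data set behind Figure 2. Q1 = 160 and Q3 = 166.75 are taken from the example above and passed in directly rather than recomputed.

```python
def whisker_ends(data, q1, q3, k=1.5):
    """Return (lowest non-outlier, highest non-outlier, sorted outliers)."""
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Values within the limits determine where the line segments stop.
    inside = [x for x in data if lower <= x <= upper]
    # Values beyond the limits are drawn as separate small circles.
    outliers = sorted(x for x in data if x < lower or x > upper)
    return min(inside), max(inside), outliers

# Hypothetical sample for illustration only (not the article's data).
sample = [149, 154, 158, 160, 162, 165, 168, 174]

low_end, high_end, outliers = whisker_ends(sample, q1=160, q3=166.75)
print(low_end, high_end, outliers)  # 154 174 [149]
```

The lower segment stops at 154 and the upper segment at 174, while 149 is reported separately as an outlier, matching the description of Figure 2.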
