Variance-covariance and correlation matrices play a vital role in multivariate statistics. Multivariate statistics studies n × p data from a set of samples, where n is the sample size (the number of measurements) and p is the number of variables measured on each sample. For instance, suppose that four respondents participated as samples in a study. They were asked to report: 1) their volume of internet data consumption per month (in gigabytes), 2) their monthly income (in million rupiahs), and 3) their volume of gasoline consumption for transportation per month (in liters). The sampling results were summarized in the following table.

The data can be presented in a matrix **X**, with one row per respondent and one column per variable, as follows.

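In general, n measurements on p variables can be presented in an n × p matrix as follows.

$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}$$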

Note: Each row of **X** is a multivariate observation.

**The Mean Matrix**

The mean (average) of the data on each variable can be obtained from the mean matrix

$$\bar{\mathbf{x}} = \frac{1}{n}\,\mathbf{1}_n\mathbf{X},$$

where $\mathbf{1}_n$ is a 1 × n row matrix whose entries are all 1. So, the mean matrix for the data above is

$$\bar{\mathbf{x}} = \begin{bmatrix} 120 & 9 & 22 \end{bmatrix}.$$

From the mean matrix, it can be seen that the average volume of internet data consumption per month is 120 GB, the average monthly income is IDR 9 million, and the average volume of gasoline consumption per month is 22 liters.
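The product $\mathbf{1}_n\mathbf{X}$ simply sums each column of **X**, so the mean matrix is the vector of column sums divided by n. The pure-Python sketch below illustrates this; the function `mean_matrix` and the 4 × 3 matrix `X_toy` are hypothetical, made up for illustration rather than taken from the survey table.

```python
def mean_matrix(X):
    """Mean matrix (1/n) * 1_n X for an n x p matrix X given as a list of rows."""
    n, p = len(X), len(X[0])
    ones = [1] * n  # the 1 x n row matrix 1_n
    # Entry j of 1_n X is sum_i 1 * x_ij, i.e. the sum of column j.
    return [sum(ones[i] * X[i][j] for i in range(n)) / n for j in range(p)]


# Hypothetical toy data: n = 4 observations (rows), p = 3 variables (columns).
X_toy = [
    [2, 1, 5],
    [4, 3, 7],
    [6, 2, 4],
    [8, 6, 8],
]
print(mean_matrix(X_toy))  # [5.0, 3.0, 6.0]
```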

**The Deviation Matrix**

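The matrix that represents deviations from the mean values is called the *deviation matrix*, denoted by **T** throughout this post. It can be determined by the formula below.

$$\mathbf{T} = \left(\mathbf{I} - \frac{1}{n}\,\mathbf{1}_n^{T}\mathbf{1}_n\right)\mathbf{X}$$

Here **I** is the identity matrix of order n, and $\mathbf{1}_n^{T}\mathbf{1}_n$ is the n × n matrix whose entries are all 1.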
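Multiplying **X** by $\mathbf{I} - \frac{1}{n}\mathbf{1}_n^{T}\mathbf{1}_n$ subtracts each column's mean from every entry in that column. Below is a minimal sketch, assuming matrices are plain Python lists of rows; `matmul`, `deviation_matrix`, and `X_toy` are hypothetical names and data chosen for illustration.

```python
def matmul(A, B):
    """Plain matrix product of A (m x k) and B (k x q), both lists of rows."""
    return [
        [sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def deviation_matrix(X):
    """T = (I - (1/n) * 1_n^T 1_n) X, i.e. X with each column mean subtracted."""
    n = len(X)
    # 1_n^T 1_n is the n x n matrix of ones, so the centering matrix has
    # 1 - 1/n on the diagonal and -1/n everywhere else.
    C = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    return matmul(C, X)


# Hypothetical toy data (4 observations, 3 variables); column means are 5, 3, 6.
X_toy = [[2, 1, 5], [4, 3, 7], [6, 2, 4], [8, 6, 8]]
for row in deviation_matrix(X_toy):
    print(row)
```

Each column of the result sums to zero, as a deviation matrix should. Building the n × n centering matrix mirrors the formula directly but needs O(n²) memory; for large n, subtracting the column means directly gives the same **T** more cheaply.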

So, the deviation matrix of the data is determined as follows:

**The Sample Covariance Matrix**

We can use the sample covariance matrix **S** to find the sample variances and covariances:

$$\mathbf{S} = \frac{1}{n-1}\,\mathbf{T}^{T}\mathbf{T}$$

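The matrix can also be expressed as

$$\mathbf{S} = \frac{1}{n-1}\,\mathbf{X}^{T}\left(\mathbf{I} - \frac{1}{n}\,\mathbf{1}_n^{T}\mathbf{1}_n\right)\mathbf{X}$$

or, with $\bar{\mathbf{x}}$ denoting the mean matrix,

$$\mathbf{S} = \frac{1}{n-1}\left(\mathbf{X}^{T}\mathbf{X} - n\,\bar{\mathbf{x}}^{T}\bar{\mathbf{x}}\right).$$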

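So, for the above data, we have

$$\mathbf{S} = \begin{bmatrix} 4600 & 173.33 & 373.33 \\ 173.33 & 6.67 & 13.33 \\ 373.33 & 13.33 & 48 \end{bmatrix}.$$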
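The formula $\mathbf{S} = \frac{1}{n-1}\mathbf{T}^{T}\mathbf{T}$ can be sketched in pure Python as follows: the (i, j) entry of $\mathbf{T}^{T}\mathbf{T}$ is the dot product of columns i and j of **T**, and dividing by $n-1$ gives the unbiased sample covariance. The function `sample_covariance` and the toy matrix `X_toy` are hypothetical and are not the survey data above.

```python
def sample_covariance(X):
    """S = T^T T / (n - 1), where T is the deviation matrix of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # Deviation matrix T: subtract each column mean from that column.
    T = [[row[j] - means[j] for j in range(p)] for row in X]
    # (T^T T)_ij is the dot product of columns i and j of T.
    return [
        [sum(T[k][i] * T[k][j] for k in range(n)) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


X_toy = [[2, 1, 5], [4, 3, 7], [6, 2, 4], [8, 6, 8]]  # hypothetical data
S_toy = sample_covariance(X_toy)
for row in S_toy:
    print([round(v, 2) for v in row])
```

For `X_toy` the exact result is [[20/3, 14/3, 2], [14/3, 14/3, 10/3], [2, 10/3, 10/3]], and it is symmetric because $s_{ij} = s_{ji}$.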

The diagonal entries of **S** represent the variances: $s_{ii}$ denotes the variance of $X_i$, $i = 1, 2, \dots, p$.

Consequently, the diagonal entries of **S** are interpreted as follows.

$s_{11}$ = the variance of $X_1$ = the variance of the volume of internet data consumption per month = $4600\ \text{GB}^2$.

$s_{22}$ = the variance of $X_2$ = the variance of the monthly income = $6.67\ (\text{million IDR})^2 = 6.67 \cdot 10^{12}\ \text{IDR}^2$.

$s_{33}$ = the variance of $X_3$ = the variance of the volume of gasoline consumption per month = $48\ \text{liter}^2$.

In the covariance matrix **S**, if $i \neq j$, then $s_{ij}$ represents the covariance between $X_i$ and $X_j$ ($i, j = 1, 2, \dots, p$). Since $s_{ij} = s_{ji}$, **S** is symmetric. Hence, in the matrix **S** above:

$s_{12} = s_{21}$ = the covariance between $X_1$ and $X_2$ = $173.33\ \text{GB}\cdot(\text{million IDR}) = 1.7333 \cdot 10^{8}\ \text{GB}\cdot\text{IDR}$

$s_{13} = s_{31}$ = the covariance between $X_1$ and $X_3$ = $373.33\ \text{GB}\cdot\text{liter}$

$s_{23} = s_{32}$ = the covariance between $X_2$ and $X_3$ = $13.33\ (\text{million IDR})\cdot\text{liter} = 1.333 \cdot 10^{7}\ \text{IDR}\cdot\text{liter}$

**The Correlation Matrix**

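The correlation coefficients can be obtained from the correlation matrix

$$\mathbf{R} = \left(\mathbf{D}^{1/2}\right)^{-1}\mathbf{S}\left(\mathbf{D}^{1/2}\right)^{-1},$$

where $\left(\mathbf{D}^{1/2}\right)^{-1}$ is the inverse of $\mathbf{D}^{1/2}$, and the matrix $\mathbf{D}^{1/2}$ is defined as follows.

$$\mathbf{D}^{1/2} = \begin{bmatrix} \sqrt{s_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{s_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{s_{pp}} \end{bmatrix}$$

That is, the element in the i-th row and j-th column of $\mathbf{D}^{1/2}$ is 0 if $i \neq j$ and $\sqrt{s_{ii}}$ if $i = j$.

Then it is easy to check that $\mathbf{S} = \mathbf{D}^{1/2}\,\mathbf{R}\,\mathbf{D}^{1/2}$, and that the (i, j) entry of **R** is $r_{ij} = s_{ij}/\sqrt{s_{ii}\,s_{jj}}$.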
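Because the matrix of standard deviations $\mathbf{D}^{1/2} = \operatorname{diag}(\sqrt{s_{11}}, \dots, \sqrt{s_{pp}})$ is diagonal, multiplying **S** by its inverse on both sides just divides each $s_{ij}$ by $\sqrt{s_{ii}}\sqrt{s_{jj}}$. The pure-Python sketch below applies this to the example's covariance matrix. The function name `correlation_matrix` is hypothetical, and the sketch assumes the two-decimal entries 173.33, 6.67, 373.33 and 13.33 are rounded from 520/3, 20/3, 1120/3 and 40/3 (consistent with the divisor $n-1 = 3$); plugging in the rounded values instead can shift the fourth decimal.

```python
import math


def correlation_matrix(S):
    """R = (D^{1/2})^{-1} S (D^{1/2})^{-1}, where D^{1/2} = diag(sqrt(s_11), ..., sqrt(s_pp))."""
    p = len(S)
    # Diagonal of (D^{1/2})^{-1}: the reciprocals of the standard deviations.
    inv_sd = [1.0 / math.sqrt(S[i][i]) for i in range(p)]
    # Pre- and post-multiplying by a diagonal matrix scales rows and columns,
    # so entry (i, j) becomes s_ij / (sqrt(s_ii) * sqrt(s_jj)).
    return [[inv_sd[i] * S[i][j] * inv_sd[j] for j in range(p)] for i in range(p)]


# Covariance matrix of the example, ASSUMING the two-decimal entries
# 173.33, 6.67, 373.33 and 13.33 are rounded from 520/3, 20/3, 1120/3 and 40/3.
S = [
    [4600.0, 520 / 3, 1120 / 3],
    [520 / 3, 20 / 3, 40 / 3],
    [1120 / 3, 40 / 3, 48.0],
]
R = correlation_matrix(S)
for row in R:
    print([round(v, 4) for v in row])
```

Under that assumption, the off-diagonal entries round to 0.9898, 0.7945 and 0.7454, and the diagonal entries are 1.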

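In the example above, the correlation matrix is

$$\mathbf{R} = \begin{bmatrix} 1 & 0.9898 & 0.7945 \\ 0.9898 & 1 & 0.7454 \\ 0.7945 & 0.7454 & 1 \end{bmatrix}.$$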

From this correlation matrix, it can be concluded that:

The correlation coefficient between $X_1$ and $X_2$ is $r_{12} = 0.9898$. It is the correlation coefficient between the volume of internet data consumption per month and the monthly income.

The correlation coefficient between $X_1$ and $X_3$ is $r_{13} = 0.7945$, which is the correlation between the volume of internet data consumption per month and the volume of gasoline consumption per month.

The correlation coefficient between $X_2$ and $X_3$ is $r_{23} = 0.7454$. It is the correlation coefficient between the monthly income and the volume of gasoline consumption per month.