A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Hereinafter, each linear combination is referred to as a *component*. The number of components can be selected so that the total variance of these components is almost equal to the total variance of the original variables. Thus, the components carry almost as much information as the original variables. In addition, the derived components are orthogonal to each other; in other words, these components are not correlated with each other.

The resulting components are rarely treated as the ultimate objective in multivariate statistics. These components are often required when applying other multivariate statistical analyses such as multiple regression, cluster analysis, and factor analysis.

Suppose that the random vector X′ = [X_{1}, X_{2}, …, X_{p}] has the covariance matrix **Σ** with eigenvalues λ_{1} ≥ λ_{2} ≥ … ≥ λ_{p} ≥ 0. Consider the p linear combinations below.

Y_{i} = a_{i}′X = a_{i1}X_{1} + a_{i2}X_{2} + … + a_{ip}X_{p}; i = 1, 2, …, p

Therefore,

Var(Y_{i}) = a_{i}′Σa_{i}; i = 1, 2, …, p

Cov(Y_{i}, Y_{k}) = a_{i}′Σa_{k}; i, k = 1, 2, …, p

The *principal components* are those uncorrelated linear combinations Y_{1}, Y_{2}, …, Y_{p} with the property that, for every i ∈ {1, 2, …, p}, Var(Y_{i}) is as large as possible.

By the definition of Y_{i}, the arising problem is that Var(Y_{i}) can be made arbitrarily large simply by multiplying the coefficient vector a_{i} of Y_{i} by some constant. To eliminate this indeterminacy, a new condition is added: a_{i} must be a unit vector, i.e. a_{i}′a_{i} = 1. Therefore the principal components are defined as follows.
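The indeterminacy can be seen numerically: scaling the coefficient vector by c multiplies the variance by c². A minimal NumPy sketch, using an arbitrary 2×2 covariance matrix and coefficient vector that are illustrative only (not from this article):

```python
import numpy as np

# Illustrative covariance matrix and coefficient vector (made up for this sketch).
sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
a = np.array([1.0, 1.0])

# Var(a'X) = a' Sigma a; scaling a by 5 multiplies the variance by 25.
var_a = a @ sigma @ a
var_5a = (5 * a) @ sigma @ (5 * a)
assert np.isclose(var_5a, 25 * var_a)

# Normalizing a to unit length removes the indeterminacy.
a_unit = a / np.linalg.norm(a)
assert np.isclose(a_unit @ a_unit, 1.0)
```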

First principal component = linear combination a_{1}′X that maximizes Var(a_{1}′X) subject to a_{1}′a_{1} = 1.

Second principal component = linear combination a_{2}′X that maximizes Var(a_{2}′X) subject to a_{2}′a_{2} = 1 and Cov(a_{1}′X, a_{2}′X) = 0.

At the *i*th step,

*i*th principal component = linear combination a_{i}′X that maximizes Var(a_{i}′X) subject to a_{i}′a_{i} = 1 and Cov(a_{i}′X, a_{k}′X) = 0 for k < i.

**Theorem 1**

Let Σ be the covariance matrix associated with the random vector X′ = [X_{1}, X_{2}, …, X_{p}]. Also suppose that Σ has the eigenvalue-eigenvector pairs (λ_{1}, e_{1}), (λ_{2}, e_{2}), …, (λ_{p}, e_{p}) where λ_{1} ≥ λ_{2} ≥ … ≥ λ_{p} ≥ 0. Then, the *i*th principal component is as follows:

Y_{i} = e_{i}′X = e_{i1}X_{1} + e_{i2}X_{2} + … + e_{ip}X_{p} for i = 1, 2, …, p

Further consequences:

Var(Y_{i}) = e_{i}′Σe_{i} = λ_{i}; i = 1, 2, …, p

Cov(Y_{i}, Y_{k}) = e_{i}′Σe_{k} = 0; i ≠ k.

If some λ_{i} are equal, the choices of the corresponding coefficient vectors e_{i} (and hence Y_{i}) are not unique.
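Theorem 1 can be checked numerically: the unit eigenvectors of Σ give the component coefficients, and the eigenvalues give the component variances. A small NumPy sketch, using a made-up 3×3 covariance matrix (the matrix is the only assumption here; it is not from this article):

```python
import numpy as np

# Illustrative symmetric covariance matrix (made up for this sketch).
sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])

# eigh returns eigenvalues of a symmetric matrix in ascending order;
# reverse so that lambda_1 >= lambda_2 >= ... as in the text.
eigenvalues, eigenvectors = np.linalg.eigh(sigma)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # column i is e_i

# Consequences of Theorem 1: e_i' Sigma e_i = lambda_i and e_i' Sigma e_k = 0 (i != k).
for i in range(3):
    assert np.isclose(eigenvectors[:, i] @ sigma @ eigenvectors[:, i], eigenvalues[i])
```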

**Example**

Suppose that the random vector X′ = [X_{1}, X_{2}, X_{3}, X_{4}] has a covariance matrix Σ whose diagonal elements are σ_{11} = 30, σ_{22} = 32, σ_{33} = 13, and σ_{44} = 45.

To determine the principal components, first calculate the eigenvalues and the corresponding eigenvectors of Σ, with each eigenvector scaled to unit norm. Ordering the eigenvalues from largest to smallest, the eigenvalue-eigenvector pairs give the components and variances below.

According to Theorem 1, the principal components are:

Y_{1} = -0.229 X_{1} + 0.622 X_{2} + 0.197 X_{3} - 0.722 X_{4}

Y_{2} = 0.861 X_{1} - 0.028 X_{2} - 0.328 X_{3} - 0.387 X_{4}

Y_{3} = 0.447 X_{1} + 0.477 X_{2} + 0.618 X_{3} + 0.437 X_{4}

Y_{4} = -0.075 X_{1} + 0.620 X_{2} - 0.687 X_{3} + 0.371 X_{4}

Also, by Theorem 1:

Var(Y_{1}) = λ_{1} = 71.224

Var(Y_{2}) = λ_{2} = 31.511

Var(Y_{3}) = λ_{3} = 14.343

Var(Y_{4}) = λ_{4} = 2.923

**Theorem 2**

Suppose that the random vector X′ = [X_{1}, X_{2}, …, X_{p}] has the covariance matrix Σ with eigenvalue-eigenvector pairs (λ_{1}, e_{1}), (λ_{2}, e_{2}), …, (λ_{p}, e_{p}) and λ_{1} ≥ λ_{2} ≥ … ≥ λ_{p} ≥ 0. Let Y_{1} = e_{1}′X, Y_{2} = e_{2}′X, …, Y_{p} = e_{p}′X be the principal components. Then, the sum of the variances of X_{1}, X_{2}, …, X_{p} is equal to the sum of the variances of Y_{1}, Y_{2}, …, Y_{p}.

Based on one of the consequences in Theorem 1, i.e. Var(Y_{i}) = λ_{i} for i = 1, 2, …, p, Theorem 2 deduces the following.

σ_{11} + σ_{22} + … + σ_{pp} = Var(X_{1}) + Var(X_{2}) + … + Var(X_{p}) = λ_{1} + λ_{2} + … + λ_{p}

In the example above, Var(Y_{1}) + Var(Y_{2}) + Var(Y_{3}) + Var(Y_{4}) = λ_{1} + λ_{2} + λ_{3} + λ_{4} = 71.224 + 31.511 + 14.343 + 2.923 = 120.001. The sum of the diagonal elements of matrix Σ is nothing but Var(X_{1}) + Var(X_{2}) + Var(X_{3}) + Var(X_{4}) = σ_{11} + σ_{22} + σ_{33} + σ_{44} = 30 + 32 + 13 + 45 = 120. This is in accordance with the conclusion of Theorem 2; the 0.001 discrepancy is due to rounding of the eigenvalues.
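Since Σ = λ_{1}e_{1}e_{1}′ + … + λ_{p}e_{p}e_{p}′ (the spectral decomposition), the example's covariance matrix can be rebuilt, approximately, from the rounded eigenvalues and coefficient vectors above. A NumPy sketch; the match is only approximate because the published eigenpairs are rounded to three decimals:

```python
import numpy as np

# Eigenvalues and eigenvectors from the example (rounded, as published).
lambdas = np.array([71.224, 31.511, 14.343, 2.923])
E = np.array([                      # row i holds the coefficients of Y_{i+1}
    [-0.229, 0.622, 0.197, -0.722],
    [0.861, -0.028, -0.328, -0.387],
    [0.447, 0.477, 0.618, 0.437],
    [-0.075, 0.620, -0.687, 0.371],
])

# Spectral decomposition: Sigma = sum_i lambda_i * e_i e_i'.
sigma = sum(lam * np.outer(e, e) for lam, e in zip(lambdas, E))

print(np.round(np.diag(sigma), 2))  # close to [30, 32, 13, 45]
print(np.round(np.trace(sigma), 2)) # close to the eigenvalue sum 120.001
```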

**Theorem 3**

If Y_{1} = e_{1}′X, Y_{2} = e_{2}′X, …, Y_{p} = e_{p}′X are the principal components obtained from the covariance matrix Σ, then the correlation coefficient between the component Y_{i} and the variable X_{k} is ρ_{Y_{i},X_{k}} = e_{ik}√λ_{i} / √σ_{kk} for i, k = 1, 2, …, p, where (λ_{1}, e_{1}), (λ_{2}, e_{2}), …, (λ_{p}, e_{p}) are eigenvalue-eigenvector pairs of Σ.

As an example of how to apply Theorem 3, suppose that we are to find the correlation between Y_{4} and X_{1}. From the equation Y_{4} = -0.075 X_{1} + 0.620 X_{2} - 0.687 X_{3} + 0.371 X_{4} we have e_{41} = -0.075. By applying Theorem 1, we have λ_{4} = 2.923. From the covariance matrix, we get σ_{11} = 30. Furthermore, by Theorem 3 we get ρ_{Y_{4},X_{1}} = e_{41}√λ_{4} / √σ_{11} = (-0.075)√2.923 / √30 ≈ -0.023. Similarly, the correlation between Y_{4} and X_{2} is ρ_{Y_{4},X_{2}} = e_{42}√λ_{4} / √σ_{22} = (0.620)√2.923 / √32 ≈ 0.187.
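The two correlations can be reproduced in a few lines of Python; all numbers are taken directly from the example above:

```python
import math

# Values from the example: fourth eigenvalue, two coefficients of Y_4,
# and the variances of X_1 and X_2 from the diagonal of Sigma.
lambda_4 = 2.923
e_41, e_42 = -0.075, 0.620
sigma_11, sigma_22 = 30.0, 32.0

# Theorem 3: rho(Y_i, X_k) = e_ik * sqrt(lambda_i) / sqrt(sigma_kk).
rho_y4_x1 = e_41 * math.sqrt(lambda_4) / math.sqrt(sigma_11)
rho_y4_x2 = e_42 * math.sqrt(lambda_4) / math.sqrt(sigma_22)

print(round(rho_y4_x1, 3))  # -0.023
print(round(rho_y4_x2, 3))  # 0.187
```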

To measure the importance of variable X_{k} in component Y_{i}, some statisticians use the coefficient e_{ik} while others use the correlation ρ_{Y_{i},X_{k}}. One of the reasons for not using ρ_{Y_{i},X_{k}} is that it only measures the univariate contribution of an individual X to a component Y. That is, the correlations do not indicate the importance of an X to a component Y in the presence of the other X's. In particular, Rencher, as cited in Johnson and Wichern (2002), recommends using e_{ik} instead of ρ_{Y_{i},X_{k}} to interpret the components. However, Johnson and Wichern (2002) stated "Although coefficients and the correlations can lead to different rankings as measures of the importance of the variables to a given component, it is our experience that these rankings are often not appreciably different." and they have recommended that both e_{ik} and ρ_{Y_{i},X_{k}} be examined to help interpret the principal components.

At the beginning of this article, it was mentioned that principal component analysis produced new variables (called components) which were fewer than the original variables but retained as much of the total variance of the original variables as possible. By retaining most of the variability in the original variables, the resulting components can replace the old variables. This can be demonstrated as follows.

In the example above, suppose that we take only two components, namely Y_{1} and Y_{2}. What fraction of the total variance of the original variables is retained by Y_{1} and Y_{2} altogether? The proportion of the total population variance retained by the first principal component Y_{1} is λ_{1}/(λ_{1} + λ_{2} + λ_{3} + λ_{4}) = 71.224/120.001 = 59.35%. The proportion of the total population variance retained by the second component Y_{2} is λ_{2}/(λ_{1} + λ_{2} + λ_{3} + λ_{4}) = 31.511/120.001 = 26.26%. As a consequence, if we use only two components to replace the original variables, the proportion of the total variance preserved by the two components is 59.35% + 26.26% = 85.61%. Thus, we can replace X_{1}, X_{2}, X_{3}, X_{4} with the two components Y_{1} and Y_{2}, and most (85.61%) of the total variance is retained.
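The arithmetic above in a few lines of Python, using the example's eigenvalues:

```python
# Proportion of total variance retained by each component: lambda_i / sum of lambdas.
lambdas = [71.224, 31.511, 14.343, 2.923]
total = sum(lambdas)  # 120.001

proportions = [lam / total for lam in lambdas]
print(round(100 * proportions[0], 2))                     # 59.35
print(round(100 * proportions[1], 2))                     # 26.26
print(round(100 * (proportions[0] + proportions[1]), 2))  # 85.61
```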

**Reference**

Johnson, R. A., & Wichern, D. W. (2002). *Applied Multivariate Statistical Analysis* (5th ed.). Pearson Education International.