Problem 1

Find the population principal components Y1 and Y2 for the covariance matrix \Sigma = \begin{pmatrix}5 & 2 \\ 2 & 2 \end{pmatrix}. Then, calculate the proportion of the total population variance explained by the first principal component.

Answer
Find the eigenvalues and the corresponding eigenvectors.

(λ-5)(λ-2) – (-2)(-2) = 0

(λ² – 7λ + 10) – 4 = 0

λ² – 7λ + 6 = 0

(λ-6)(λ-1) = 0

λ1 = 6 and λ2 = 1

From λ = 6 we get the eigenvector \vec{e}_1 = \begin{pmatrix}2/\sqrt{5} \\ 1/\sqrt{5} \end{pmatrix} and from λ = 1 we get the eigenvector \vec{e}_2 = \begin{pmatrix}1/\sqrt{5} \\ -2/\sqrt{5} \end{pmatrix}.

Determine the population principal components.

First principal component: Y_1 = \frac{2}{\sqrt{5}}X_1+\frac{1}{\sqrt{5}}X_2
Second principal component: Y_2 = \frac{1}{\sqrt{5}}X_1-\frac{2}{\sqrt{5}}X_2

The proportion of the total population variance explained by the first principal component:
\% Var(Y_1) = \frac{6}{6+1} \cdot 100 \% \approx 85.71 \%
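The hand calculation above can be checked with a short script. The following is a minimal sketch in plain Python that follows the same route: it solves the characteristic polynomial from the trace and determinant, then normalizes an eigenvector for each root. The helper name `eig_sym_2x2` is an assumption introduced here for illustration, and it only handles a symmetric 2×2 matrix with a nonzero off-diagonal entry.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], assuming b != 0.

    Returns [(lambda_1, e_1), (lambda_2, e_2)] with lambda_1 >= lambda_2
    and each eigenvector scaled to unit length.
    """
    trace, det = a + d, a * d - b * b
    # Roots of the characteristic polynomial lambda^2 - trace*lambda + det = 0.
    root = math.sqrt(trace * trace - 4.0 * det)
    pairs = []
    for lam in ((trace + root) / 2.0, (trace - root) / 2.0):
        # The first row, (a - lam) x + b y = 0, is solved by (x, y) = (b, lam - a).
        x, y = b, lam - a
        norm = math.hypot(x, y)
        pairs.append((lam, (x / norm, y / norm)))
    return pairs

(lam1, e1), (lam2, e2) = eig_sym_2x2(5.0, 2.0, 2.0)
share_first = lam1 / (lam1 + lam2)  # proportion of total variance for Y_1

print(lam1, lam2)    # eigenvalues
print(e1, e2)        # unit eigenvectors (each defined only up to sign)
print(share_first)   # proportion explained by Y_1
```

Because an eigenvector is only determined up to sign, a numerical routine may return the negative of a vector found by hand; the principal component it defines then changes sign, but its variance stays the same.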

 

Problem 2
Convert the covariance matrix in Problem 1 to a correlation matrix ρ.

  1. Determine the principal components Y1 and Y2 from ρ and compute the proportion of total population variance explained by Y1.
  2. Compare the components calculated in Part 1 with those obtained in Problem 1. Are they the same? Should they be?
  3. Compute the correlations \rho_{Y_1,Z_1} and \rho_{Y_2,Z_1}.

Answer

Part 1
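The correlation matrix of Σ has the entries \rho_{ik} = \frac{\sigma_{ik}}{\sqrt{\sigma_{ii}} \sqrt{\sigma_{kk}}}, so that \rho_{12} = \frac{2}{\sqrt{5} \sqrt{2}} = \frac{2}{\sqrt{10}} \approx 0.6325 and:

\rho = \begin{pmatrix}1 & 2/\sqrt{10} \\ 2/\sqrt{10} & 1 \end{pmatrix}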

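Find the eigenvalues and eigenvectors.

(\lambda-1)^2 - \left(\frac{2}{\sqrt{10}}\right)^2 = 0

\lambda_1 = 1 + \frac{2}{\sqrt{10}} \approx 1.6325 and \lambda_2 = 1 - \frac{2}{\sqrt{10}} \approx 0.3675

The eigenvalue λ1 gives the eigenvector \vec{e}_1 = \begin{pmatrix}1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} while λ2 yields \vec{e}_2 = \begin{pmatrix}-1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}.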

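Because the components are derived from ρ, they are linear combinations of the standardized variables Z_k = \frac{X_k - \mu_k}{\sqrt{\sigma_{kk}}}.

First principal component: Y_1 = \frac{1}{\sqrt{2}}Z_1+\frac{1}{\sqrt{2}}Z_2

Second principal component: Y_2 = \frac{-1}{\sqrt{2}}Z_1+\frac{1}{\sqrt{2}}Z_2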

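The proportion of total population variance explained by Y1:

\% Var(Y_1) = \frac{1 + 2/\sqrt{10}}{(1 + 2/\sqrt{10}) + (1 - 2/\sqrt{10})} \cdot 100 \% = \frac{1 + 2/\sqrt{10}}{2} \cdot 100 \% \approx 81.62 \%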

Part 2

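The principal components obtained from Σ and ρ are not the same, and they should not be expected to be. Standardizing rescales each variable by its own standard deviation, so in general the two matrices produce different eigenvalues and eigenvectors. Here the first component from Σ weights X1 twice as heavily as X2 (2/\sqrt{5} versus 1/\sqrt{5}), whereas the first component from ρ weights the two standardized variables equally.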
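The conversion from Σ to ρ and the resulting eigenpairs can be checked numerically. The sketch below, in plain Python, builds the correlation matrix entry by entry, then confirms that the eigenvectors stated in Part 1 satisfy ρe = λe with λ = 1 ± ρ12. The helper `residual` is a name introduced here for illustration only.

```python
import math

sigma = [[5.0, 2.0], [2.0, 2.0]]  # covariance matrix from Problem 1

# Correlation matrix: rho_ik = sigma_ik / sqrt(sigma_ii * sigma_kk).
rho = [[sigma[i][k] / math.sqrt(sigma[i][i] * sigma[k][k]) for k in range(2)]
       for i in range(2)]
r = rho[0][1]

# Eigenpairs of [[1, r], [r, 1]]: 1 + r with (1, 1)/sqrt(2), 1 - r with (-1, 1)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
pairs = [(1.0 + r, (s, s)), (1.0 - r, (-s, s))]

def residual(lam, e):
    """Largest absolute entry of rho @ e - lam * e (zero for a true eigenpair)."""
    return max(abs(rho[i][0] * e[0] + rho[i][1] * e[1] - lam * e[i]) for i in range(2))

residuals = [residual(lam, e) for lam, e in pairs]
share_rho = pairs[0][0] / (pairs[0][0] + pairs[1][0])  # first component, from rho
share_sigma = 6.0 / (6.0 + 1.0)                        # first component, from Sigma

print(rho)
print(residuals)
print(share_rho, share_sigma)
```

The two printed proportions differ, which illustrates that the analysis based on ρ is not a rescaled copy of the analysis based on Σ.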

Part 3

Find the correlations \rho_{Y_1,Z_1} and \rho_{Y_2,Z_1}:

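The formula used to calculate the correlation between the component Yi and the original variable Xk is \rho_{Y_i,X_k}=\frac{e_{ik} \sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}} (see Theorem 3 in the article: Population Principal Components). However, since the correlation matrix is used as the basis for determining the principal components, σ11 = σ22 = 1, thus in this case \rho_{Y_i,Z_k}= e_{ik} \sqrt{\lambda_i}.

With \lambda_1 = 1 + 2/\sqrt{10}, \lambda_2 = 1 - 2/\sqrt{10}, e_{11} = 1/\sqrt{2}, and e_{21} = -1/\sqrt{2}:

\rho_{Y_1,Z_1} = \frac{1}{\sqrt{2}} \sqrt{1 + \frac{2}{\sqrt{10}}} \approx 0.9035
\rho_{Y_2,Z_1} = \frac{-1}{\sqrt{2}} \sqrt{1 - \frac{2}{\sqrt{10}}} \approx -0.4287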
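The formula \rho_{Y_i,Z_k}= e_{ik} \sqrt{\lambda_i} is easy to evaluate in code. The following sketch hard-codes the eigenvalues and eigenvectors of ρ stated in Part 1; the function name `corr_component_variable` and its 0-based indexing are assumptions made for this illustration.

```python
import math

r = 2.0 / math.sqrt(10.0)             # off-diagonal entry of rho
eigenvalues = [1.0 + r, 1.0 - r]      # lambda_1, lambda_2
s = 1.0 / math.sqrt(2.0)
eigenvectors = [(s, s), (-s, s)]      # e_1, e_2 as stated in Part 1

def corr_component_variable(i, k):
    """rho_{Y_i, Z_k} = e_{ik} * sqrt(lambda_i), using 0-based indices."""
    return eigenvectors[i][k] * math.sqrt(eigenvalues[i])

rho_y1_z1 = corr_component_variable(0, 0)
rho_y2_z1 = corr_component_variable(1, 0)

print(rho_y1_z1, rho_y2_z1)
# The squared correlations of Z_1 with all components add up to 1.
print(rho_y1_z1 ** 2 + rho_y2_z1 ** 2)
```

The sign of \rho_{Y_2,Z_1} depends on the sign convention chosen for \vec{e}_2; flipping the eigenvector flips the sign of the correlation but not its magnitude.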

 

Problem 3

Let \Sigma = \begin{pmatrix}2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{pmatrix}. Determine the principal components Y1, Y2, and Y3. What can you say about the eigenvectors (and principal components) associated with eigenvalues that are not distinct?

Answer
The characteristic equation of Σ is (λ-2)(λ-4)² = 0 and this gives the eigenvalues λ1 = λ2 = 4 and λ3 = 2.

The eigenvector obtained from λ3 = 2 is \vec{e}_3 = \begin{pmatrix}1 \\ 0 \\ 0 \end{pmatrix}. The resulting eigenspace from λ1 = λ2 = 4 has the vectors \vec{e}_1 = \begin{pmatrix}0 \\ 1 \\ 0 \end{pmatrix} and \vec{e}_2 = \begin{pmatrix}0 \\ 0 \\ 1 \end{pmatrix} as the basis vectors.

From these results, we obtain the following principal components.
Y1 = X2
Y2 = X3
Y3 = X1

Below, we show that the principal components are not unique.
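Note that \vec{u}_1 = \begin{pmatrix}0 \\ 3 \\ 2 \end{pmatrix} and \vec{u}_2 = \begin{pmatrix}0 \\ -1 \\ 3 \end{pmatrix} are two linearly independent vectors in E1, the eigenspace associated with λ1 = λ2 = 4, hence together they form a basis for E1. By applying the Gram-Schmidt process, the orthonormal basis vectors \vec{v}_1 and \vec{v}_2 can be determined as follows.

\vec{v}_1 = \frac{\vec{u}_1}{\|\vec{u}_1\|} = \frac{1}{\sqrt{13}} \begin{pmatrix}0 \\ 3 \\ 2 \end{pmatrix}

\vec{w}_2 = \vec{u}_2 - (\vec{u}_2 \cdot \vec{v}_1) \vec{v}_1 = \begin{pmatrix}0 \\ -1 \\ 3 \end{pmatrix} - \frac{3}{13} \begin{pmatrix}0 \\ 3 \\ 2 \end{pmatrix} = \frac{11}{13} \begin{pmatrix}0 \\ -2 \\ 3 \end{pmatrix}

\vec{v}_2 = \frac{\vec{w}_2}{\|\vec{w}_2\|} = \frac{1}{\sqrt{13}} \begin{pmatrix}0 \\ -2 \\ 3 \end{pmatrix}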

From \vec{v}_1, \vec{v}_2, and \vec{e}_3 we get principal components that differ from the ones previously obtained, i.e.:

Y_1 = \frac{3}{\sqrt{13}} X_2 + \frac{2}{\sqrt{13}} X_3
Y_2 = \frac{-2}{\sqrt{13}} X_2 + \frac{3}{\sqrt{13}} X_3
Y_3 = X_1

Note that other than the pair \vec{v}_1 and \vec{v}_2, there are infinitely many other pairs of orthonormal basis vectors for E1, and each of those pairs produces a different set of principal components. In conclusion, if some eigenvalues are not distinct, then the principal components corresponding to those eigenvalues are not unique.
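The non-uniqueness argument can be reproduced numerically. The sketch below, in plain Python, applies the Gram-Schmidt process to \vec{u}_1 and \vec{u}_2 and then checks that the resulting vectors are orthonormal and still satisfy Σv = 4v. The helper names `dot`, `normalize`, and `matvec` are introduced here for illustration.

```python
import math

sigma = [[2.0, 0.0, 0.0],
         [0.0, 4.0, 0.0],
         [0.0, 0.0, 4.0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def matvec(m, v):
    return [dot(row, v) for row in m]

u1, u2 = [0.0, 3.0, 2.0], [0.0, -1.0, 3.0]

# Gram-Schmidt: normalize u1, then remove from u2 its projection onto v1.
v1 = normalize(u1)
proj = dot(u2, v1)
v2 = normalize([a - proj * b for a, b in zip(u2, v1)])

# Both vectors should satisfy Sigma v = 4 v and be orthogonal to each other.
residual_v1 = [a - 4.0 * b for a, b in zip(matvec(sigma, v1), v1)]
residual_v2 = [a - 4.0 * b for a, b in zip(matvec(sigma, v2), v2)]

print(v1, v2)
print(residual_v1, residual_v2, dot(v1, v2))
```

Starting from any other pair of linearly independent vectors with a zero first coordinate gives a different orthonormal pair, and all of them pass the same check.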

 

Problem 4
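Determine the principal components and the proportion of the total population variance explained by each component when the covariance matrix is:

\Sigma = \begin{pmatrix}\sigma^2 & \sigma^2 \rho & 0 \\ \sigma^2 \rho & \sigma^2 & \sigma^2 \rho \\ 0 & \sigma^2 \rho & \sigma^2 \end{pmatrix}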

where - \frac{1}{\sqrt{2}} < \rho < \frac{1}{\sqrt{2}}.

Answer

The characteristic equation of Σ is:

({\sigma}^2 - \lambda) \left[ ({\sigma}^2 - \lambda)^2 - 2 {\sigma}^4 {\rho}^2 \right] = 0

Solutions to the quadratic equation in λ are \lambda_1 = {\sigma}^2 (1+\rho \sqrt{2}) and \lambda_3 = {\sigma}^2 ( 1 - \rho \sqrt{2}). As a consequence, Σ yields the eigenvalues λ1, λ2 and λ3, where \lambda_1 = {\sigma}^2 (1 + \rho \sqrt{2}), \lambda_2 = {\sigma}^2, and \lambda_3 = {\sigma}^2 (1 - \rho \sqrt{2}). The three eigenvalues are distinct whenever ρ ≠ 0, and they are in decreasing order when ρ > 0.

Moreover, it can be proved that λ1, λ2, and λ3 result in the eigenvectors \vec{e}_1 = \begin{pmatrix}1/2 \\ 1/ \sqrt{2} \\ 1/2 \end{pmatrix}, \vec{e}_2 = \begin{pmatrix}-1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{pmatrix}, and \vec{e}_3 = \begin{pmatrix}1/2 \\ -1/ \sqrt{2} \\ 1/2 \end{pmatrix}, respectively.
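Rather than proving the eigenpairs in general, they can be spot-checked for particular parameter values. The sketch below assumes the covariance matrix has the form σ²·[[1, ρ, 0], [ρ, 1, ρ], [0, ρ, 1]], which is the matrix implied by the eigenvalues and eigenvectors stated above. The values `sigma2 = 2.0` and `rho = 0.5` are hypothetical sample inputs, and `max_residual` is a helper name introduced for illustration.

```python
import math

sigma2, rho = 2.0, 0.5   # hypothetical sample values; rho must lie in (-1/sqrt(2), 1/sqrt(2))

# Assumed covariance matrix: sigma^2 * [[1, rho, 0], [rho, 1, rho], [0, rho, 1]].
cov = [[sigma2,       sigma2 * rho, 0.0],
       [sigma2 * rho, sigma2,       sigma2 * rho],
       [0.0,          sigma2 * rho, sigma2]]

s = 1.0 / math.sqrt(2.0)
pairs = [
    (sigma2 * (1.0 + rho * math.sqrt(2.0)), [0.5,  s, 0.5]),
    (sigma2,                                [-s, 0.0, s]),
    (sigma2 * (1.0 - rho * math.sqrt(2.0)), [0.5, -s, 0.5]),
]

def max_residual(lam, e):
    """Largest absolute entry of cov @ e - lam * e (zero for a true eigenpair)."""
    return max(abs(sum(row[k] * e[k] for k in range(3)) - lam * e[i])
               for i, row in enumerate(cov))

residuals = [max_residual(lam, e) for lam, e in pairs]
total = sum(lam for lam, _ in pairs)         # equals the trace, 3 * sigma^2
shares = [lam / total for lam, _ in pairs]   # proportion explained by each component

print(residuals)
print(shares)
```

A numerical check of this kind only confirms the result for the chosen values; it does not replace the general proof.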

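The principal components obtained from Σ are:

Y_1 = \frac{1}{2}X_1+\frac{1}{\sqrt{2}}X_2+\frac{1}{2}X_3
Y_2 = \frac{-1}{\sqrt{2}}X_1+\frac{1}{\sqrt{2}}X_3
Y_3 = \frac{1}{2}X_1-\frac{1}{\sqrt{2}}X_2+\frac{1}{2}X_3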

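The proportions of the total population variance explained by the components are, since \lambda_1 + \lambda_2 + \lambda_3 = 3{\sigma}^2:

\frac{\lambda_1}{3{\sigma}^2} = \frac{1 + \rho \sqrt{2}}{3}, \quad \frac{\lambda_2}{3{\sigma}^2} = \frac{1}{3}, \quad \frac{\lambda_3}{3{\sigma}^2} = \frac{1 - \rho \sqrt{2}}{3}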
