Statistical Simulation, Wei Li. Updated: 2024-09-06

Eigenvalue decomposition

Given a square matrix \(A\), suppose \(Ax=\lambda x\) for some nonzero vector \(x\) and some scalar \(\lambda\). We call \(\lambda\) an eigenvalue of \(A\) and \(x\) an associated eigenvector (it is possible to have repeated eigenvalues, i.e., eigenvalues with multiplicity greater than 1). When an eigenvalue is repeated (has multiplicity greater than 1), there can be multiple linearly independent eigenvectors associated with that eigenvalue, or there might be fewer than expected. The number of linearly independent eigenvectors corresponding to a repeated eigenvalue is called its geometric multiplicity.

If the geometric multiplicity is less than the algebraic multiplicity (the number of times the eigenvalue is repeated), then the matrix is not diagonalizable.

A square matrix \(A\) is called diagonalizable or non-defective if there exists an invertible matrix \(P\) and a diagonal matrix \(\Lambda\) such that \(A=P \Lambda P^{-1}\) (such \(P, \Lambda\) are not unique). Real symmetric matrices are diagonalizable.

If the geometric multiplicity is equal to the algebraic multiplicity, then there are exactly as many linearly independent eigenvectors associated with the repeated eigenvalue as its algebraic multiplicity.
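As a small numerical illustration of the defective case (a sketch with numpy; the Jordan-block example below is my own, not from the notes):

```python
import numpy as np

# A classic defective matrix: the eigenvalue 1 has algebraic multiplicity 2
# but only one linearly independent eigenvector.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

print(np.linalg.eigvals(A))   # [1. 1.] -> algebraic multiplicity 2

# Geometric multiplicity = dim Null(A - 1*I) = n - rank(A - I).
geo_mult = A.shape[0] - np.linalg.matrix_rank(A - np.eye(2))
print(geo_mult)               # 1 < 2, so A is not diagonalizable
```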

Fact: If \(A\) is an \(n \times n\) square matrix with eigenvalues \(\lambda_1,\ldots,\lambda_n\), then \(\det(A)=\prod_{i=1}^n \lambda_i\). A square matrix is invertible if and only if all of its eigenvalues are non-zero.

Eigen-decomposition: if \(A\) is a square and symmetric matrix, then we can write \(A=P \Lambda P^\top\) where \[\begin{align*} &P=[u_1, u_2,\cdots, u_n], \quad u\mbox{'s are the orthonormal eigenvectors of }A, \\ &\Lambda = diag\{\text{eigenvalues of }A\}. \end{align*}\]

Note: If this symmetric matrix \(A\) is invertible, then we have \(A^{-1}=P\Lambda^{-1}P^\top\) where \[\begin{equation*} \Lambda^{-1} = \begin{pmatrix} \ddots & 0 & 0 \\ 0 & \frac{1}{\lambda_i} & 0 \\ 0 & 0 & \ddots \end{pmatrix}. \end{equation*}\]

Example: Consider \(A=\begin{pmatrix} 9 & 0 \\ 0 & 4 \end{pmatrix}\), find its eigenvalues and eigenvectors.
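A quick numerical check of the decomposition, the inverse formula, and the determinant fact above (a sketch with numpy; for this diagonal matrix the eigenvalues are simply 9 and 4):

```python
import numpy as np

A = np.array([[9.0, 0.0],
              [0.0, 4.0]])

# For a symmetric matrix, eigh returns real eigenvalues (in ascending order)
# and orthonormal eigenvectors as the columns of P.
lam, P = np.linalg.eigh(A)
Lam = np.diag(lam)

print(lam)                                                        # [4. 9.]
print(np.allclose(A, P @ Lam @ P.T))                              # True: A = P Lambda P^T
print(np.allclose(np.linalg.inv(A), P @ np.diag(1 / lam) @ P.T))  # True: A^{-1} = P Lambda^{-1} P^T
print(np.isclose(np.linalg.det(A), lam.prod()))                   # True: det(A) = product of eigenvalues
```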

Singular value decomposition

For an \(m \times n\) matrix \(A\), \(A^{\top}A\) is a symmetric matrix. Let \(\{ v_{1},\ldots,v_{n} \}\) be the collection of all the orthonormal eigenvectors of \(A^{\top}A\) (from its eigen-decomposition), and let \(\lambda_{1},\ldots,\lambda_{n}\) be the associated eigenvalues, with \(\lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{n} \ge 0\). We have \(\lVert Av_{i}\rVert^{2} = \lambda_{i}\), since \[ A^{\top}Av_{i} = \lambda_{i}v_{i} \Rightarrow v_{i}^{\top}A^{\top}Av_{i} = \lambda_{i}v_{i}^{\top}v_{i} = \lambda_{i}. \] The singular values of \(A\) are the square roots of the eigenvalues of \(A^{\top}A\), denoted by \(\sigma_{i} = \sqrt{\lambda_{i}}\); \(\sigma_{1} \ge \sigma_{2} \ge \cdots \ge \sigma_{n} \ge 0\).
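As a numerical sanity check of this relationship (a sketch; the \(5\times 3\) matrix below is an arbitrary example of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                        # arbitrary m x n example

# Eigen-decomposition of the symmetric PSD matrix A^T A (ascending eigenvalues).
lam_asc, V = np.linalg.eigh(A.T @ A)
print(np.allclose(np.linalg.norm(A @ V, axis=0) ** 2, lam_asc))   # True: ||A v_i||^2 = lambda_i

# Singular values of A are returned in decreasing order.
sigma = np.linalg.svd(A, compute_uv=False)
print(np.allclose(sigma, np.sqrt(lam_asc[::-1])))                 # True: sigma_i = sqrt(lambda_i)
```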

Fact: The rank of a matrix \(A\) is equal to the number of positive singular values of \(A\).

SVD: Let \(A\) (\(m \times n\)) be a matrix of rank \(r\) (\(r\leq \min(m, n)\)). There exists a matrix \(\Sigma_{m \times n}\) of the following form, with \(D\) being a diagonal matrix whose diagonal entries are the first \(r\) (non-zero) singular values. That is,
\[ \Sigma_{m \times n} = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \] \[ D = \begin{bmatrix} \sigma_{1} \\ & \ddots & \\ & & \sigma_{r} \end{bmatrix}. \] The singular value decomposition of \(A\) is \[ A = U_{m \times m}\Sigma_{m \times n}V^{\top}_{n \times n}, \] where \(U\) and \(V\) are both orthogonal matrices: \(V=[v_1, \cdots, v_n]\) consists of all the orthonormal eigenvectors of \(A^{\top}A\), and \(U=[u_1, \cdots, u_m]\) has columns \(u_{i}\) given by \[\text{for } 1 \le i \le r, \quad u_{i} = \frac{Av_{i}}{\lVert Av_{i}\rVert} = \frac{Av_{i}}{\sigma_{i}}, \] and these columns \(\{ u_{1}, \ldots, u_{r}\}\) are then extended to an orthonormal basis \(\{ u_{1}, \ldots, u_{m}\}\) of \(\mathbb{R}^m\).

The matrices \(U\) and \(V\) are not uniquely determined in general, but they have the property that \(\{u_1,\ldots,u_r\}\) spans \(Col(A)\) and \(\{v_1,\ldots,v_r\}\) spans \(Row(A)\).

Example: Consider \(A=\begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}\), find its SVD.
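A numerical companion to this example (sketch; note that singular values are nonnegative, so the sign of \(-2\) is absorbed into the singular vectors):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, -2.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                                        # [3. 2.]
print(np.allclose(A, U @ np.diag(s) @ Vt))      # True: A = U Sigma V^T

# Rebuild the columns of U as u_i = A v_i / sigma_i, as in the construction above.
V = Vt.T
U_check = (A @ V) / s                           # column i is A v_i / sigma_i
print(np.allclose(U, U_check))                  # True
```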

Reduced SVD

For \(A_{m \times n}\), \(U_{m \times m}\), \(V_{n \times n}\), and \(\Sigma_{m \times n}\) as above, we have the partition:
\[ U = [U_{r},U_{m-r}], \\ V = [V_{r},V_{n-r}], \\ \Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}. \] Consequently, the SVD of \(A\) can be reduced to
\[ A = \begin{bmatrix} U_{r} & U_{m-r} \end{bmatrix} \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_{r}^{\top} \\ V_{n-r}^{\top} \end{bmatrix} = U_{r}DV_{r}^{\top}=\sum_{i=1}^r \sigma_i{u}_iv_i^\top. \] Note 1: If we instead consider \(\tilde{A}=\sum_{i=1}^l \sigma_i{u}_iv_i^\top\) for some \(l < r\), then this is a low-rank (rank-\(l\)) approximation to \(A\), sometimes called the truncated SVD.
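A short sketch of the rank-\(l\) truncation (numpy; the example matrix and the cutoff \(l=2\) are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

l = 2                                                    # keep only the l largest singular values
A_tilde = U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]          # truncated SVD: sum_{i<=l} sigma_i u_i v_i^T

print(np.linalg.matrix_rank(A_tilde))                    # 2: A_tilde has rank l
print(np.isclose(np.linalg.norm(A - A_tilde, 2), s[l]))  # True: spectral-norm error equals the first dropped sigma
```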

Note 2: If \(m \geq n\), the SVD (sometimes called the thin SVD) can also be written as \[\begin{align*} A=\tilde{U}\tilde{D}\tilde{V}^\top \end{align*}\] where \(\tilde{U}: m \times n\), \(\tilde{D}: n \times n\), \(\tilde{V}: n \times n\).
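In numpy, this thin form is what `np.linalg.svd(A, full_matrices=False)` returns (a small shape check, sketch only):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(7, 3))                            # m = 7 >= n = 3

U_full, _, _ = np.linalg.svd(A, full_matrices=True)    # full SVD: U is m x m
U_thin, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: U_tilde is m x n, D_tilde is n x n

print(U_full.shape, U_thin.shape, Vt.shape)            # (7, 7) (7, 3) (3, 3)
print(np.allclose(A, U_thin @ np.diag(s) @ Vt))        # True: A = U_tilde D_tilde V_tilde^T
```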

Application: Linear Least Squares

We can use the reduced (thin) SVD to solve the least squares problem as follows:

\[\begin{align*} A^{\top}Ax &= A^{\top}b, \\ (U_rDV_r^{\top})^{\top}(U_rDV_r^{\top})x &= (U_rDV_r^{\top})^{\top}b, \\ V_rD(U_r^{\top}U_r)DV_r^\top x &= V_rDU_r^{\top}b, \\ DV_r^{\top}x &= U_r^{\top}b, \end{align*}\] where the last line follows by multiplying both sides on the left by \(D^{-1}V_r^{\top}\) and using \(U_r^{\top}U_r=V_r^{\top}V_r=I_r\) together with the invertibility of \(D\).

Denoting \(w = V_r^{\top}x\) and \(y = U_r^{\top}b\), we have the following algorithm:

Algorithm

  1. Find the SVD of \(A = U_rDV_r^{\top}\);
  2. Compute \(y = U_r^{\top}b\);
  3. Solve the diagonal system \(Dw=y\), giving \(w^{*} = D^{-1}y\);
  4. Solve \(V_r^{\top}x = w^{*} \Rightarrow x^* = V_rw^{*}\).
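A minimal numpy sketch of these four steps (the example matrix and the tolerance used to count positive singular values are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 3))                 # here A has full column rank, so r = 3
b = rng.normal(size=8)

# Step 1: reduced SVD of A, keeping only the r positive singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))
U_r, D, V_r = U[:, :r], s[:r], Vt[:r, :].T

# Step 2: y = U_r^T b.
y = U_r.T @ b

# Step 3: solve the diagonal system D w = y.
w = y / D

# Step 4: x* = V_r w.
x_star = V_r @ w

# Agrees with numpy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_lstsq))         # True
```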

The resulting solution is \(x^* = V_rD^{-1}U_r^{\top}b\). Notice that this SVD method allows \(A\) and \(b\) to be arbitrary, and that \(V_{r}D^{-1}U_{r}^{\top}\) acts like an “inverse” of \(A\). This leads to the concept of a generalized inverse.

Moore-Penrose inverse of A: \(A^{\dagger} = V_{r}D^{-1}U_{r}^{\top}.\)

Fact: if \(A\) is square and invertible, then \(A^{\dagger}= A^{-1}.\)
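Both statements are easy to check numerically (a sketch; numpy's `np.linalg.pinv` computes the Moore-Penrose inverse):

```python
import numpy as np

rng = np.random.default_rng(4)

# Rectangular case: A^dagger = V_r D^{-1} U_r^T.
A = rng.normal(size=(5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))
A_dag = Vt[:r, :].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
print(np.allclose(A_dag, np.linalg.pinv(A)))               # True

# Square invertible case: A^dagger = A^{-1}.
B = rng.normal(size=(4, 4))                                # invertible with probability 1
print(np.allclose(np.linalg.pinv(B), np.linalg.inv(B)))    # True
```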

Dig deeper…

In the LS problem \[\min_{x} \lVert Ax -b \rVert,\] where \(A\) and \(b\) are not restricted at all, any minimizer satisfies the normal equation
\[ A^{\top}Ax = A^{\top}b. \] Let \(\mathfrak{L}\) be the set of all minimizers of the LS problem. We have the following facts.

Fact 1 \(x^{*} = A^{\dagger}b \in \mathfrak{L}\); this follows from the normal equation because \(A^{\top}AA^{\dagger} = A^{\top}\), so \(A^{\top}Ax^{*} = A^{\top}b\).

Fact 2 \(A\tilde{x_{1}} = A\tilde{x_{2}}\), for any \(\tilde{x_{1}},\tilde{x_{2}} \in \mathfrak{L}\).

Fact 3 For the optimization problem \(\min_{x \in \mathfrak{L}} \lVert x\rVert\), there is a unique solution \(x^{*} = A^{\dagger}b\).

Fact 4 We already have the result that if we choose some \(\lambda>0\), the “modified” normal equation \[ (A^{\top}A+\lambda I)x=A^{\top}b \] has a unique solution, namely \(\hat{x} = (A^{\top}A+\lambda I)^{-1}A^{\top}b.\) In fact, letting \(\lambda \rightarrow 0\), we have \((A^{\top}A+\lambda I)^{-1}A^{\top} \rightarrow A^{\dagger}.\) So \[ x^*= \lim_{\lambda \to 0} [(A^{\top}A+\lambda I)^{-1}A^{\top}]b= \lim_{\lambda \to 0} [A^\top(AA^\top+\lambda I)^{-1}]b = A^{\dagger} b. \]
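This limit can be checked numerically by taking \(\lambda\) small (a sketch; the rank-deficient example matrix is my own construction):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 4))
A[:, 3] = A[:, 0] + A[:, 1]        # make A rank-deficient, so A^T A itself is singular

A_dag = np.linalg.pinv(A)
for lam in [1e-2, 1e-4, 1e-6]:
    ridge = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T)
    print(lam, np.linalg.norm(ridge - A_dag))   # the difference shrinks as lambda -> 0
```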

Fact 5 The projection of \(b\) on \(Col(A)\) is given by \(A(A^{\top}A)^{-1}A^{\top}b\) (assuming \(A^{\top}A\) is invertible); here is a more general result:
\[ \hat{b} = AA^{\dagger}b = \lim_{\lambda \rightarrow 0}[A(A^{\top}A+\lambda I)^{-1}A^{\top}]b, \] so \(P_A:=AA^{\dagger}\) is the orthogonal projection onto \(Col(A)\), and \(I-P_A\) is the orthogonal projection onto \(\operatorname{Ker}(A^\top).\)
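A numerical check of the projection properties (sketch; the matrix and vector below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)

P_A = A @ np.linalg.pinv(A)                    # P_A = A A^dagger

print(np.allclose(P_A, P_A.T))                 # True: P_A is symmetric
print(np.allclose(P_A @ P_A, P_A))             # True: P_A is idempotent
print(np.allclose(A.T @ (b - P_A @ b), 0))     # True: (I - P_A) b lies in Ker(A^T)
```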
