Statistical Simulation, Wei Li. Updated: 2024-08-29
Convention (unless otherwise noted):
All vectors are taken as column vectors by default; for example, if \(x \in \mathbb{R}^n\), then we can write \[ x= \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.\]
A generic capital letter will often denote a matrix, e.g. \(A\), an \(m \times n\) matrix,
\[\begin{align*} {A} = \left[\begin{array}{c} {a}_{1},{a}_{2} \cdots {a}_{n} \end{array}\right], \end{align*}\] with each \(a_i\) belonging to \(\mathbb{R}^m\).
Suppose \({e}_{i}\), \(i=1, \ldots, n\), is the \(n \times 1\) unit vector with 1 in the \(i\)th position and zeros elsewhere; then the \(n \times n\) identity matrix can be written as \[\begin{align*} {I}_n = \left[\begin{array}{c} {e}_{1},{e}_{2} \cdots {e}_{n} \end{array}\right]. \end{align*}\]
For \(A\) an \(m \times n\) matrix, the \(i\)th column of \(A\) can be expressed as \(A e_{i}\), for \(i=1, \ldots, n\).
Matrix-vector multiplication: \[ Ax=\begin{bmatrix} {a}_{1},{a}_{2} \cdots {a}_{n} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}=x_1a_1+\cdots+x_na_n=b. \] \(Ax\) is a linear combination of the column vectors of \(A\); the coefficients of the linear combination are stored in \(x\).
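As a quick numerical check, here is a minimal numpy sketch (the matrix and vector values are arbitrary illustrations, not taken from the notes):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [0., 1., 3.]])   # 2 x 3 matrix with columns a_1, a_2, a_3
x = np.array([2., -1., 4.])

b = A @ x                                                     # matrix-vector product
b_as_combo = sum(x[i] * A[:, i] for i in range(A.shape[1]))   # x_1 a_1 + x_2 a_2 + x_3 a_3

print(np.allclose(b, b_as_combo))   # True
```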
Matrix-matrix multiplication: Let \[\begin{align*} {A} = \left[\begin{array}{c} {a}_{1},{a}_{2} \cdots {a}_{m} \end{array}\right], \quad \ell \times m \end{align*}\] and \[\begin{align*} {C} = \left[\begin{array}{c} {c}_{1},{c}_{2} \cdots {c}_{n} \end{array}\right], \quad m \times n, \end{align*}\] then \[ AC=B=[b_1, \cdots, b_n] : \quad \ell \times n, \] where \(b_j=Ac_j\), \(j=1, \ldots, n\). That is, each column of the product is \(A\) applied to the corresponding column of \(C\).
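A small sketch of the column-by-column view of the product, using random matrices purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # ell x m
C = rng.normal(size=(3, 5))   # m x n
B = A @ C                     # ell x n

# each column b_j of B equals A applied to the corresponding column c_j of C
print(all(np.allclose(B[:, j], A @ C[:, j]) for j in range(C.shape[1])))   # True
```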
Range space (column space): The column space \(\operatorname{Col}(A)\) of \(A\) is the span of the vectors \(a_1, \ldots, a_n\), i.e., \({\rm span}\{a_1, \ldots, a_n\}\), the set of all linear combinations of \(a_1, \ldots, a_n\): \[\begin{align*} \operatorname{Col} (A)=\left\{b: b=A x \text { for some } x \in \mathbb{R}^{n}\right\}. \end{align*}\]
The vector \(b \in \mathbb{R}^m\) belongs to \(\operatorname{Col}(A)\) iff \(\exists\) scalars \(x_1, \ldots, x_n\) such that \(b=x_1a_1+\cdots+x_na_n\).
Kernel space (null space): The kernel space of an \(m \times n\) matrix \(A\), written as \(\operatorname{ker}(A)\), is the set of all solutions of the homogeneous equation \(Ax=0\): \[\begin{align*} \operatorname{ker}(A)=\left\{x: x \in \mathbb{R}^{n} \text { and } A x=0\right\}. \end{align*}\]
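A minimal sketch computing a basis of the null space numerically, assuming scipy is available; the rank-deficient matrix below is an arbitrary example:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])    # rank 1, so ker(A) is 2-dimensional
N = null_space(A)               # columns form an orthonormal basis of ker(A)

print(N.shape[1])               # 2
print(np.allclose(A @ N, 0.0))  # True: every basis vector solves Ax = 0
```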
Linear independence of a set of vectors: suppose we have \(p\) vectors in \(\mathbb{R}^n\), say \(v_1,\ldots, v_p\). They are said to be linearly independent if the only scalars \(c_1, \ldots, c_p\) satisfying \(c_1v_1+\cdots+c_pv_p=0\) are \(c_1=\cdots=c_p=0\); equivalently, no \(v_i\) can be written as a linear combination of the others.
Rank of a matrix: The row rank of a matrix is the maximum number of rows, thought of as vectors, which are linearly independent. Similarly, the column rank is the maximum number of columns which are linearly independent. A basic fact is that the row and column ranks of a matrix are equal to each other. Thus one simply speaks of the rank of a matrix.
For an \(m\times n\) matrix \(A\), we call it full row rank if its rows are linearly independent, and full column rank if its columns are linearly independent.
Theorem (rank-nullity theorem): Let \(A\) be an \(m \times n\) matrix. Then \[\begin{align*} n=\operatorname{dim}(\operatorname{Col}(A))+\operatorname{dim}(\operatorname{ker}(A)) = \operatorname{rank}(A) + \operatorname{dim}(\operatorname{ker}(A)), \end{align*}\] where the dimension \(\operatorname{dim}\) of a subspace is the maximal number of linearly independent vectors it contains (equivalently, the number of vectors in any basis of the subspace).
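The identity can be checked numerically; a minimal sketch with an arbitrary \(3 \times 4\) matrix whose third row is the sum of the first two:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 0., 2., 0.],
              [0., 1., 3., 0.],
              [1., 1., 5., 0.]])        # 3 x 4, row 3 = row 1 + row 2
n = A.shape[1]
rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]

print(rank, nullity, rank + nullity == n)   # 2 2 True
```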
Determinant of a matrix: The determinant is a number associated with any square matrix; we’ll write it as \(\det(A)\) or \(|A|\). There are several equivalent ways to define it. The determinant encodes a lot of information about the matrix; in particular, a square matrix is invertible exactly when its determinant is non-zero. Example:
\[\begin{align*} \left|\left[\begin{array}{ll} a & b \\ c & d \end{array}\right]\right|=ad-bc. \end{align*}\]
The determinant of a square matrix is equal to the product of its eigenvalues (see later for definition).
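A quick numerical illustration of the \(2\times 2\) formula and the eigenvalue-product fact, using an arbitrary symmetric matrix:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
det_formula = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # ad - bc
det_numpy = np.linalg.det(A)
eigvals = np.linalg.eigvals(A)

print(np.isclose(det_formula, det_numpy))        # True
print(np.isclose(det_numpy, np.prod(eigvals)))   # True: det equals the product of eigenvalues
```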
Inverse of a matrix: Suppose \(A\) is an \(m \times m\) square matrix. The \(m \times m\) matrix \(Z\) is said to be the inverse of \(A\) iff \(AZ=I=ZA\). We then say \(A\) is invertible and denote its inverse by \(A^{-1}\). A square matrix that is not invertible is called singular. When \(A\) is invertible, its inverse is given by \[ A^{-1}=\frac{1}{|A|}\operatorname{adj}(A), \] where \(\operatorname{adj}(A)\) is the adjugate (classical adjoint) matrix of \(A\).
A square matrix is invertible if and only if its determinant is not zero (i.e., its eigenvalues are all non-zero).
Example:
\[\begin{align*} \left[\begin{array}{ll} a & b \\ c & d \end{array}\right]^{-1}=\frac{1}{a d-b c}\left[\begin{array}{cc} d & -b \\ -c & a \end{array}\right], \end{align*}\] if \(ad-bc \neq 0.\)
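A minimal check of the \(2\times 2\) inverse formula against numpy, with arbitrary entries:

```python
import numpy as np

a, b, c, d = 2., 1., 1., 3.
A = np.array([[a, b],
              [c, d]])
inv_formula = np.array([[d, -b],
                        [-c, a]]) / (a * d - b * c)   # the closed-form 2x2 inverse

print(np.allclose(inv_formula, np.linalg.inv(A)))   # True
print(np.allclose(A @ inv_formula, np.eye(2)))      # True: A A^{-1} = I
```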
Theorems: A square matrix \(A\) has a unique inverse \(A^{-1}\) iff its column vectors are linearly independent, i.e., no column vector of \(A\) is a linear combination of the others.
If \(A\) is invertible, then for any vector \(b\), \(Ax=b\) has a unique solution \(x=A^{-1}b\): the coefficients needed to represent \(b\) as a linear combination of the columns of \(A\).
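In practice one solves \(Ax=b\) directly rather than forming \(A^{-1}\) explicitly; a minimal sketch with arbitrary data:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([1., 2.])

x = np.linalg.solve(A, b)      # numerically preferred over np.linalg.inv(A) @ b
print(np.allclose(A @ x, b))   # True: x expresses b as a combination of A's columns
```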
Transpose of a matrix: The transpose of a matrix \(A=[a_{i,j}]\) is given by \(A^\top=[a_{j,i}].\)
Inner product: Suppose \(x, y \in \mathbb{R}^n\). The Euclidean inner product is \(\langle x, y\rangle= x^\top y=\sum_{i=1}^n x_iy_i\). The Euclidean norm of \(x\) is \(\|x\|:=\sqrt{\langle x, x\rangle}=\sqrt{\sum_{i=1}^nx_i^2}\). The angle \(\alpha\) (\(0 \leq \alpha \leq \pi\)) between the two vectors \(x, y\) is defined by \(\cos(\alpha)=\frac{\langle x, y\rangle}{\|x \|\|y\|}.\)
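A minimal numpy sketch of these quantities, with two arbitrary vectors chosen so the angle works out to 60 degrees:

```python
import numpy as np

x = np.array([1., 0., 1.])
y = np.array([1., 1., 0.])

inner = x @ y                    # <x, y>
norm_x = np.linalg.norm(x)       # sqrt(<x, x>)
alpha = np.arccos(inner / (norm_x * np.linalg.norm(y)))

print(inner, norm_x, np.degrees(alpha))   # 1.0, approx 1.414, 60.0
```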
Trace of a matrix: The trace of a square matrix \(A\) is the sum of all the diagonal entries of \(A\). Note: for compatible matrices, the trace has the cyclic property, namely, \[\operatorname{tr}(ABCD)=\operatorname{tr}(DABC)=\operatorname{tr}(CDAB).\]
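The cyclic property is easy to verify numerically; a sketch with random matrices whose shapes are chosen so every product is defined:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))
D = rng.normal(size=(5, 2))   # shapes chosen so ABCD, DABC, CDAB are all square

t1 = np.trace(A @ B @ C @ D)
t2 = np.trace(D @ A @ B @ C)
t3 = np.trace(C @ D @ A @ B)
print(np.allclose(t1, t2), np.allclose(t2, t3))   # True True
```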
Orthogonal vectors: Two vectors \(v, w\) are orthogonal to each other if \(\langle v, w\rangle=0\).
Orthogonal complement: The orthogonal complement of a subspace \(W\) in a vector space \(V\) is the set of all vectors in \(V\) that are orthogonal to every vector in \(W\). It is denoted \(W^{\perp}\).
Column space and Kernel space: \(\operatorname{Col}(A)^{\perp}=\operatorname{Ker}\left(A^\top \right)\) and \(\operatorname{Row}(A)^{\perp}=\operatorname{Ker}(A).\)
Theorem: Let \(S=\{v_1, \ldots, v_p \}\) be a set of nonzero, pairwise orthogonal vectors in \(\mathbb{R}^n\); then \(S\) is linearly independent.
Orthogonal/orthonormal matrix: An \(m \times n\) matrix \(U\) has orthonormal columns if \(U^\top U=I\).
An \(m \times m\) square matrix \(P\) is called an orthogonal matrix if \[\begin{align*} P P^{\top}=P^{\top} P=I_{m}, \quad \text{or} \quad P^{-1}=P^{\top} . \end{align*}\]
Any square matrix with orthonormal columns is an orthogonal matrix, and such a matrix must have orthonormal rows too.
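A small sketch producing an orthogonal matrix as the Q factor of a QR decomposition of a random square matrix (an illustration, not the only construction):

```python
import numpy as np

rng = np.random.default_rng(2)
P, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # Q factor is orthogonal

print(np.allclose(P.T @ P, np.eye(4)))   # True: orthonormal columns
print(np.allclose(P @ P.T, np.eye(4)))   # True: orthonormal rows as well
```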
Fact: Multiplication by an orthogonal matrix preserves inner products, hence lengths and angles. For this reason, orthogonal matrices are often called rotation matrices (strictly, a rotation is an orthogonal matrix with determinant \(+1\)).
Positive definite matrix: A square and symmetric matrix \(A\) is called a positive definite matrix (denoted by \(A>0\)) if \(x^\top Ax>0\) for all \(x\neq 0\). A key characterization of a positive definite matrix is that its eigenvalues must be strictly positive. Similarly, one can define a negative definite matrix. Both positive definite and negative definite matrices are invertible, but the converse is not true.
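A sketch of the eigenvalue characterization, building a positive definite matrix by construction (\(BB^\top\) plus a small multiple of the identity, an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(3, 3))
A = B @ B.T + 0.1 * np.eye(3)    # symmetric and positive definite by construction

eigvals = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
print(np.all(eigvals > 0))       # True: all eigenvalues strictly positive

x = rng.normal(size=3)
print(x @ A @ x > 0)             # True for this (and any nonzero) x
```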
Positive semi-definite matrix: A square and symmetric matrix \(A\) is called a positive semi-definite matrix (denoted by \(A\geq 0\)) if \(x^\top Ax\geq 0\) for all \(x\). A positive semi-definite matrix has non-negative eigenvalues (some possibly zero). A positive semi-definite matrix is invertible if and only if it is also positive definite. Similarly, one can define a negative semi-definite matrix.
Indefinite matrix: A square and symmetric matrix \(A\) that is neither positive semi-definite nor negative semi-definite is called indefinite.
Projection of a vector on the span of another vector: the projection of a vector \(y \in \mathbb{R}^n\) onto the span of another vector \(x \in \mathbb{R}^n\) (\(\|x\|\neq 0\)), i.e., \({\rm span}(x):=\{ax:a \in\mathbb{R}\}\) is given by \[ P_xy=\frac{\left\langle x,y\right\rangle}{\left\langle x,x\right\rangle}x=\frac{xx^\top}{\left\langle x,x\right\rangle}y, \] which has the property that \(\left\langle y-P_xy,P_xy\right\rangle=0.\)
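A minimal sketch of the projection formula and its orthogonality property, with arbitrary vectors:

```python
import numpy as np

x = np.array([1., 2., 2.])
y = np.array([3., 0., 1.])

P_x_y = (x @ y) / (x @ x) * x    # projection of y onto span(x)
residual = y - P_x_y

print(np.isclose(residual @ P_x_y, 0.0))                  # True: residual orthogonal to projection
print(np.allclose(np.outer(x, x) @ y / (x @ x), P_x_y))   # same result via (x x^T / <x,x>) y
```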