Statistical Simulation, Wei Li (Updated: 2024-09-20)
Gradient. For \(f:{\mathbb{R}^n}\to{\mathbb{R}}\) a smooth function, its gradient is the \(n{\times}1\) vector \[ {\nabla}f(x)=\begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \\ \end{bmatrix}. \]
Hessian matrix. The Hessian matrix of \(f\) is an \(n{\times}n\) symmetric matrix defined as \[ {\nabla}^2f(x)=\begin{bmatrix} {\frac{\partial^2 f(x)}{\partial x_1 \partial x_1}}& {\frac{\partial^2 f(x)}{\partial x_2 \partial x_1}}& \cdots & {\frac{\partial^2 f(x)}{\partial x_n \partial x_1}}\\ \vdots & \vdots & \ddots & \vdots \\ {\frac{\partial^2 f(x)}{\partial x_1 \partial x_n}}& {\frac{\partial^2 f(x)}{\partial x_2 \partial x_n}}& \cdots & {\frac{\partial^2 f(x)}{\partial x_n \partial x_n}}\\ \end{bmatrix}. \]
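As a quick numerical illustration (a minimal sketch, not part of the original notes; the test function \(f(x)=x_1^2+3x_1x_2\) and the helper names are hypothetical), both the gradient and the Hessian can be approximated by central finite differences:

```python
import numpy as np

def f(x):
    # hypothetical test function: f(x) = x1^2 + 3*x1*x2
    return x[0]**2 + 3 * x[0] * x[1]

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient vector."""
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hess(f, x, h=1e-4):
    """Finite-difference approximation of the (symmetric) Hessian."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        for j in range(n):
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

x0 = np.array([1.0, 2.0])
print(num_grad(f, x0))   # analytic gradient: [2*x1 + 3*x2, 3*x1] = [8, 3]
print(num_hess(f, x0))   # analytic Hessian: [[2, 3], [3, 0]]
```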
Some results from calculus: for a constant vector \(a\) and a constant matrix \(A\), \({\nabla}(a^\top x)=a\) and \({\nabla}(x^\top Ax)=(A+A^\top)x\), which reduces to \(2Ax\) when \(A\) is symmetric.
Example: For the linear least squares problem \(\min\limits_{x}{f(x)}=\min\limits_{x}{\lVert Ax-b\rVert^2}\),
\[ \begin{aligned} f(x)&=\lVert Ax-b\rVert^2\\&=\langle Ax-b, Ax-b\rangle\\&=(Ax-b)^\top(Ax-b) \\&=x^\top A^\top Ax-2b^\top Ax+b^\top b. \end{aligned} \]
The gradient of \(f(x)\) is \[ {\nabla}f(x)=\frac{{\partial}f(x)}{{\partial}x}=2A^\top Ax-2A^\top b. \] Setting the gradient to zero yields the normal equations \(A^\top Ax=A^\top b\).
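To sanity-check this formula numerically (a hedged sketch; the randomly generated \(A\), \(b\), and the helper `f` below are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

def f(x):
    r = A @ x - b
    return r @ r  # ||Ax - b||^2

x = rng.standard_normal(3)
grad_analytic = 2 * A.T @ A @ x - 2 * A.T @ b

# central-difference check of the analytic gradient
h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                    for e in np.eye(3)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-4))  # True

# setting the gradient to zero gives the normal equations A^T A x = A^T b
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(A.T @ (A @ x_star - b), 0, atol=1e-10))  # True
```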
Reference: Nocedal, J. and Wright, S. J., Numerical Optimization (Springer).
Given \(f:{\mathbb{R^n}}\to{\mathbb{R}}\), let \(S{\subseteq}{\mathbb{R^n}}\) be the feasible set.
1. Convex set: \(S\) is a convex set if and only if \(x,y{\in}S\) entails \({\alpha}x+(1-{\alpha})y{\in}S\) for all \({\alpha}{\in}[0,1]\).
2. Global minimizer: A point \(x^*{\in}S\) is a global minimizer of \(f\) in \(S\) if and only if \(f(x^*){\le}f(x)\) for all \(x{\in}S\).
3. Local minimizer: A point \(x^*{\in}S\) is a local minimizer of \(f\) in \(S\) if and only if there exists a neighborhood \(N(x^*)\) of \(x^*\) such that \(f(x^*){\le}f(x)\) for all \(x{\in}N(x^*){\cap}S.\)
4. Strict local minimizer: A point \(x^*{\in}S\) is a strict local minimizer of \(f\) in \(S\) if and only if there exists a neighborhood \(N(x^*)\) of \(x^*\) such that \(f(x^*)<f(x)\) for all \(x{\in}N(x^*){\cap}S\) with \(x{\neq}x^*.\)
5. Convex functions: \(f\) is convex on \(S\) if for all \(x,y{\in}S\) and \({\alpha}{\in}[0,1]\), \(f({\alpha}x+(1-{\alpha})y){\le}{\alpha}f(x)+(1-{\alpha})f(y)\); it is strictly convex if the inequality is strict whenever \(x{\neq}y\) and \({\alpha}{\in}(0,1)\).
6. Stationary point and saddle point:
A stationary point \(v\) of \(f\) is any point satisfying the first-order condition (FOC) \({\nabla}f(v)=0\). In general, a stationary point could be a minimizer, a maximizer, or a saddle point.
A saddle point is a stationary point at which the curvature of the function changes from negative to positive or vice versa (i.e., it is also a point of inflection). A more precise definition is given below.
7. Consequence of convexity:
When \(f\) is convex on \(S\), then any local minimizer is also a global minimizer. If, in addition, \(f\) is differentiable, then any stationary point is a global minimizer.
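As an illustration of this consequence (a minimal sketch, assuming a hypothetical convex quadratic \(f(x)=\tfrac{1}{2}x^\top Qx-c^\top x\) with \(Q\) positive definite; the names below are illustrative):

```python
import numpy as np

# For a convex quadratic f(x) = 0.5 x^T Q x - c^T x with Q positive definite,
# the unique stationary point solves Q x = c and is the global minimizer.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
Q = M @ M.T + 3 * np.eye(3)   # positive definite, so f is convex
c = rng.standard_normal(3)

f = lambda x: 0.5 * x @ Q @ x - c @ x
x_star = np.linalg.solve(Q, c)  # stationary point: grad f = Qx - c = 0

# sample many points: none should beat the stationary point
samples = rng.standard_normal((10000, 3)) * 5
print(all(f(x_star) <= f(x) for x in samples))  # True
```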
Let’s consider the minimization problem \(\min _{x \in \mathbb{R}^n} f(x)\) for some smooth \(f: \mathbb{R}^n \to \mathbb{R}.\)
First order result (necessary condition): suppose \(f\) is continuously differentiable. If \(x^*\) is a local minimizer of \(f\), then \(\nabla f\left(x^*\right)=0\).
Second order result (necessary condition): suppose \(f\) is twice continuously differentiable. If \(x^*\) is a local minimizer of \(f\), then \(\nabla f\left(x^*\right)=0\) and \(\nabla^2 f\left(x^*\right) \succeq 0\) (positive semidefinite).
Second order result (sufficient condition): suppose \(f\) is twice continuously differentiable, and for some \(x^* \in \mathbb{R}^n\), \(\nabla f\left(x^*\right)=0\) and \(\nabla^2 f\left(x^*\right) \succ 0\) (positive definite). Then \(x^*\) is a strict local minimizer.
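As a numerical illustration of the sufficient condition (a sketch; the function \(f(x,y)=(x-1)^2+2y^2+xy\) is a made-up example, not from the notes):

```python
import numpy as np

# f(x, y) = (x - 1)^2 + 2 y^2 + x*y  (hypothetical example)
# grad f = [2(x - 1) + y, 4 y + x];  Hessian = [[2, 1], [1, 4]] (constant)
H = np.array([[2.0, 1.0], [1.0, 4.0]])
# solve grad f = 0, i.e. [[2, 1], [1, 4]] [x, y]^T = [2, 0]^T
x_star = np.linalg.solve(H, np.array([2.0, 0.0]))
print(x_star)                        # stationary point, roughly [1.143, -0.286]

# second-order sufficient condition: all Hessian eigenvalues strictly positive
eigvals = np.linalg.eigvalsh(H)
print(eigvals, np.all(eigvals > 0))  # True, so x_star is a strict local minimizer
```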
The Hessian matrix contains important geometric information about the surface of the function \(f\) (in particular the concavity of the surface).
Recall that the second-order Taylor approximation to \(z=f(x, y)\) at \(\left(x_{0}, y_{0}\right)\) is \[\begin{align*} f(x, y) \approx f\left(x_{0}, y_{0}\right)+{\nabla} f\left(x_{0}, y_{0}\right)^\top \left[\begin{array}{c} x-x_{0} \\ y-y_{0} \end{array}\right]+\frac{1}{2}\left[\begin{array}{ll} x-x_{0} & y-y_{0} \end{array}\right] \nabla^2f\left(x_{0}, y_{0}\right)\left[\begin{array}{l} x-x_{0} \\ y-y_{0} \end{array}\right] \end{align*}\] for \((x, y)\) close to \(\left(x_{0}, y_{0}\right).\)
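A quick numerical check of this approximation (a sketch; the function \(f(x,y)=e^x\sin y\) and the expansion point are illustrative choices, not from the notes):

```python
import numpy as np

# Quadratic Taylor approximation of f(x, y) = exp(x) * sin(y) at (0, 1)
f = lambda v: np.exp(v[0]) * np.sin(v[1])
grad = lambda v: np.array([np.exp(v[0]) * np.sin(v[1]),
                           np.exp(v[0]) * np.cos(v[1])])
hess = lambda v: np.array([[np.exp(v[0]) * np.sin(v[1]),  np.exp(v[0]) * np.cos(v[1])],
                           [np.exp(v[0]) * np.cos(v[1]), -np.exp(v[0]) * np.sin(v[1])]])

v0 = np.array([0.0, 1.0])
def taylor2(v):
    d = v - v0
    return f(v0) + grad(v0) @ d + 0.5 * d @ hess(v0) @ d

v = v0 + np.array([0.05, -0.03])   # a nearby point
print(f(v), taylor2(v))            # nearly equal; the error is O(||d||^3)
```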
Suppose \(f: \mathbb{R^2} \rightarrow \mathbb{R}\) has continuous second partial derivatives and \(\nabla f(x^*) = 0\) at some point \(x^*\). The Hessian matrix at \(x^*\) then determines the nature of this stationary point, as described below.
Note that \(f\) has four second partial derivatives, i.e., four measures of concavity (the entries of \(\nabla^2 f\)). In fact, the eigenvalues of the Hessian \(\nabla^2 f\) measure concavity in the corresponding eigendirections: for a unit eigenvector \(v\) with \(\nabla^2 f\, v = \lambda v\), we have \(v^\top (\nabla^2 f) v=\lambda v^\top v=\lambda\).
This is made precise by the following classical result:
Theorem: Suppose the function \(f(x, y)\) has continuous second partial derivatives. Let \(\left(x_{0}, y_{0}\right)\) be a stationary point of \(f\), and let \(\lambda_{1}\) and \(\lambda_{2}\) be the eigenvalues of \(\nabla^2{f}\left(x_{0}, y_{0}\right)\). Then:
1. if \(\lambda_1>0\) and \(\lambda_2>0\), then \((x_0, y_0)\) is a local minimizer of \(f\);
2. if \(\lambda_1<0\) and \(\lambda_2<0\), then \((x_0, y_0)\) is a local maximizer of \(f\);
3. if \(\lambda_1\) and \(\lambda_2\) have opposite signs, then \((x_0, y_0)\) is a saddle point of \(f\);
4. if either eigenvalue is zero, the test is inconclusive.
For a function \(f\) of three or more variables, the results generalize, and the following test can be applied at any stationary point \(x^*\): if all eigenvalues of \(\nabla^2 f(x^*)\) are positive, \(x^*\) is a local minimizer; if all eigenvalues are negative, \(x^*\) is a local maximizer; if eigenvalues of both signs occur, \(x^*\) is a saddle point; and if some eigenvalue is zero while the rest share one sign, the test is inconclusive.
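A small sketch of this eigenvalue test (the `classify` helper is a hypothetical illustration, not a standard library routine):

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a stationary point from the eigenvalues of its Hessian."""
    lam = np.linalg.eigvalsh(H)
    if np.all(lam > tol):
        return "local minimizer"
    if np.all(lam < -tol):
        return "local maximizer"
    if lam.min() < -tol and lam.max() > tol:
        return "saddle point"
    return "inconclusive (zero eigenvalue)"

# f(x, y) = x^2 - y^2 has a stationary point at the origin
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
# f(x, y) = x^2 + y^2
print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))   # local minimizer
```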
Note: some authors require the eigenvalues to be nonzero in the definition of a saddle point.