Statistical Simulation, Wei Li. Updated: 2024-12-03
Target density: \(f(x,y)\), which could be unnormalized.
We use two candidate transition densities \(q_{X} (x \vert x_c, y_c)\) and \(q_{Y} (y \vert x_c, y_c)\), where \(x_c\), \(y_c\) stand for current values.
Algorithm: Given current values \((x_c, y_c)\), first draw \(x^* \sim q_X(\cdot \vert x_c, y_c)\) and accept it (setting \(x_c \leftarrow x^*\)) with probability \(\displaystyle \alpha_X = \min\Big(1,\ \frac{f(x^*, y_c)}{f(x_c, y_c)}\cdot\frac{q_X(x_c \vert x^*, y_c)}{q_X(x^* \vert x_c, y_c)}\Big)\); then draw \(y^* \sim q_Y(\cdot \vert x_c, y_c)\) and accept it (setting \(y_c \leftarrow y^*\)) with probability \(\displaystyle \alpha_Y = \min\Big(1,\ \frac{f(x_c, y^*)}{f(x_c, y_c)}\cdot\frac{q_Y(y_c \vert x_c, y^*)}{q_Y(y^* \vert x_c, y_c)}\Big)\). Repeat.
Remark: The detailed balance condition holds for each component update; however, it does not hold for their composition. That is, \(\displaystyle p \big( (x,y) \to (x^*,y^*) \big) f(x,y) = p \big( (x^*,y^*) \to (x,y) \big) f(x^*,y^*)\) does not hold for this algorithm in general.
Proof: provided in class.
Recall the block-wise Metropolis-Hastings: to sample from \(f(X,Y)\), we used two candidates: \(q_X(x|X,Y), q_Y(y|X,Y)\).
Suppose we take \(q_X(x|X^{(s)}, Y^{(s)})\) to be the full conditional distribution \(f_{X|Y}(x|Y^{(s)})\); then
\[\begin{align*} \alpha_X(X^*|X^{(s)},Y^{(s)})&=\frac{f(X^*,Y^{(s)})}{f(X^{(s)},Y^{(s)})}\cdot\frac{q_X(X^{(s)}|X^{*},Y^{(s)})}{q_X(X^*|X^{(s)},Y^{(s)})}\\ &=\frac{f_{X|Y}(X^*|Y^{(s)})}{f_{X|Y}(X^{(s)}|Y^{(s)})}\cdot\frac{f_{X|Y}(X^{(s)}|Y^{(s)})}{f_{X|Y}(X^*|Y^{(s)})}\\ &=1. \end{align*}\]
Definition: Gibbs sampling is the special case of (blockwise) MH that takes \(q_X(x|X_c, Y_c)\) to be \(f_{X|Y}(x|Y_c)\) and \(q_Y(y|X_c,Y_c)\) to be \(f_{Y|X}(y|X_c)\), which yields \(\alpha_X=\alpha_Y=1\). Under this sampling scheme, every proposal is automatically accepted.
Algorithm: Two-stage Gibbs sampling
Target: \(f(X,Y)\), possibly unnormalized
Take some \(x^{(0)}\). For \(s=0,1,\cdots\), generate \[\begin{align*} (1)&\quad Y^{(s+1)} \sim f_{Y|X}(\cdot \mid X^{(s)}),\\ (2)&\quad X^{(s+1)} \sim f_{X|Y}(\cdot \mid Y^{(s+1)}). \end{align*}\]
Algorithm: Multi-stage Gibbs sampling
Target: \(f(X_1,X_2,\cdots,X_d)\), possibly unnormalized
Starting values: \(X^{(0)}=(X^{(0)}_1,X^{(0)}_2,\cdots,X^{(0)}_d)\). Let \(f_{j}(X_j|X_{-j})\) denote the conditional density of \(X_j\) given all the remaining components \(X_{-j}:=\{X_i: i\neq j\}\).
The algorithm generates \(X^{(s+1)}\) from \(X^{(s)}\) as follows:
\[\begin{align*} (1)&\quad X^{(s+1)}_1 \sim f_1(\cdot | X^{(s)}_2, \cdots,X^{(s)}_d)\\ (2)&\quad X^{(s+1)}_2 \sim f_2(\cdot | X^{(s+1)}_1, X^{(s)}_3, \cdots,X^{(s)}_d)\\ & \vdots \\ (d)&\quad X^{(s+1)}_d \sim f_d(\cdot | X^{(s+1)}_1, \cdots,X^{(s+1)}_{d-1}) \end{align*}\]
Repeat \(s \leftarrow s+1\).
\(\textbf{Advantage:}\) For high-dimensional problems, every update reduces to sampling from a univariate full conditional distribution, and every draw is accepted.
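The systematic-scan structure above can be written as a short sketch in Python. Here `sample_conditional(j, x, rng)` is a hypothetical user-supplied function that draws \(X_j\) from its full conditional \(f_j(\cdot \mid x_{-j})\); it is not part of these notes.

```python
import numpy as np

# Minimal sketch of a systematic-scan (multi-stage) Gibbs sampler.
# sample_conditional(j, x, rng) is a hypothetical user-supplied function that
# draws X_j from its full conditional f_j(. | x_{-j}) given the current vector x.
def gibbs(x0, sample_conditional, n_iter, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    for s in range(n_iter):
        for j in range(x.size):                    # steps (1), ..., (d) of one sweep
            x[j] = sample_conditional(j, x, rng)   # conditions on the newest values
        chain[s] = x
    return chain
```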
Remark: Gibbs sampling requires the full conditional distributions to be easy to sample from; sometimes this can be arranged by augmenting the target with auxiliary variables, as the following example illustrates.
Example: \[f(x)\propto\frac{e^{-x^2/20}}{(1+(z_1-x)^2)(1+(z_2-x)^2)},\quad z_1=-4.3,\ z_2=5.2\]
Note that: \(\displaystyle \frac{1}{1+(z_i-x)^2}=\int_0^\infty e^{-w_i(1+(z_i-x)^2)}\,dw_i\), then we can write \[f(x,w_1,w_2)\propto e^{-x^2/20}\prod_{i=1}^2 e^{-w_i(1+(z_i-x)^2)}\] so \(f(x)\) is just the marginal pdf of \(f(x,w_1,w_2)\).
Gibbs sampling: write \(W=(W_1, W_2)\) and alternate between drawing \(X \mid W\) and \(W \mid X\) from their full conditionals.
Here \[f_{X|W}(x|w_1,w_2) \overset{x} \propto e^{-\big(\sum w_i\big)x^2+2x\sum w_iz_i} e^{-x^2/20}\sim N \Big(\frac{\sum w_iz_i}{\sum w_i+1/20},\frac{1}{2\big(\sum w_i+1/20\big)} \Big),\] \[f_{W|X}(w_1,w_2|x) \overset{w} \propto e^{-w_1(1+(z_1-x)^2)} e^{-w_2(1+(z_2-x)^2)}.\] Note from the factorization above that \(w_1,w_2\) are independent given \(x\); therefore, \[\begin{align*} W_1|X &\sim \mbox{Exponential}\big(\mbox{rate}=1+(z_1-X)^2\big),\\ W_2|X & \sim \mbox{Exponential}\big(\mbox{rate}=1+(z_2-X)^2\big). \end{align*}\]
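Below is a minimal NumPy sketch of this two-stage Gibbs sampler; the function name, seed, and chain length are illustrative choices, not part of the notes.

```python
import numpy as np

# Gibbs sampler for the augmented target f(x, w1, w2) above.
# Full conditionals derived in the notes:
#   X   | W=(w1,w2) ~ N( sum(w*z)/(sum(w)+1/20), 1/(2*(sum(w)+1/20)) )   (variance)
#   W_i | X=x       ~ Exponential(rate = 1 + (z_i - x)^2)
z = np.array([-4.3, 5.2])
rng = np.random.default_rng(0)

def gibbs_example(n_iter=10_000, x0=0.0):
    x, xs = x0, np.empty(n_iter)
    for s in range(n_iter):
        w = rng.exponential(scale=1.0 / (1.0 + (z - x) ** 2))  # scale = 1/rate
        a = w.sum() + 1.0 / 20.0
        x = rng.normal((w * z).sum() / a, np.sqrt(1.0 / (2.0 * a)))
        xs[s] = x
    return xs

print(gibbs_example().mean())   # estimate of E[X] under f(x)
```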
The multivariate normal distribution of a \(d-\)dimensional random vector \(\mathbf{X}=(X_1, \cdots, X_d)^{\top}\in\mathbb{R}^d\) can be written in the notation \(\mathbf{X} \sim \mathcal{N}(\boldsymbol\mu, \mathbf{\Sigma}).\) When the symmetric covariance matrix \(\mathbf{\Sigma}\) is positive definite, then the multivariate normal distribution is non-degenerate, and the distribution has density function \[\begin{align*} f_{\mathbf{X}}(\mathbf{x} \vert \boldsymbol\mu, \mathbf{\Sigma}) &= \frac{1}{(2 \pi)^{\frac{d}{2}}} \big( \det(\mathbf{\Sigma}) \big)^{-\frac{1}{2}} \exp \big(-\frac{1}{2} (\mathbf{x}-\boldsymbol\mu)^{\top} \mathbf{\Sigma}^{-1} (\mathbf{x}-\boldsymbol\mu) \big)\\ &\overset{\mathbf{x}} \propto \exp(-\frac{1}{2} \mathbf{x}^{\top} \mathbf{\Sigma}^{-1} \mathbf{x} + \mathbf{x}^{\top} \mathbf{\Sigma}^{-1} \boldsymbol\mu) \end{align*}\] where \(\mathbf{x}=(x_1, \cdots, x_d)^{\top}\).
For comparison, the density function of (univariate) normal distribution \(X\sim N(\mu, \sigma^2)\), \[ f_X(x \vert \mu, \sigma^2)=\frac{1}{\sqrt{2\pi}\sigma} \exp \big(-\frac{1}{2\sigma^2}(x-\mu)^2 \big) \overset{x} \propto \exp\big(-\frac{1}{2\sigma^2}(x^2-2x\mu+\mu^2)\big) \overset{x} \propto \exp\big(-\frac{1}{2} x^2 (\sigma^2)^{-1}+ x(\sigma^2)^{-1}\mu\big). \]
Bivariate normal \[\displaystyle (X,Y)\sim N \left(\begin{pmatrix}\mu_X\\ \mu_Y\end{pmatrix},\begin{pmatrix}\sigma_X^2 & \sigma_{XY}\\ \sigma_{XY}&\sigma_Y^2\end{pmatrix} \right),\] Let \(\rho=\frac{\sigma_{XY}}{\sigma_{X}\sigma_Y}\) be the correlation. It is a fact that \[f_{X|Y}(x|y)\sim N(\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y), \sigma_X^2(1-\rho^2)),\] \[f_{Y|X}(y|x)\sim N(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X), \sigma_Y^2(1-\rho^2)).\]
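For illustration, here is a minimal sketch of Gibbs sampling from a bivariate normal using exactly these two conditionals; the parameter values below are illustrative assumptions, not from the notes.

```python
import numpy as np

# Gibbs sampling for a bivariate normal via the conditional distributions above.
mu_x, mu_y = 1.0, -2.0            # illustrative parameters
sig_x, sig_y, rho = 1.0, 2.0, 0.8
rng = np.random.default_rng(1)

def gibbs_bvn(n_iter=10_000, y0=0.0):
    y, out = y0, np.empty((n_iter, 2))
    for s in range(n_iter):
        # X | Y = y ~ N(mu_X + rho*(sig_X/sig_Y)*(y - mu_Y), sig_X^2*(1 - rho^2))
        x = rng.normal(mu_x + rho * sig_x / sig_y * (y - mu_y),
                       sig_x * np.sqrt(1 - rho ** 2))
        # Y | X = x ~ N(mu_Y + rho*(sig_Y/sig_X)*(x - mu_X), sig_Y^2*(1 - rho^2))
        y = rng.normal(mu_y + rho * sig_y / sig_x * (x - mu_x),
                       sig_y * np.sqrt(1 - rho ** 2))
        out[s] = (x, y)
    return out

print(gibbs_bvn().mean(axis=0))   # should be close to (mu_x, mu_y)
```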
The Ising model is a mathematical model of ferromagnetism in statistical mechanics. It comprises a lattice of spins, where each spin can be either up (+1) or down (-1). These spins represent magnetic dipole moments of atomic “spins”. The key interest in the Ising model is understanding how these spins interact under varying temperatures and how these interactions lead to macroscopic phenomena like phase transitions.
In image processing, a 2D lattice can represent an image where each lattice site corresponds to a pixel: \[\begin{align*} I=\{1,2, \ldots, L\} \times\{1,2, \ldots, L\} \end{align*}\] The value at each site (spin) can denote various image properties, like color or intensity. Here we consider binary images, where the state space is given by \(\mathcal{X}=\{-1, 1\}^{I}\).
The Ising model describes the energy of a system using an energy function:
\[\begin{equation*} H(x) = -J \sum_{\substack{i \sim j\\ i, j \in I}} x_i x_j - h\sum_{i \in I} x_i, \quad x \in \mathcal{X}, \end{equation*}\]
where \(i \sim j\) indicates that \(x_i\) and \(x_j\) are spins at adjacent lattice sites. For simplicity we consider only nearest neighbours, so a pixel \(i=\left(i_1, i_2\right)\) in the interior of the grid has the four neighbours \(\left(i_1+1, i_2\right),\left(i_1, i_2+1\right),\left(i_1-1, i_2\right)\) and \(\left(i_1, i_2-1\right)\); a pixel on an edge of the grid (but not a corner) has three neighbours, and a corner pixel has two.
The probability measure for the system on \(\mathcal{X}\) can be written as \[\begin{align*} \pi(x)=\frac{1}{Z} \exp (-H(x)), \end{align*}\] where \(Z\) is the normalizing constant.
It can be shown that, when \(h=0\), \[\begin{align*} P\left(X_i=+1 \mid X_{-i}=\boldsymbol{x}_{-i}\right) = \frac{1}{1+\exp \left(-2J \eta_i\right)}, \end{align*}\] where \(\eta_i := \sum_{j \sim i} x_j\) is the sum of the spins of the neighbours of site \(i\); this follows since \(\pi(x_i \mid \boldsymbol{x}_{-i}) \overset{x_i}{\propto} \exp(J x_i \eta_i)\).
Bayesian binary denoising
Assume that a collection of original binary images has a distribution that is described by the Ising model \(\pi(x)\) above. We further assume that the observed images are noisy versions of the original images.
To be concrete, we assume that for each pixel \(X_i\) we actually observe \(Y_i=X_i+\epsilon_i\), where the \(\epsilon_i\) are independently distributed \(N(0, \sigma^2).\)
Using Bayes’ rule, \[\begin{align*} p_{X \mid Y}(x \mid y)=\frac{p_{Y \mid X}(y \mid x) p_X(x)}{p_Y(y)}=\frac{1}{Z_y} \exp \left(-H(y, x)\right), \end{align*}\] where, taking \(h=0\) and using that \(x_i^2=1\) so that \(p_{Y \mid X}(y \mid x) \overset{x}{\propto} \exp\big(\frac{1}{\sigma^2}\sum_{i \in I} y_i x_i\big)\), \[\begin{align*} H(y, x)=-\frac{1}{\sigma^2} \sum_{i \in I} y_i x_i+H(x)=-\frac{1}{\sigma^2} \sum_{i \in I} y_i x_i-J \sum_{\substack{i \sim j}} x_i x_j. \end{align*}\] The full conditional distribution of \(X_i\) can be similarly derived: \[\begin{align*} P\left(X_i=+1 \mid X_{-i}=\boldsymbol{x}_{-i}, Y=\boldsymbol{y} \right) = \frac{1}{1+\exp \left(-2(J \eta_i+ y_i/\sigma^2)\right)}, \end{align*}\] where \(\eta_i := \sum_{j \sim i} x_j\).
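Below is a minimal sketch of one systematic-scan Gibbs sweep for this denoising posterior; the array layout, argument names, and boundary handling are implementation choices, not prescribed by the notes.

```python
import numpy as np

# One Gibbs sweep (in place) over all pixels of the denoising posterior p(x | y).
# x: current L-by-L array of spins in {-1, +1};  y: observed noisy image;
# J: interaction strength;  sigma2: noise variance.
def gibbs_sweep_denoise(x, y, J, sigma2, rng):
    L = x.shape[0]
    for i1 in range(L):
        for i2 in range(L):
            eta = 0.0                          # sum of neighbouring spins (2-4 neighbours)
            if i1 > 0:     eta += x[i1 - 1, i2]
            if i1 < L - 1: eta += x[i1 + 1, i2]
            if i2 > 0:     eta += x[i1, i2 - 1]
            if i2 < L - 1: eta += x[i1, i2 + 1]
            # P(X_i = +1 | x_{-i}, y) = 1 / (1 + exp(-2*(J*eta + y_i/sigma2)))
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * (J * eta + y[i1, i2] / sigma2)))
            x[i1, i2] = 1 if rng.random() < p_plus else -1
    return x
```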
Slice sampling is a technique based on the Fundamental Theorem of Simulation combined with Gibbs sampling.
Recall from the Fundamental Theorem of Simulation, \[ f(x)=\int_{0}^{f(x)}1du. \] So \(f(x)\) is a marginal density (in \(X\)) of the joint uniform distribution on the set \[ \{(x, u): 0<u<f(x), x\in\mathcal{X}\}, \] with the p.d.f. \(f(x, u)=1_{\{(x, u): 0<u<f(x)\}}(x, u).\)
To obtain a sample of \((X ,U)\) from this distribution, we can run a Markov chain that has it as the stationary distribution; this can be done using Gibbs sampling, noting that \[\begin{align*} &f_{X|U}(x|u) \sim \operatorname{unif}\{x: u< f(x) \}, \\ &f_{U|X}(u|x) \sim \operatorname{unif}\{u: 0< u< f(x) \}. \end{align*}\]
Slice sampling:
At iteration \(s\), simulate
\(U^{(s+1)} \sim \operatorname{unif}(0, f(X^{(s)}))\),
\(X^{(s+1)} \sim \operatorname{unif}(A^{(s+1)})\), where \(A^{(s+1)}:=\{x: f(x) > U^{(s+1)}\}.\)
Remark 1: A major difficulty in the algorithm is evaluating the set \(A^{(s+1)}\). For a univariate distribution, we can apply an iterative expansion called stepping out: start with some small constants \(L>0, R>0\) and consider the candidate interval \(x \in (X^{(s)}-L, X^{(s)}+R)\); then extend the interval until the endpoints fall outside the slice (i.e., \(f(X^{(s)}-L) \leq U^{(s+1)}\) and \(f(X^{(s)}+R)\leq U^{(s+1)}\)) or after some fixed number of steps. Eventually we aim for the interval \[ \tilde{A}^{(s+1)}=\{x\in (X^{(s)}-L, X^{(s)}+R): f(x)>U^{(s+1)} \}. \] Then, in a loop, we draw \(X^*\) from \(\tilde{A}^{(s+1)}\) using the accept/reject method: we draw \(X^*\) from the interval until \(f(X^*)>U^{(s+1)}\) and then set \(X^{(s+1)} \leftarrow X^*\). During the loop, if \(f(X^*)\leq U^{(s+1)}\), we shrink the interval: if \(X^*>X^{(s)}\), set \(R \leftarrow X^*-X^{(s)}\); if \(X^*<X^{(s)}\), set \(L \leftarrow X^{(s)}-X^*\); then redraw from the interval. A code sketch of this procedure is given after the remarks below.
Remark 2: The procedure described in Remark 1 can be generalized to multivariate distributions.
Remark 3: The algorithm remains valid if we use the kernel \(\tilde{f}\) of the p.d.f. (an unnormalized version of \(f\)) in place of \(f\).
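Below is a minimal Python sketch of one slice-sampling update implementing the stepping-out and shrinkage procedure of Remark 1; the initial half-widths and the cap on the number of expansion steps are illustrative choices.

```python
import numpy as np

# One univariate slice-sampling update with stepping out and shrinkage (Remark 1).
# f is the target density or its kernel (cf. Remark 3); x is the current state X^{(s)}.
def slice_update(f, x, rng, L0=0.5, R0=0.5, max_steps=50):
    u = rng.uniform(0.0, f(x))                 # U^{(s+1)} ~ unif(0, f(X^{(s)}))
    L, R = L0, R0
    # stepping out: extend (x - L, x + R) until the endpoints fall outside the slice
    steps = 0
    while f(x - L) > u and steps < max_steps:
        L += L0; steps += 1
    steps = 0
    while f(x + R) > u and steps < max_steps:
        R += R0; steps += 1
    # accept/reject with shrinkage: redraw until the draw lands inside the slice
    while True:
        x_star = rng.uniform(x - L, x + R)
        if f(x_star) > u:
            return x_star                      # X^{(s+1)}
        if x_star > x:
            R = x_star - x                     # shrink the right half-width
        else:
            L = x - x_star                     # shrink the left half-width

# Example usage with the kernel from the earlier example:
# f = lambda t: np.exp(-t**2 / 20) / ((1 + (-4.3 - t)**2) * (1 + (5.2 - t)**2))
```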
General form:
Suppose \(f(x) \propto \prod_{i=1}^k f_i(x)\), where each \(f_i(x)>0\) is not necessarily a density. Note that \(f_i(x) = \int 1\{0 \leq w_i \leq f_i(x) \}\,dw_i\). Thus \(f\) is the marginal distribution (in \(x\)) of the joint density \[ f(x, w_1, \ldots, w_k) \overset{x, w_1, \ldots, w_k}{\propto} \prod_{i=1}^k 1\{0\leq w_i \leq f_i(x) \}, \] noting \[ f(x) \overset{x}{\propto} \int \cdots \int \prod_{i=1}^k 1\{0 \leq w_i \leq f_i(x)\}\,dw_{1}\cdots dw_{k}= \prod_{i}f_i(x). \]
Slice sampling:
At iteration \(s\), simulate
\(W_1^{(s+1)} \sim \operatorname{unif}(0, f_1(X^{(s)}))\),
\(W_2^{(s+1)} \sim \operatorname{unif}(0, f_2(X^{(s)}))\),
\(\vdots\)
\(W_{k}^{(s+1)} \sim \operatorname{unif}(0, f_k(X^{(s)}))\),
\(X^{(s+1)} \sim \operatorname{unif}(A^{(s+1)})\), where \(A^{(s+1)}:=\{x: f_i(x) \geq W_i^{(s+1)}, \forall i=1, \ldots, k\}.\)
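As an illustration (this particular application is a sketch, not worked out in the notes), the general form can be applied to the earlier example \(f(x)\propto e^{-x^2/20}\prod_{i=1}^2\big(1+(z_i-x)^2\big)^{-1}\): each slice set \(\{x: f_i(x)\geq w_i\}\) is an explicit interval, so \(A^{(s+1)}\) is an intersection of intervals and can be sampled directly.

```python
import numpy as np

# General-form slice sampler for f(x) ∝ f0(x) f1(x) f2(x) with
#   f0(x) = exp(-x^2/20),  f_i(x) = 1/(1 + (z_i - x)^2),  z = (-4.3, 5.2).
# Each slice {x : f_i(x) >= w_i} is an interval, so A^{(s+1)} is their intersection.
z = np.array([-4.3, 5.2])
rng = np.random.default_rng(2)

def slice_step(x):
    w0 = rng.uniform(0.0, np.exp(-x ** 2 / 20.0))     # W_0 ~ unif(0, f0(x))
    w = rng.uniform(0.0, 1.0 / (1.0 + (z - x) ** 2))  # W_i ~ unif(0, f_i(x)), i = 1, 2
    r0 = np.sqrt(-20.0 * np.log(w0))                  # f0(x) >= w0  <=>  |x| <= r0
    r = np.sqrt(1.0 / w - 1.0)                        # f_i(x) >= w_i <=> |x - z_i| <= r_i
    lo = max(-r0, *(z - r))
    hi = min(r0, *(z + r))
    return rng.uniform(lo, hi)                        # X^{(s+1)} ~ unif(A^{(s+1)})

x, xs = 0.0, []
for _ in range(10_000):
    x = slice_step(x)
    xs.append(x)
print(np.mean(xs))
```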
Provided and discussed in lab.