Statistical Simulation, Wei Li. Updated: 2025-12-02

Metropolis-Hastings algorithm with different move types

We consider MH algorithms to sample from \(f\) using different move types, which gives a more general framework for MH algorithms.

We first randomly choose a move type \(m\in \mathcal{M}\) and then generate \(X^*\) with a move-dependent transition density \(q_m(\cdot|X^{(s)})\), where \(q_m(\cdot|\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\).

MH algorithm with different move types:

Let \(s=0\).

  1. Generate \(m^{(s+1)}\in \mathcal{M}\) with probability \(P(m^{(s+1)}=m)=\gamma_m(X^{(s)})\);
  2. Generate \(X^* \sim q_{m^{(s+1)}}(\cdot|X^{(s)})\);
  3. Let the acceptance probability be \[ \alpha_{m^{(s+1)}}(X^{*}|X^{(s)})=\min \left\{ \frac{f(X^{*})\gamma_{m^{(s+1)}}(X^{*})q_{m^{(s+1)}}(X^{(s)}|X^{*})}{f(X^{(s)})\gamma_{m^{(s+1)}}(X^{(s)})q_{m^{(s+1)}}(X^{*}|X^{(s)})}, 1 \right\}. \]
  4. Generate \(U\sim \mbox{unif}(0,1)\);
    1. if \(U\leq \alpha_{m^{(s+1)}}(X^{*}|X^{(s)})\), set \(X^{(s+1)} = X^*\);
    2. if \(U> \alpha_{m^{(s+1)}}(X^{*}|X^{(s)})\), set \(X^{(s+1)} = X^{(s)}\).
  5. \(s\leftarrow s+1.\)
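
As a concrete illustration, here is a minimal Python sketch of this algorithm, assuming a standard bivariate normal target and three random-walk move types that differ only in their proposal scales; the constant move probabilities \(\gamma_m\) and the symmetric proposals make both the \(\gamma\) and \(q\) factors cancel in the acceptance ratio. (The target, scales, and iteration count are illustrative choices, not from the notes.)

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f(x):
    """Log target density (standard 2-D Gaussian, up to an additive constant)."""
    return -0.5 * np.sum(x**2)

scales = [0.1, 1.0, 5.0]   # one random-walk scale per move type
gamma = [1/3, 1/3, 1/3]    # P(choose move m); constant in x, so it cancels

def mh_move_types(n_iter, x0):
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    for s in range(n_iter):
        m = rng.choice(len(scales), p=gamma)                  # step 1
        x_star = x + scales[m] * rng.standard_normal(x.size)  # step 2
        # step 3: q_m is symmetric and gamma_m constant, so both cancel
        log_alpha = min(0.0, log_f(x_star) - log_f(x))
        if np.log(rng.uniform()) <= log_alpha:                # step 4
            x = x_star
        chain[s] = x                                          # step 5
    return chain

chain = mh_move_types(5000, x0=[3.0, -3.0])
print(chain.mean(axis=0))  # should be close to (0, 0)
```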

Remarks

  1. \(\gamma_m(x)\) can depend on time as well. In the simplest case, for example, with \(\mathcal{M}=\{1, 2,\ldots, M\}\), we can choose \(m\) to be \((s \text{ mod } M)+1\), which will cycle through all move types.
  2. Each move type can be shown to satisfy the detailed balance condition, but their composition need not.
  3. Note that \(q_m(\cdot|\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), so this transition density may not be well defined if we want to update only one component (or a subset of components) of the vector \(X \in \mathbb{R}^d\): in that case \(X^*\) falls on a lower-dimensional subspace of \(\mathbb{R}^d\).

A modification:

To address issue (3) above, we use a more general rule for the acceptance probability. This requires two assumptions.

Assume that for each move type \(m\in \mathcal{M}\): (1) there is a set \(C_m \subset \mathbb{R}^d \times \mathbb{R}^d\) such that \[ (x, x') \in C_m \Longleftrightarrow (x', x) \in C_m , \quad \forall x, x'; \] and (2) there is a density \(\psi_m: C_m \to [0, \infty]\) such that the proposed \(X^*\), transitioning from \(X^{(s)} \sim f\), satisfies \[ P(X^*\in B, X^{(s)}\in A) =\int_{C_m} 1_A(x)1_B(x')\psi_m(x, x')dx'dx \] for all \(A, B\). Then the (modified) acceptance probability is \[ \alpha_m(X^*|X^{(s)})=\min \left\{ \frac{\gamma_m(X^{*}) \psi_m(X^{*}, X^{(s)})}{\gamma_m(X^{(s)}) \psi_m(X^{(s)}, X^{*})}, 1 \right\}. \]

Remark:

  1. If the transition density \(q_m\) from \(X^{(s)}\) to \(X^*\) exists (i.e., is a well-defined density), then \(\psi_m(x, x')=f(x)q_m(x'|x)\), recovering the previous algorithm. More generally, however, this modified algorithm allows \(C_m\) to be a lower-dimensional subset of \(\mathbb{R}^d \times \mathbb{R}^d\).
  2. This modified algorithm generalizes the blockwise MH algorithm and Gibbs sampling (a code sketch is given below).

Proof: provided in class.
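
To make Remark 2 concrete, the sketch below runs single-coordinate (Metropolis-within-Gibbs-style) updates as move types: move \(m\) changes only coordinate \(m\), so the pair \((X^{(s)}, X^*)\) lives on a lower-dimensional set \(C_m\), and with a symmetric per-coordinate proposal the modified acceptance ratio reduces to \(f(X^*)/f(X^{(s)})\). The bivariate normal target with correlation \(\rho=0.8\) and the step size are illustrative assumptions, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8

def log_f(x):
    """Bivariate normal, unit variances, correlation rho (up to a constant)."""
    return -(x[0]**2 - 2*rho*x[0]*x[1] + x[1]**2) / (2*(1 - rho**2))

def within_gibbs(n_iter, x0, step=1.0):
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, 2))
    for s in range(n_iter):
        m = s % 2                    # deterministic cycling (Remark 1 above)
        x_star = x.copy()
        x_star[m] += step * rng.standard_normal()  # update coordinate m only
        # modified acceptance: symmetric proposal, so ratio is f(x*)/f(x)
        if np.log(rng.uniform()) <= log_f(x_star) - log_f(x):
            x = x_star
        chain[s] = x
    return chain

chain = within_gibbs(10000, x0=[0.0, 0.0])
print(np.corrcoef(chain.T)[0, 1])  # should be near rho
```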

Reversible Jump Markov Chain Monte Carlo

The RJMCMC algorithm is a generalization of the MH algorithm that allows moves across parameter spaces of different dimensions (models). To see how this algorithm works, we consider a general form of the MH algorithm.

Recall the detailed balance condition \[\begin{align} f(x) p\left(x, x^{\prime}\right)=f\left(x^{\prime}\right) p\left(x^{\prime}, x\right) \tag{*} \end{align}\] where \(p\) is the transition density moving from \(x\) to \(x'\). An equivalent form is \[\begin{align*} \int_A \int_{A^{\prime}} f(x) p\left(x, x^{\prime}\right) d x^{\prime} d x=\int_A \int_{A^{\prime}} f\left(x^{\prime}\right) p\left(x^{\prime}, x\right) d x^{\prime} d x, \quad \forall A, A' \end{align*}\] or \[\begin{align} \int_A f(x) P\left(x, A^{\prime}\right) d x=\int_{A^{\prime}} f\left(x^{\prime}\right) P\left(x^{\prime}, A\right) d x^{\prime}, \quad \forall A, A^{\prime} \tag{**} \end{align}\] where \(P(x, A'):=\int_{A'} p(x, x')dx'\). Recall also that the classical MH (effective) transition probability, using some tentative transition density \(q(x, x')\), takes the form \[\begin{align*} P\left(x, A^{\prime}\right)=\int_{A^{\prime}} q\left(x, x^{\prime}\right) \alpha\left(x, x^{\prime}\right) d x^{\prime}+ r(x)\delta_{A^{\prime}}(x), \end{align*}\] where \(r(x):=1-\int \alpha(x, x') q(x, x')dx'\), \(\delta_{A^{\prime}}(x):=1_{A^{\prime}}(x)\), and \(\alpha(x, x')\) is some acceptance probability.

Substituting this form of \(P\) into (**) and noting that the rejection terms contribute equally to both sides (both equal \(\int_{A\cap A'} f(x)r(x)dx\)), detailed balance reduces to the following useful condition: \[\begin{align} \int_A f(x) \int_{A^{\prime}} q\left(x, x^{\prime}\right) \alpha\left(x, x^{\prime}\right) d x^{\prime} d x=\int_A f\left(x^{\prime}\right) \int_{A^{\prime}} q\left(x^{\prime}, x\right) \alpha\left(x^{\prime}, x\right) d x^{\prime} d x \tag{***} \end{align}\]
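
For instance, with the classical choice \(\alpha(x, x')=\min \left\{1, \frac{f(x')q(x',x)}{f(x)q(x,x')}\right\}\), a pointwise check gives \[\begin{align*} f(x)\,q(x,x')\,\alpha(x,x') = \min\left\{f(x)q(x,x'),\; f(x')q(x',x)\right\}, \end{align*}\] which is symmetric under swapping \(x\) and \(x'\); hence the two integrands in (***) agree pointwise and the equation holds for all \(A, A'\).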

This characterization remains true if we consider some transition between two different spaces \(\mathcal{X}_1\) and \(\mathcal{X}_2\), where \(\mathcal{X}_1 \subset \mathbb{R}^{d_1}\) and \(\mathcal{X}_2 \subset \mathbb{R}^{d_2}\), \(d_1 \neq d_2\).

But the foremost issue is how to make transitions across spaces in practice. Since the two states live in different spaces, \(q(\cdot, \cdot)\) must be space-dependent; that is, we need \(q_{1 \to 2}(x, x')\) and \(q_{2 \to 1}(x', x)\). If such transitions are available, then we can attempt to deduce the acceptance probabilities \(\alpha_{1 \to 2}(x, x')\) and \(\alpha_{2 \to 1}(x', x)\) that balance both sides of equation (***): \[ \int_A f(x) \int_{A^{\prime}} q_{1 \to 2}\left(x, x^{\prime}\right) \alpha_{1 \to 2}\left(x, x^{\prime}\right) d x^{\prime} d x=\int_A f\left(x^{\prime}\right) \int_{A^{\prime}} q_{2 \to 1}\left(x^{\prime}, x\right) \alpha_{2 \to 1 }\left(x^{\prime}, x\right) d x^{\prime} d x \]

Valid reversible mapping:

Suppose we would like to move between two spaces \(\mathcal{X}_1 \subset \mathbb{R}^{d_1}\) and \(\mathcal{X}_2 \subset \mathbb{R}^{d_2}\), \(d_1 \neq d_2\), where the whole space is \(\mathcal{X}=\mathcal{X}_1 \cup \mathcal{X}_2.\) Let \(K=\{1, 2\}\) denote the type space (model space).

Suppose we have probabilities governing moves from one space to another; for example, \(j_{1\to 2} (x)\) is the probability of moving from \(\mathcal{X}_1\) to \(\mathcal{X}_2\) when the current value is \(x\in \mathcal{X}_1.\) More generally, let \(j_{k \to k'}\) denote the probability of moving from space \(k\) to space \(k'\).

For an MCMC algorithm, we have to specify the rule for the (tentative) transition density and the acceptance probability. If we remain in the same space, the classical proposal and acceptance probability work as usual.

The key case is when the chosen move crosses between spaces \(k\) and \(k'\) with \(\mathcal{X}_k \neq \mathcal{X}_{k'}.\) The critical step is to establish a valid reversible mapping between \(\mathcal{X}_k\) and \(\mathcal{X}_{k^{\prime}}\).

So we will focus, for example, on the transition and acceptance rule for moving from space \(1\) to space \(2\). Concretely (anticipating the remarks below), we augment \(x \in \mathcal{X}_1\) with an auxiliary variable \(w \in \mathbb{R}^{n_1}\) and \(x' \in \mathcal{X}_2\) with \(w' \in \mathbb{R}^{n_2}\), and require a bijection \(h_{1\to 2}: (x, w) \mapsto (x', w')\), with inverse \(h_{2\to 1}\), satisfying the dimension-matching condition \(d_1+n_1=d_2+n_2\); this is the valid reversible mapping.

Remark:

  1. It may be that either \(n_1=0\) or \(n_2=0\), as long as the dimensions are matched: \(d_1+n_1=d_2+n_2.\)
  2. One common rule for \(h_{1\to 2}\) uses some function \(\phi_{x'}\) that maps \((x, w)\) to \(x'\) and some function \(\phi_{x}\) that maps \((x', w')\) to \(x\): \[\begin{align*} \begin{array}{ll} h_{1\to2}: (x, w) \longmapsto (\phi_{x'}(x, w), w') & h_{2\to1}: (x', w') \longmapsto (\phi_x(x', w'), w) \\ h_{1\to2}: \mathbb{R}^{d_1+n_1} \longrightarrow \mathbb{R}^{d_2+n_2} & h_{2\to1}: \mathbb{R}^{d_2+n_2} \longrightarrow \mathbb{R}^{d_1+n_1} \end{array} \end{align*}\]

Example 1: Adding or Removing a Parameter (Linear Regression) Suppose we have two nested linear regression models, say \(\mathcal{M}_1: y=\beta_0+\beta_1 x_1+\varepsilon\) and \(\mathcal{M}_2: y=\beta_0+\beta_1 x_1+\beta_2 x_2+\varepsilon\).

Thus, \(d_1=2\) and \(d_2=3\).
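
One natural specification of the cross-model move here (a standard construction; the proposal scale \(\tau\) is a tuning choice, not from the notes) takes \(n_1=1\) and \(n_2=0\), so that \(d_1+n_1=d_2+n_2=3\): draw \(w \sim g_{1\to 2}\), e.g. \(w \sim \mathrm{N}(0, \tau^2)\), and set \[\begin{align*} h_{1\to 2}\big((\beta_0, \beta_1), w\big)=(\beta_0, \beta_1, w), \qquad h_{2\to 1}(\beta_0, \beta_1, \beta_2)=\big((\beta_0, \beta_1), \beta_2\big), \end{align*}\] a re-grouping of coordinates whose Jacobian determinant is \(1\) in absolute value.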

Example 2: Birth-Death Move in Mixture Models Suppose \(\mathcal{X}_K\) represents the parameter space for a Gaussian mixture model with \(K\) components. Model parameters can be written as

\[\begin{align*} \boldsymbol{\theta}=\left(\theta_1, \ldots, \theta_K\right) \end{align*}\]

where each \(\theta_k=\left(\mu_k, \sigma_k, \pi_k\right)\) denotes the mean, standard deviation, and mixing proportion of component \(k\). We consider a birth-death pair of moves: a birth move adds a component and a death move removes one; the details are given in Example 2 (continued) below.

Reversible Jump MCMC algorithm:

Let the parameter space be \[ \Theta:=\cup_{k}(\{k\} \times \mathcal{X}_k). \]

We define the set of possible (admissible) move types \(\mathcal{M}\) as the collection of \((k, k^{\prime}) \in K \times K\) for which a valid reversible mapping exists between \(\mathcal{X}_k\) and \(\mathcal{X}_{k^{\prime}}\).

Let \(j_{k \rightarrow k^{\prime}}(x)\) denote the probability of proposing a move from space \(k\) to space \(k^{\prime}\) when the current state is \(x \in \mathcal{X}_k\). (See the remarks below for the requirements on these transition probabilities.)

Specifically, each valid move type \(\left(k, k^{\prime}\right) \in \mathcal{M}\) has:

  • an auxiliary variable generation rule \(g_{k \rightarrow k^{\prime}}(w \mid x)\) (possibly optional; see the remarks below),
  • a deterministic bijective transformation \[\begin{align*} h_{k \rightarrow k^{\prime}}:(x, w) \mapsto\left(x^{\prime}, w^{\prime}\right) \end{align*}\] satisfying dimension matching and invertibility.

Start with \(k^{(0)} \in \mathcal{K}\), some finite or countable set, and \(X^{(0)} \in \mathcal{X}_{k^{(0)}}\). (A runnable sketch of the full loop is given after the remarks below.) For \(s=0, 1, 2, \ldots\):

  1. Choose \(k^{*}\) with probability \(j_{k^{(s)} \to k^{*}}(k^{(s)},X^{(s)})\). If \(k^{*}=k^{(s)}\), perform the usual MH update, that is, \(X^{*} \sim q_{k^{(s)}}(\cdot|k^{(s)}, X^{(s)})\) (some usual proposal transition density in the space \(\mathcal{X}_{k^{(s)}}\)), and compute \(\alpha\) as \[\alpha_{k^{(s)} \to k^{*}}(X^*|X^{(s)}) = \min \left(1, \frac{j_{k^{*}\to k^{(s)}}(k^{*}, X^*)q_{k^{(s)}}(X^{(s)}|k^{*},X^*)f(k^{*},X^*)}{j_{k^{(s)}\to k^{*}}(k^{(s)}, X^{(s)})q_{k^{(s)}}(X^*|k^{(s)},X^{(s)})f(k^{(s)},X^{(s)})} \right);\] then skip to step 4.
  2. If \(k^{*} \neq k^{(s)}\), we continue by generating some \(W^{(s)} \sim g_{k^{(s)} \to k^{*}}(\cdot|k^{(s)}, X^{(s)})\); then apply the deterministic rule \((X^{*}, W^{*})= h_{k^{(s)}\to k^{*}}(X^{(s)}, W^{(s)})\).
  3. Compute \(\alpha\) as \[\begin{align*} &\alpha_{k^{(s)} \to k^{*}}\left(k^{*}, X^*, W^* \mid k^{(s)}, {X}^{(s)}, W^{(s)}\right)\\ &=\min \left\{1, \frac{j_{k^{*}\to k^{(s)}}(k^{*}, X^*) f\left(k^{*}, {X}^{*}\right) g_{k^{*} \to k^{(s)}}\left(W^{*}|k^{*}, X^{*}\right)}{j_{k^{(s)}\to k^{*}}(k^{(s)}, {X}^{(s)}) f\left(k^{(s)}, X^{(s)}\right) g_{k^{(s)} \to k^{*}}\left(W^{(s)}|k^{(s)}, X^{(s)}\right)}\left|\frac{\partial {h}_{k^{(s)}\to k^{*}}\left(X^{(s)}, W^{(s)}\right)}{\partial\left(X^{(s)}, W^{(s)}\right)}\right|\right\},\end{align*}\]
  4. Generate \(U \sim \mathrm{U}(0,1)\), if \(U \leq \alpha\), set \(k^{(s+1)} \leftarrow k^{*}\), \(X^{(s+1)} \leftarrow X^*\); otherwise set \(k^{(s+1)} \leftarrow k^{(s)}\), \(X^{(s+1)} \leftarrow X^{(s)}\).

Proof: provided in class.

Remarks:

  1. Here we allow the state-space transition probability \(j_{k^{(s)} \to k^{*}}(k^{(s)},X^{(s)})\) to depend on \((k^{(s)},X^{(s)})\); in simpler designs it can be independent of the state (a fixed rule).
  2. It is required that \(\sum_{k'}j_{k \to k'}=1\). It is also required that if \(j_{k \to k'}=0\), then \(j_{k' \to k}=0\); otherwise the acceptance probability is not well defined.
  3. For any \(k, k'\) such that \(j_{k \to k'} \neq 0\), we require \(h_{k \to k'}\) to be a diffeomorphism.
  4. It is common practice to set the type transition probability \(j_{k \to k'}=0\) if \(k\) and \(k'\) (or the models they represent) are not “adjacent”.
  5. Either \(W^*\) or \(W^{(s)}\) is allowed to be zero-dimensional (null), as long as the dimension matching is maintained.
  6. Here we allow the proposal density \(g_{k^{(s)} \to k^{*}}(\cdot)\) to depend on the current state, i.e., \(g_{k^{(s)} \to k^{*}}(\cdot|k^{(s)}, X^{(s)})\); in simpler designs this need not be so. Also, for some transition \(k \to k'\), we may not need to sample an auxiliary variable at all; that is, \(g_{k \to k'}\) can be set to \(1\).
  7. For a move type with \(k^{(s)}= k^*\), it is often allowed to draw some sub-type \(\underset{\sim}{m} \in \underset{\sim}{\mathcal{M}}_{k^{(s)}= k^*}(X^{(s)})\) according to some rule \(p(\underset{\sim}{m})=\gamma_{\underset{\sim}{m}, k^{(s)}= k^*}(X^{(s)})\) before sampling \(X^* \sim q_{k^{(s)}, \underset{\sim}{m}}\left(\cdot \mid k^{(s)}, X^{(s)}\right)\), where now \(q\) may depend on the sub-type. To account for this additional choice, the ratio inside the \(\min(1, \cdot)\) of the acceptance probability \(\alpha\) must be multiplied by the additional factor \(\gamma_{\underset{\sim}{m}, k^{(s)}= k^*}(X^*)/\gamma_{\underset{\sim}{m}, k^{(s)}= k^*}(X^{(s)})\).
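
As a concrete instance of the algorithm above, here is a minimal runnable Python sketch for the nested regression setting of Example 1. Everything model-specific (the \(\mathrm{N}(0,10^2)\) priors, the auxiliary scale \(\tau=1\), the jump probability \(0.5\), and the within-model random-walk step) is an illustrative assumption, not part of the notes; since \(j_{1\to 2}=j_{2\to 1}\) and \(|J|=1\) for the identity insertion map, those factors cancel in the acceptance ratio.

```python
import numpy as np

# Toy RJMCMC (a sketch under stated assumptions, not the lab code).
# Model 1: y = b0 + eps;  Model 2: y = b0 + b1*x + eps,  eps ~ N(0, 1).

rng = np.random.default_rng(2)
n = 50
x_cov = rng.uniform(-1, 1, n)
y = 1.0 + 0.8 * x_cov + rng.standard_normal(n)   # data generated from model 2

TAU = 1.0      # scale of the auxiliary proposal g (a tuning choice)
JUMP_P = 0.5   # j_{1->2} = j_{2->1} = 0.5, state-independent, so they cancel

def log_post(k, th):
    """Log posterior f(k, theta) up to one common constant.  The N(0, 10^2)
    prior is kept normalized: its constant matters across dimensions."""
    mean = th[0] + (th[1] * x_cov if k == 2 else 0.0)
    log_lik = -0.5 * np.sum((y - mean)**2)
    log_prior = -np.sum(th**2) / 200.0 - th.size * 0.5 * np.log(200 * np.pi)
    return log_lik + log_prior

def log_g(w):  # auxiliary density g_{1->2}(w) = N(0, TAU^2)
    return -0.5 * (w / TAU)**2 - np.log(TAU * np.sqrt(2 * np.pi))

def rjmcmc(n_iter):
    k, th = 1, np.array([0.0])
    ks = np.empty(n_iter, dtype=int)
    for s in range(n_iter):
        if rng.uniform() < JUMP_P:                # attempt a cross-model move
            if k == 1:                            # birth: insert b1 = w, |J|=1
                w = TAU * rng.standard_normal()
                th_star = np.append(th, w)
                log_a = log_post(2, th_star) - log_post(1, th) - log_g(w)
                if np.log(rng.uniform()) <= log_a:
                    k, th = 2, th_star
            else:                                 # death: remove b1
                w = th[1]
                th_star = th[:1]
                log_a = log_post(1, th_star) + log_g(w) - log_post(2, th)
                if np.log(rng.uniform()) <= log_a:
                    k, th = 1, th_star
        else:                                     # within-model random walk
            th_star = th + 0.1 * rng.standard_normal(th.size)
            if np.log(rng.uniform()) <= log_post(k, th_star) - log_post(k, th):
                th = th_star
        ks[s] = k
    return ks

ks = rjmcmc(20000)
print("posterior P(model 2) ~", np.mean(ks == 2))
```

With data generated from model 2 as above, the chain should spend most of its time in \(k=2\).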

Example 2 (continued): Birth-Death Move in Mixture Models

Let the current model be a finite mixture with \(K\) components. Let the parameter vector be \[\begin{align*} \boldsymbol{\theta}=\left(\theta_1, \ldots, \theta_K\right), \end{align*}\]

where each component parameter \(\theta_k \in \mathcal{\Theta} \subset \mathbb{R}^m\); recall \(\theta_k=\left(\mu_k, \sigma_k, \pi_k\right)\).

The RJMCMC state is \((K, \boldsymbol{\theta})\), with target density (posterior) \[\begin{align*} \pi(K, \boldsymbol{\theta}) \propto p(y \mid K, \boldsymbol{\theta}) p(\boldsymbol{\theta} \mid K) p(K) . \end{align*}\] We allow moves between models \(K\) and \(K+1\) by birth (add a component) and death (remove a component).

Fix an index \(j \in\{1, \ldots, K+1\}\). We consider the bijection pair \(h_{K \rightarrow K+1}^{(j)}\) and \(h_{K+1 \rightarrow K}^{(j)}\): we propose a new component \(w \in \mathcal{\Theta}\) and use \(h_{K \rightarrow K+1}^{(j)}\) to insert \(w\) as the \(j\)-th component; for the reverse move, we use \(h_{K+1 \rightarrow K}^{(j)}\) to remove the \(j\)-th component. (Both maps merely re-order coordinates, so their Jacobian determinants equal \(1\) in absolute value.)

RJMCMC algorithm

Birth proposal: \((K, \boldsymbol{\theta}) \rightarrow\left(K+1, \boldsymbol{\theta}^{\prime}\right)\).

At iteration \(s\), current state \((K, \boldsymbol{\theta}^{(s)})\) :

(1). With probability \(b_K:=j_{K \rightarrow K+1}(\boldsymbol{\theta}^{(s)})\), we attempt a birth move (otherwise we attempt a death move or some within-model move).

(2). Choose insertion index \(j\). Sample \(j \sim J_{K \rightarrow K+1}(\cdot \mid \boldsymbol{\theta}^{(s)})\), typically \[\begin{align*} J_{K \rightarrow K+1}(j \mid \boldsymbol{\theta}^{(s)})=\frac{1}{K+1}, \quad j=1, \ldots, K+1 . \end{align*}\]

(3). Propose new component parameters. Sample \(W^{(s)} \sim g_{K \rightarrow K+1}(\cdot \mid \boldsymbol{\theta}^{(s)})\), e.g. from a prior or a data-informed proposal.

(4). Apply deterministic mapping: set \(\boldsymbol{\theta}^*=h_{K \rightarrow K+1}^{(j)}(\boldsymbol{\theta}^{(s)}, W^{(s)}).\)

The proposed state is \(\left(K+1, \boldsymbol{\theta}^*\right)\). The forward proposal density (including the probability \(b_K\) of attempting a birth) is \[\begin{align*} q_{\text {birth }}\left(\left(K+1, \boldsymbol{\theta}^*\right) \mid (K, \boldsymbol{\theta}^{(s)})\right)=b_K J_{K \rightarrow K+1}(j \mid \boldsymbol{\theta}^{(s)}) g_{K \rightarrow K+1}(W^{(s)} \mid \boldsymbol{\theta}^{(s)}). \end{align*}\]

Death proposal: \(\left(K+1, \boldsymbol{\theta}^{\prime}\right) \rightarrow(K, \boldsymbol{\theta})\)

Now suppose the current state is \((K+1, \boldsymbol{\theta}^{(s)})\).

(1). With probability \(d_{K+1}:=j_{K+1 \rightarrow K}(\boldsymbol{\theta}^{(s)})\), we attempt a death move.

(2). Choose deletion index \(j\). Sample \(j \sim J_{K+1 \rightarrow K}\left(\cdot \mid \boldsymbol{\theta}^{(s)}\right)\), typically \[\begin{align*} J_{K+1 \rightarrow K}(j \mid \boldsymbol{\theta}^{(s)})=\frac{1}{K+1}, \quad j=1, \ldots, K+1 . \end{align*}\]

(3). Apply deterministic mapping: set \((\boldsymbol{\theta}^*, W^*)=h_{K+1 \rightarrow K}^{(j)}(\boldsymbol{\theta}^{(s)})\), where \(\boldsymbol{\theta}^*=\boldsymbol{\theta}_{-j}^{(s)}, \quad W^*=\boldsymbol{\theta}_j^{(s)}\).

The reverse proposal density is \[\begin{align*} q_{\text {death }}\left(\left(K, \boldsymbol{\theta}^*\right) \mid (K+1, \boldsymbol{\theta}^{(s)})\right)=d_{K+1} J_{K+1 \rightarrow K}(j \mid \boldsymbol{\theta}^{(s)}) \end{align*}\]
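
Plugging the forward and reverse proposal densities into the general acceptance formula (the mapping has \(|J|=1\)) gives, for the birth move, the ratio \[\begin{align*} \frac{\pi(K+1, \boldsymbol{\theta}^*)\, d_{K+1}\, J_{K+1 \rightarrow K}(j \mid \boldsymbol{\theta}^*)}{\pi(K, \boldsymbol{\theta}^{(s)})\, b_K\, J_{K \rightarrow K+1}(j \mid \boldsymbol{\theta}^{(s)})\, g_{K \rightarrow K+1}(W^{(s)} \mid \boldsymbol{\theta}^{(s)})}, \end{align*}\] and the index-selection probabilities cancel when both equal \(1/(K+1)\).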

If we use equal probabilities for selecting the index \(j\), and assume \(K\) is large enough (i.e., we do not need to worry about boundary cases), we obtain the acceptance probabilities: \[\begin{align*} \begin{gathered} \alpha_{\text {birth }}=\min \left\{1, \frac{\pi\left(K+1, \boldsymbol{\theta}^*\right) d_{K+1}}{\pi\left(K, \boldsymbol{\theta}^{(s)}\right) b_K g_{K \rightarrow K+1}\left(W^{(s)} \mid \boldsymbol{\theta}^{(s)}\right)}\right\} \\ \alpha_{\text {death }}=\min \left\{1, \frac{\pi\left(K, \boldsymbol{\theta}^*\right) b_K g_{K \rightarrow K+1}\left(W^* \mid \boldsymbol{\theta}^*\right)}{\pi\left(K+1, \boldsymbol{\theta}^{(s)}\right) d_{K+1}}\right\}. \end{gathered} \end{align*}\]

Python Code Examples

Provided and discussed in lab.
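
As a placeholder until the lab code, here is a minimal sketch of the birth/death acceptance computations above. The target \(\pi\) and proposal \(g\) below are dummy stand-ins (assumptions for illustration only), and the new component is appended at the end, since with a uniform index choice the insertion position does not change the ratio.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_alpha_birth(log_pi, log_g, b_K, d_K1, K, theta, w):
    """Log acceptance for (K, theta) -> (K+1, theta with w inserted); |J| = 1."""
    theta_star = theta + [w]
    return min(0.0, log_pi(K + 1, theta_star) + np.log(d_K1)
                    - log_pi(K, theta) - np.log(b_K) - log_g(w, theta))

def log_alpha_death(log_pi, log_g, b_K, d_K1, K1, theta, j):
    """Log acceptance for (K+1, theta) -> (K, theta with component j removed)."""
    w_star = theta[j]
    theta_star = theta[:j] + theta[j + 1:]
    return min(0.0, log_pi(K1 - 1, theta_star) + np.log(b_K)
                    + log_g(w_star, theta_star)
                    - log_pi(K1, theta) - np.log(d_K1))

# --- demo with placeholder densities (illustrative, not a mixture posterior) ---
def log_pi(K, theta):   # dummy target penalizing model size
    return -K - 0.5 * sum(t**2 for t in theta)

def log_g(w, theta):    # auxiliary proposal g: standard normal
    return -0.5 * w**2 - 0.5 * np.log(2 * np.pi)

theta = [0.3, -1.2]     # two "components" (scalars here for simplicity)
w = rng.standard_normal()
print(np.exp(log_alpha_birth(log_pi, log_g, 0.5, 0.5, 2, theta, w)))
print(np.exp(log_alpha_death(log_pi, log_g, 0.5, 0.5, 2, theta, 0)))
```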
