Updated: 2024-01-21
Statistical Simulation, Wei Li
Metropolis-Hastings algorithm with different move types
We consider MH algorithms with different move types, which gives a
more general framework for MH algorithms.
- \(X \in \mathbb{R}^{d}.\)
- Let \(\mathcal{M}\) denote a finite
or countable set of move types, e.g., \(\mathcal{M}=\{1, 2, \ldots, M\}\).
- Let \(\gamma_m(x)\) denote the probability
of choosing move \(m\) when the current
value is \(X^{(s)}=x\), with \(\sum_{m} \gamma_m(x)=1\).
We first randomly choose a move type \(m\in
\mathcal{M}\) and then generate \(X^*\) with a move-dependent transition
density \(q_m(\cdot|X^{(s)})\), where
\(q_m(\cdot|\cdot): \mathbb{R}^d \times
\mathbb{R}^d \to \mathbb{R}\).
MH algorithm with different move types:
Let \(s=0\).
- Generate \(m^{(s+1)}\in
\mathcal{M}\) with probability \(P(m^{(s+1)}=m)=\gamma_m(X^{(s)})\);
- Generate \(X^* \sim
q_{m^{(s+1)}}(\cdot|X^{(s)})\);
- Let the acceptance probability be \[
\alpha_{m^{(s+1)}}(X^{*}|X^{(s)})=\min \left\{
\frac{f(X^{*})\gamma_{m^{(s+1)}}(X^{*})q_{m^{(s+1)}}(X^{(s)}|X^{*})}{f(X^{(s)})\gamma_{m^{(s+1)}}(X^{(s)})q_{m^{(s+1)}}(X^{*}|X^{(s)})},
1 \right\}.
\]
- Generate \(U\sim \mbox{unif}(0,1)\);
- if \(U\leq
\alpha_{m^{(s+1)}}(X^{*}|X^{(s)})\), set \(X^{(s+1)} = X^*\);
- if \(U>
\alpha_{m^{(s+1)}}(X^{*}|X^{(s)})\), set \(X^{(s+1)} = X^{(s)}\).
- \(s\leftarrow s+1.\)
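The following is a minimal Python sketch of this scheme (an illustration, not the lab code); the target \(f\), the move probabilities \(\gamma_m\), and the proposals \(q_m\) below are assumed choices for the example, with two symmetric random-walk moves of different scales so that the \(q_m\)-ratio cancels.

```python
import numpy as np

# Illustrative (assumed) unnormalized target density f on R^2.
def f(x):
    return np.exp(-0.5 * np.sum(x**2))

# Two hypothetical move types: small-step and large-step random walks.
scales = {1: 0.2, 2: 2.0}

def gamma(m, x):
    # Probability of choosing move m at state x; constant here, summing to 1.
    return 0.5

def mh_move_types(x0, n_steps, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    chain = [x.copy()]
    for _ in range(n_steps):
        # Choose a move type m with probability gamma_m(x).
        m = rng.choice([1, 2], p=[gamma(1, x), gamma(2, x)])
        # Propose from q_m(.|x): symmetric random walk, so q_m cancels below.
        x_star = x + scales[m] * rng.standard_normal(x.shape)
        # Acceptance ratio; gamma_m kept explicitly even though constant here.
        ratio = (f(x_star) * gamma(m, x_star)) / (f(x) * gamma(m, x))
        if rng.uniform() <= min(1.0, ratio):
            x = x_star
        chain.append(x.copy())
    return np.array(chain)

chain = mh_move_types(x0=[0.0, 0.0], n_steps=5000)
```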
Remarks
- \(\gamma_m(x)\) can depend on time
as well. In the simplest case, for example, with \(\mathcal{M}=\{1, 2,\ldots, M\}\), we can
choose \(m\) to be \((s \text{ mod } M)+1\); this cycles
through all move types.
- Each move type can be shown to satisfy the detailed balance condition,
but the composition need not.
- Note that \(q_m(\cdot|\cdot): \mathbb{R}^d
\times \mathbb{R}^d \to \mathbb{R}\); this transition density may
not be well defined if we want to update only a single component, or a
subset of components, of \(X \in \mathbb{R}^d\), since then
\(X^*\) falls in a lower-dimensional subspace of \(\mathbb{R}^d\) and
has no density with respect to Lebesgue measure on \(\mathbb{R}^d\).
A modification:
To address the issue raised in the third remark above, we use a more
general rule for the acceptance probability, under two
assumptions.
Assume for each move \(m\in
\mathcal{M}\), (1) there is a set \(C_m
\subset \mathbb{R}^d \times \mathbb{R}^d\) such that \[
(x, x') \in C_m \Longleftrightarrow (x', x) \in C_m , \quad
\forall x, x'
\] and (2) there is a density \(\psi_m: C_m \to [0, \infty)\) such that the
proposed \(X^*\), generated from
\(X^{(s)} \sim f\), satisfies \[
P(X^*\in B, X^{(s)}\in A) =\int_{C_m} 1_A(x)1_B(x')\psi_m(x,
x')dx'dx
\] for all \(A, B\). Then the
(modified) acceptance probability is \[
\alpha_m(X^*|X^{(s)})=\min \left\{ \frac{\gamma_m(X^{*}) \psi_m(X^{*},
X^{(s)})}{\gamma_m(X^{(s)}) \psi_m(X^{(s)}, X^{*})}, 1 \right\}.
\]
Remark:
- If the transition density \(q_m\)
from \(X^{(s)}\) to \(X^*\) exists (i.e., is a well-defined density),
then \(\psi_m(x, x')=f(x)q_m(x'|x)\), recovering the previous algorithm. More
generally, the modified algorithm allows \(C_m\) to be a
lower-dimensional subset of \(\mathbb{R}^d \times \mathbb{R}^d\).
- This modified algorithm generalizes the blockwise MH algorithm and
Gibbs sampling.
Proof: provided in class.
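As an illustration of how the modified rule covers such lower-dimensional moves (a sketch under assumed choices, not the proof from class): take move \(m\) to update only coordinate \(m\) of \(x\), so \(C_m=\{(x, x'): x_j=x'_j \text{ for } j\neq m\}\) is lower-dimensional; with a symmetric per-coordinate proposal and constant \(\gamma_m\), the ratio \(\psi_m(X^{*}, X^{(s)})/\psi_m(X^{(s)}, X^{*})\) reduces to \(f(X^{*})/f(X^{(s)})\).

```python
import numpy as np

# Illustrative (assumed) unnormalized target on R^3: correlated Gaussian.
def f(x):
    prec = np.array([[1.0, 0.5, 0.5],
                     [0.5, 1.0, 0.5],
                     [0.5, 0.5, 1.0]])
    return np.exp(-0.5 * x @ prec @ x)

def single_site_mh(x0, n_steps, step=1.0, seed=0):
    # Random-scan single-coordinate MH: move type m updates coordinate m only,
    # so the proposal lives on a lower-dimensional set C_m; with a symmetric
    # per-coordinate proposal the acceptance ratio is f(x*)/f(x).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    chain = [x.copy()]
    for _ in range(n_steps):
        m = rng.integers(len(x))           # choose the move type uniformly
        x_star = x.copy()
        x_star[m] += step * rng.standard_normal()
        if rng.uniform() <= min(1.0, f(x_star) / f(x)):
            x = x_star
        chain.append(x.copy())
    return np.array(chain)

chain = single_site_mh(x0=np.zeros(3), n_steps=5000)
```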
Reversible Jump Markov Chain Monte Carlo
The RJMCMC algorithm is a generalization of the MH algorithm that
allows moves across parameter spaces (models) of different dimensions.
To see how this algorithm works, we consider a general form of the MH
algorithm.
Recall the detailed balance condition \[\begin{align}
\pi(x) p\left(x, x^{\prime}\right)=\pi\left(x^{\prime}\right)
p\left(x^{\prime}, x\right) \tag{*}
\end{align}\] where \(p\) is the
transition density moving from \(x\) to
\(x'\). An equivalent form is \[\begin{align*}
\int_A \int_{A^{\prime}} \pi(x) p\left(x, x^{\prime}\right) d x^{\prime}
d x=\int_A \int_{A^{\prime}} \pi\left(x^{\prime}\right)
p\left(x^{\prime}, x\right) d x^{\prime} d x, \quad \forall A, A'
\end{align*}\] or \[\begin{align}
\int_A \pi(x) P\left(x, A^{\prime}\right) d x=\int_{A^{\prime}}
\pi\left(x^{\prime}\right) P\left(x^{\prime}, A\right) d x^{\prime},
\quad \forall A, A^{\prime} \tag{**}
\end{align}\] where \(P(x,
A'):=\int_{A'} p(x, x')dx'\). Recall also that the
classical MH (effective) transition probability, using some tentative
transition density \(q(x, x')\),
takes the form \[\begin{align*}
P\left(x, A^{\prime}\right)=\int_{A^{\prime}} q\left(x,
x^{\prime}\right) \alpha\left(x, x^{\prime}\right) d x^{\prime}+
r(x)\,1_{A^{\prime}}(x),
\end{align*}\] where \(r(x):=1-\int
\alpha(x, x') q(x, x')dx'\), and \(\alpha(x, x')\) is some acceptance
probability.
We then obtain a useful equation that characterizes the detailed
balance condition: \[\begin{align}
\int_A \int_{A^{\prime}} \pi(x)\, q\left(x, x^{\prime}\right)
\alpha\left(x, x^{\prime}\right) d x^{\prime} d x=\int_A
\int_{A^{\prime}} \pi\left(x^{\prime}\right) q\left(x^{\prime},
x\right) \alpha\left(x^{\prime}, x\right) d x^{\prime} d x \tag{***}
\end{align}\]
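To see why (***) suffices for detailed balance, substitute the form of \(P\) above into (**). The rejection terms contribute \[
\int_A \pi(x)\, r(x)\, 1_{A^{\prime}}(x)\, dx=\int_{A \cap A^{\prime}} \pi(x)\, r(x)\, dx
\] to both sides, which is symmetric in \(A\) and \(A^{\prime}\); these terms cancel, and (**) reduces to (***).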
Suppose we would like to move between two spaces \(\mathcal{X}_1 \subset \mathbb{R}^{d_1}\)
and \(\mathcal{X}_2 \subset
\mathbb{R}^{d_2}\), \(d_1 \neq
d_2\), where the whole space is \(\mathcal{X}=\mathcal{X}_1 \cup
\mathcal{X}_2.\) Let \(M=\{1,
2\}\) denote the model type or space type.
We first need a probability according to which to move
from one space to another; for example, \(j_{1\to 2} (x)\) is the probability of
moving from \(\mathcal{X}_1\) to \(\mathcal{X}_2\) when the current value is
\(x\in \mathcal{X}_1.\) More generally,
let \(j_{k \to k'}\) denote the
probability of moving from space \(k\)
to space \(k'\).
After the space type is determined, we must specify the rule for the
(tentative) transition density and the acceptance probability. If we
remain in the same space, the classical proposal and acceptance
probability work. The key case is a move between spaces \(k\) and
\(k'\) with \(\mathcal{X}_k \neq
\mathcal{X}_{k'}.\)
We therefore focus, as an example, on the transition and acceptance
rule for moving from space \(1\) to space \(2\).
- generate some \(w\in
\mathbb{R}^{n_1}\) according to some density function \(g_{1\to2}(\cdot)\)
- use some deterministic bijective mapping: \[
h_{1\to2}: (x, w) \mapsto (x' ,w') \quad \mathbb{R}^{d_1+n_1}
\to
\mathbb{R}^{d_2+n_2}
\] which matches dimensions (\(d_1+ n_1= d_2+ n_2\)), to obtain \(x' \in \mathcal{X}_2\).
- the map \(h_{1\to2}\) is assumed to
be a diffeomorphism (continuously differentiable, with a continuously
differentiable inverse).
- when moving backward from \(x'\in
\mathcal{X}_2\) to \(x \in
\mathcal{X}_1\), we may sample some \(w'\) according to some density \(g_{2\to1}(\cdot)\), and move under the
deterministic bijective mapping \(h_{2\to1}:=h_{1\to2}^{-1}\) where \[
h_{2 \to 1}: (x', w') \mapsto (x, w) \quad \mathbb{R}^{d_2+n_2}
\to
\mathbb{R}^{d_1+n_1}
\] That is \[\begin{align*}
\begin{array}{ll}
h_{1\to2}: (x, w) \longmapsto (x', w') & h_{2\to1}: (x',
w') \longmapsto (x, w) \\
h_{1\to2}: \mathbb{R}^{d_1+n_1} \longrightarrow \mathbb{R}^{d_2+n_2}
& h_{2\to1}: \mathbb{R}^{d_2+n_2} \longrightarrow
\mathbb{R}^{d_1+n_1}
\end{array}
\end{align*}\]
- The acceptance probability is given by \[\begin{align*}
\alpha_{1\to 2}\left((x', w') \mid (x, w) \right)=\min \left\{1,
\frac{\pi\left(x'\right) g_{2 \rightarrow
1}\left(w'\right)}{\pi\left(x\right) g_{1 \rightarrow 2}\left(w
\right)}\left|\frac{\partial {h}_{1 \rightarrow 2}\left(x, w
\right)}{\partial\left(x, w\right)}\right|\right\},
\end{align*}\] where \(|\partial
{h}_{1 \rightarrow 2}(x, w)/\partial(x, w)|\) is the absolute
value of the determinant of the Jacobian matrix (a worked example is
given after the remarks below).
Remark:
- It may be that either \(n_1=0\) or
\(n_2=0\), as long as the dimensions
match: \(d_1+n_1=d_2+n_2.\)
- One common rule takes \(h_{1\to 2}\)
of the following form: for some function \(\phi_{x'}\)
that maps \((x, w)\) to \(x'\), and some function \(\phi_{x}\) that maps \((x', w')\) to \(x\), \[\begin{align*}
\begin{array}{ll}
h_{1\to2}: (x, w) \longmapsto (\phi_{x'}(x, w), w') &
h_{2\to1}: (x', w') \longmapsto (\phi_x(x', w'), w) \\
h_{1\to2}: \mathbb{R}^{d_1+n_1} \longrightarrow \mathbb{R}^{d_2+n_2}
& h_{2\to1}: \mathbb{R}^{d_2+n_2} \longrightarrow
\mathbb{R}^{d_1+n_1}
\end{array}
\end{align*}\]
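As a concrete illustration of the Jacobian factor (an assumed toy example, not one from class): take \(d_1=1\), \(d_2=2\), \(n_1=1\), \(n_2=0\), and the split move \[
h_{1\to2}(x, w)=(x-w,\, x+w), \qquad h_{2\to1}(x'_1, x'_2)=\left(\tfrac{x'_1+x'_2}{2},\, \tfrac{x'_2-x'_1}{2}\right).
\] The Jacobian factor is \[
\left|\frac{\partial h_{1\to2}(x, w)}{\partial (x, w)}\right|
=\left|\operatorname{det}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\right|=2,
\] and since \(n_2=0\), no \(w'\) is drawn on the backward move (\(g_{2\to1}\equiv 1\)), so \[
\alpha_{1\to 2}=\min\left\{1,\ \frac{\pi(x')}{\pi(x)\, g_{1\to2}(w)}\cdot 2\right\}.
\]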
Reversible Jump MCMC algorithm: Let the parameter
space be \[
\Theta:=\cup_{k}(\{k\} \times \mathcal{X}_k).
\]
With starting values \(k^{(0)} \in
\mathcal{K}\) (some finite or countable set) and \(X^{(0)} \in \mathcal{X}_{k^{(0)}}\), for
\(s=0, 1, 2, \cdots\):
- Choose \(k^{*}\) with probability
\(j_{k^{(s)} \to
k^{*}}(k^{(s)},X^{(s)})\). If \(k^{*}=k^{(s)}\), perform the usual MH
update, that is, generate \(X^{*} \sim
q_{k^{(s)}}(\cdot|k^{(s)}, X^{(s)})\) (some usual proposal
transition density in the space \(\mathcal{X}_{k^{(s)}}\)), compute \(\alpha\) as \[\alpha(X^*|X^{(s)}) = \min \left(1,
\frac{j_{k^{*}\to k^{(s)}}(k^{*},
X^*)\,q_{k^{(s)}}(X^{(s)}|k^{*},X^*)\,\pi(k^{*},X^*)}{j_{k^{(s)}\to
k^{*}}(k^{(s)},
X^{(s)})\,q_{k^{(s)}}(X^*|k^{(s)},X^{(s)})\,\pi(k^{(s)},X^{(s)})}
\right),\] and skip to the final accept/reject step below.
- If \(k^{*} \neq k^{(s)}\), we
continue by generating some \(W^{(s)} \sim
g_{k^{(s)} \to k^{*}}(\cdot|k^{(s)}, X^{(s)})\); then set \((X^{*}, W^{*})= h_{k^{(s)}\to k^{*}}(X^{(s)},
W^{(s)})\).
- Compute \(\alpha\) as \[\begin{align*}
\alpha\left(k^{*}, X^*, W^* \mid k^{(s)}, {X}^{(s)}, W^{(s)}\right)=\min
\left\{1, \frac{j_{k^{*}\to k^{(s)}}(k^{*}, X^*) \pi\left(k^{*},
{X}^{*}\right) g_{k^{*} \to k^{(s)}}\left(W^{*}|k^{*},
X^{*}\right)}{j_{k^{(s)}\to k^{*}}(k^{(s)},
{X}^{(s)}) \pi\left(k^{(s)}, X^{(s)}\right) g_{k^{(s)} \to
k^{*}}\left(W^{(s)}|k^{(s)}, X^{(s)}\right)}\left|\frac{\partial
{h}_{k^{(s)}\to k^{*}}\left(X^{(s)},
W^{(s)}\right)}{\partial\left(X^{(s)}, W^{(s)}\right)}\right|\right\},
\end{align*}\]
- Generate \(U \sim
\mathrm{U}(0,1)\), if \(U \leq
\alpha\), set \(k^{(s+1)} \leftarrow
k^{*}\), \(X^{(s+1)} \leftarrow
X^*\); otherwise set \(k^{(s+1)}
\leftarrow k^{(s)}\), \(X^{(s+1)}
\leftarrow X^{(s)}\).
Proof: provided in class.
Remarks:
- Here we allow the space-type transition probability \(j_{k^{(s)} \to k^{*}}(k^{(s)},X^{(s)})\)
to depend on \((k^{(s)},X^{(s)})\); in
simpler designs it could be a purely deterministic rule.
- It is required that \(\sum_{m'}j_{m \to
m'}=1\). It is also required that if \(j_{m \to m'}=0\), then \(j_{m' \to m}=0\); otherwise the
acceptance probability is not well defined.
- For any \(m, m'\) such that
\(j_{m \to m'} \neq 0\),
\(h_{m \to m'}\) is required to
be a diffeomorphism.
- It is common practice to set the type transition probability so
that \(j_{m \to m'}=0\) if the
corresponding \(m\) and \(m'\) (or the models they
represent) are not “adjacent”.
- It is allowed that either \(W^*\)
or \(W^{(s)}\) is zero-dimensional
(null), as long as dimension matching is maintained.
- Here we allow the proposal density \(g_{k^{(s)} \to k^{*}}(\cdot)\) to depend on
the current state, i.e., \(g_{k^{(s)} \to
k^{*}}(\cdot|k^{(s)}, X^{(s)})\); in simpler designs this
need not be so. Also, for some transitions \(k
\to k'\) we may not need to sample an auxiliary variable at
all; in that case \(g_{k \to k'}\) can
be set to \(1\).
- It is possible to extend the above algorithm by additionally drawing a
move type \(m\in
\mathcal{M}\), with all subsequent functions \(j, q, h\), and \(g\) made dependent on \(m\), and with the acceptance probability
adjusted by a ratio of the move-type probabilities.
Python Code Examples
Provided and discussed in lab.
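The lab code itself is not reproduced here. As a placeholder, the following is a minimal, self-contained RJMCMC sketch for the toy split/merge example above; all modeling choices (the targets \(\pi(k,\cdot)\), the jump probability \(J\), and the proposal scales) are assumptions for illustration only.

```python
import numpy as np

# Illustrative (assumed) unnormalized log-targets: pi(k, x) is standard
# normal on each space, with equal weight on the two models.
def log_pi(k, x):
    return -0.5 * np.sum(np.atleast_1d(x)**2)

J = 0.5  # j_{1->2} = j_{2->1} = J, so the j-ratio cancels in alpha

def rjmcmc(n_steps, seed=0):
    rng = np.random.default_rng(seed)
    k, x = 1, np.array([0.0])
    samples = []
    for _ in range(n_steps):
        if rng.uniform() < J:  # propose a between-space jump
            if k == 1:
                # Split move: w ~ N(0,1), h_{1->2}(x, w) = (x - w, x + w).
                w = rng.standard_normal()
                x_star = np.array([x[0] - w, x[0] + w])
                log_g = -0.5 * w**2 - 0.5 * np.log(2 * np.pi)
                # alpha = pi(2,x*) / (pi(1,x) g(w)) * |Jacobian|, Jacobian = 2.
                log_a = log_pi(2, x_star) - log_pi(1, x) - log_g + np.log(2.0)
                k_star = 2
            else:
                # Merge move: deterministic inverse h_{2->1}, Jacobian = 1/2.
                x_star = np.array([0.5 * (x[0] + x[1])])
                w = 0.5 * (x[1] - x[0])
                log_g = -0.5 * w**2 - 0.5 * np.log(2 * np.pi)
                log_a = log_pi(1, x_star) + log_g - log_pi(2, x) - np.log(2.0)
                k_star = 1
            if np.log(rng.uniform()) <= min(0.0, log_a):
                k, x = k_star, x_star
        else:
            # Within-space move: symmetric random-walk MH in the current space.
            x_star = x + 0.5 * rng.standard_normal(x.shape)
            if np.log(rng.uniform()) <= min(0.0, log_pi(k, x_star) - log_pi(k, x)):
                x = x_star
        samples.append((k, x.copy()))
    return samples

samples = rjmcmc(20000)
```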