Statistical Simulation, Wei Li. Updated: 2025-12-02

Metropolis-Hastings algorithm with different move types

We consider MH algorithms to sample from \(f\) using different move types, which gives a more general framework for MH algorithms.

We first randomly choose a move type \(m\in \mathcal{M}\) and then generate \(X^*\) with a move-dependent transition density \(q_m(\cdot|X^{(s)})\), where \(q_m(\cdot|\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\).

MH algorithm with different move types:

Let \(s=0\).

  1. Generate \(m^{(s+1)}\in \mathcal{M}\) with probability \(P(m^{(s+1)}=m)=\gamma_m(X^{(s)})\);
  2. Generate \(X^* \sim q_{m^{(s+1)}}(\cdot|X^{(s)})\);
  3. Let the acceptance probability be \[ \alpha_{m^{(s+1)}}(X^{*}|X^{(s)})=\min \left\{ \frac{f(X^{*})\gamma_{m^{(s+1)}}(X^{*})q_{m^{(s+1)}}(X^{(s)}|X^{*})}{f(X^{(s)})\gamma_{m^{(s+1)}}(X^{(s)})q_{m^{(s+1)}}(X^{*}|X^{(s)})}, 1 \right\}. \]
  4. Generate \(U\sim \mbox{unif}(0,1)\);
    1. if \(U\leq \alpha_{m^{(s+1)}}(X^{*}|X^{(s)})\), set \(X^{(s+1)} = X^*\);
    2. if \(U> \alpha_{m^{(s+1)}}(X^{*}|X^{(s)})\), set \(X^{(s+1)} = X^{(s)}\).
  5. \(s\leftarrow s+1.\)
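
As a concrete illustration, here is a minimal Python sketch of this algorithm, assuming a standard bivariate normal target and three random-walk move types that differ only in their proposal scales; the constant move probabilities \(\gamma_m\) and the symmetric proposals make both the \(\gamma\) and \(q\) factors cancel in the acceptance ratio. (The target, scales, and iteration count are illustrative choices, not from the notes.)

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f(x):
    """Log target density (standard 2-D Gaussian, up to an additive constant)."""
    return -0.5 * np.sum(x**2)

scales = [0.1, 1.0, 5.0]   # one random-walk scale per move type
gamma = [1/3, 1/3, 1/3]    # P(choose move m); constant in x, so it cancels

def mh_move_types(n_iter, x0):
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    for s in range(n_iter):
        m = rng.choice(len(scales), p=gamma)                  # step 1
        x_star = x + scales[m] * rng.standard_normal(x.size)  # step 2
        # step 3: q_m is symmetric and gamma_m constant, so both cancel
        log_alpha = min(0.0, log_f(x_star) - log_f(x))
        if np.log(rng.uniform()) <= log_alpha:                # step 4
            x = x_star
        chain[s] = x                                          # step 5
    return chain

chain = mh_move_types(5000, x0=[3.0, -3.0])
print(chain.mean(axis=0))  # should be close to (0, 0)
```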

Remarks

  1. \(\gamma_m(x)\) can depend on time as well. In the simplest case, for example, with \(\mathcal{M}=\{1, 2,\ldots, M\}\), we can choose \(m\) to be \((s \text{ mod } M)+1\), which will cycle through all move types.
  2. Each move type can be shown to satisfy the detailed balance condition, but their composition need not.
  3. Note that \(q_m(\cdot|\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\), so this transition density may not be well defined if we want to update only one component (or a subset of components) of the vector \(X \in \mathbb{R}^d\): in that case \(X^*\) falls on a lower-dimensional subspace of \(\mathbb{R}^d\).

A modification:

To address issue (3) above, we use a more general rule for the acceptance probability. This requires two assumptions.

Assume that for each move type \(m\in \mathcal{M}\): (1) there is a set \(C_m \subset \mathbb{R}^d \times \mathbb{R}^d\) such that \[ (x, x') \in C_m \Longleftrightarrow (x', x) \in C_m , \quad \forall x, x'; \] and (2) there is a density \(\psi_m: C_m \to [0, \infty]\) such that the proposed \(X^*\), transitioning from \(X^{(s)} \sim f\), satisfies \[ P(X^*\in B, X^{(s)}\in A) =\int_{C_m} 1_A(x)1_B(x')\psi_m(x, x')dx'dx \] for all \(A, B\). Then the (modified) acceptance probability is \[ \alpha_m(X^*|X^{(s)})=\min \left\{ \frac{\gamma_m(X^{*}) \psi_m(X^{*}, X^{(s)})}{\gamma_m(X^{(s)}) \psi_m(X^{(s)}, X^{*})}, 1 \right\}. \]

Remark:

  1. If the transition density \(q_m\) from \(X^{(s)}\) to \(X^*\) exists (i.e., is a well-defined density), then \(\psi_m(x, x')=f(x)q_m(x'|x)\), recovering the previous algorithm. More generally, however, this modified algorithm allows \(C_m\) to be a lower-dimensional subset of \(\mathbb{R}^d \times \mathbb{R}^d\).
  2. This modified algorithm generalizes the blockwise MH algorithm and Gibbs sampling (a code sketch is given below).

Proof: provided in class.
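
To make Remark 2 concrete, the sketch below runs single-coordinate (Metropolis-within-Gibbs-style) updates as move types: move \(m\) changes only coordinate \(m\), so the pair \((X^{(s)}, X^*)\) lives on a lower-dimensional set \(C_m\), and with a symmetric per-coordinate proposal the modified acceptance ratio reduces to \(f(X^*)/f(X^{(s)})\). The bivariate normal target with correlation \(\rho=0.8\) and the step size are illustrative assumptions, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8

def log_f(x):
    """Bivariate normal, unit variances, correlation rho (up to a constant)."""
    return -(x[0]**2 - 2*rho*x[0]*x[1] + x[1]**2) / (2*(1 - rho**2))

def within_gibbs(n_iter, x0, step=1.0):
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, 2))
    for s in range(n_iter):
        m = s % 2                    # deterministic cycling (Remark 1 above)
        x_star = x.copy()
        x_star[m] += step * rng.standard_normal()  # update coordinate m only
        # modified acceptance: symmetric proposal, so ratio is f(x*)/f(x)
        if np.log(rng.uniform()) <= log_f(x_star) - log_f(x):
            x = x_star
        chain[s] = x
    return chain

chain = within_gibbs(10000, x0=[0.0, 0.0])
print(np.corrcoef(chain.T)[0, 1])  # should be near rho
```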

Reversible Jump Markov Chain Monte Carlo

The RJMCMC algorithm is a generalization of the MH algorithm that allows moves across parameter spaces of different dimensions (models). To see how this algorithm works, we consider a general form of the MH algorithm.

Recall the detailed balance condition \[\begin{align} f(x) p\left(x, x^{\prime}\right)=f\left(x^{\prime}\right) p\left(x^{\prime}, x\right) \tag{*} \end{align}\] where \(p\) is the transition density moving from \(x\) to \(x'\). An equivalent form is \[\begin{align*} \int_A \int_{A^{\prime}} f(x) p\left(x, x^{\prime}\right) d x^{\prime} d x=\int_A \int_{A^{\prime}} f\left(x^{\prime}\right) p\left(x^{\prime}, x\right) d x^{\prime} d x, \quad \forall A, A' \end{align*}\] or \[\begin{align} \int_A f(x) P\left(x, A^{\prime}\right) d x=\int_{A^{\prime}} f\left(x^{\prime}\right) P\left(x^{\prime}, A\right) d x^{\prime}, \quad \forall A, A^{\prime} \tag{**} \end{align}\] where \(P(x, A'):=\int_{A'} p(x, x')dx'\). Recall also that the classical MH (effective) transition probability, using some tentative transition density \(q(x, x')\), takes the form \[\begin{align*} P\left(x, A^{\prime}\right)=\int_{A^{\prime}} q\left(x, x^{\prime}\right) \alpha\left(x, x^{\prime}\right) d x^{\prime}+ r(x)\delta_{A^{\prime}}(x), \end{align*}\] where \(r(x):=1-\int \alpha(x, x') q(x, x')dx'\), \(\delta_{A^{\prime}}(x):=1_{A^{\prime}}(x)\), and \(\alpha(x, x')\) is some acceptance probability.

Substituting this form of \(P\) into (**) and noting that the rejection terms contribute equally to both sides (both equal \(\int_{A\cap A'} f(x)r(x)dx\)), detailed balance reduces to the following useful condition: \[\begin{align} \int_A f(x) \int_{A^{\prime}} q\left(x, x^{\prime}\right) \alpha\left(x, x^{\prime}\right) d x^{\prime} d x=\int_A f\left(x^{\prime}\right) \int_{A^{\prime}} q\left(x^{\prime}, x\right) \alpha\left(x^{\prime}, x\right) d x^{\prime} d x \tag{***} \end{align}\]
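
For instance, with the classical choice \(\alpha(x, x')=\min \left\{1, \frac{f(x')q(x',x)}{f(x)q(x,x')}\right\}\), a pointwise check gives \[\begin{align*} f(x)\,q(x,x')\,\alpha(x,x') = \min\left\{f(x)q(x,x'),\; f(x')q(x',x)\right\}, \end{align*}\] which is symmetric under swapping \(x\) and \(x'\); hence the two integrands in (***) agree pointwise and the equation holds for all \(A, A'\).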

This characterization remains true if we consider some transition between two different spaces \(\mathcal{X}_1\) and \(\mathcal{X}_2\), where \(\mathcal{X}_1 \subset \mathbb{R}^{d_1}\) and \(\mathcal{X}_2 \subset \mathbb{R}^{d_2}\), \(d_1 \neq d_2\).

But the foremost issue is how to make transitions across spaces in practice. Since the two states live in different spaces, \(q(\cdot, \cdot)\) must be space-dependent; that is, we need \(q_{1 \to 2}(x, x')\) and \(q_{2 \to 1}(x', x)\). If such transitions are available, then we can attempt to deduce the acceptance probabilities \(\alpha_{1 \to 2}(x, x')\) and \(\alpha_{2 \to 1}(x', x)\) that balance both sides of equation (***): \[ \int_A f(x) \int_{A^{\prime}} q_{1 \to 2}\left(x, x^{\prime}\right) \alpha_{1 \to 2}\left(x, x^{\prime}\right) d x^{\prime} d x=\int_A f\left(x^{\prime}\right) \int_{A^{\prime}} q_{2 \to 1}\left(x^{\prime}, x\right) \alpha_{2 \to 1 }\left(x^{\prime}, x\right) d x^{\prime} d x \]

Valid reversible mapping:

Suppose we would like to move between two spaces \(\mathcal{X}_1 \subset \mathbb{R}^{d_1}\) and \(\mathcal{X}_2 \subset \mathbb{R}^{d_2}\), \(d_1 \neq d_2\), where the whole space is \(\mathcal{X}=\mathcal{X}_1 \cup \mathcal{X}_2.\) Let \(K=\{1, 2\}\) denote the type space (model space).

Suppose we have probabilities governing moves from one space to another; for example, \(j_{1\to 2} (x)\) is the probability of moving from \(\mathcal{X}_1\) to \(\mathcal{X}_2\) when the current value is \(x\in \mathcal{X}_1.\) More generally, let \(j_{k \to k'}\) denote the probability of moving from space \(k\) to space \(k'\).

For an MCMC algorithm, we have to specify the rule for the (tentative) transition density and the acceptance probability. If we remain in the same space, the classical proposal and acceptance probability work as usual.

The key case is when the chosen move crosses between spaces \(k\) and \(k'\) with \(\mathcal{X}_k \neq \mathcal{X}_{k'}.\) The critical step is to establish a valid reversible mapping between \(\mathcal{X}_k\) and \(\mathcal{X}_{k^{\prime}}\).

So we will focus, for example, on the transition and acceptance rule for moving from space \(1\) to space \(2\). Concretely (anticipating the remarks below), we augment \(x \in \mathcal{X}_1\) with an auxiliary variable \(w \in \mathbb{R}^{n_1}\) and \(x' \in \mathcal{X}_2\) with \(w' \in \mathbb{R}^{n_2}\), and require a bijection \(h_{1\to 2}: (x, w) \mapsto (x', w')\), with inverse \(h_{2\to 1}\), satisfying the dimension-matching condition \(d_1+n_1=d_2+n_2\); this is the valid reversible mapping.

Remark:

  1. It may be that either \(n_1=0\) or \(n_2=0\), as long as the dimensions are matched: \(d_1+n_1=d_2+n_2.\)
  2. One common rule for \(h_{1\to 2}\) uses some function \(\phi_{x'}\) that maps \((x, w)\) to \(x'\) and some function \(\phi_{x}\) that maps \((x', w')\) to \(x\): \[\begin{align*} \begin{array}{ll} h_{1\to2}: (x, w) \longmapsto (\phi_{x'}(x, w), w') & h_{2\to1}: (x', w') \longmapsto (\phi_x(x', w'), w) \\ h_{1\to2}: \mathbb{R}^{d_1+n_1} \longrightarrow \mathbb{R}^{d_2+n_2} & h_{2\to1}: \mathbb{R}^{d_2+n_2} \longrightarrow \mathbb{R}^{d_1+n_1} \end{array} \end{align*}\]

Example 1: Adding or Removing a Parameter (Linear Regression) Suppose we have two nested linear regression models, say \(\mathcal{M}_1: y=\beta_0+\beta_1 x_1+\varepsilon\) and \(\mathcal{M}_2: y=\beta_0+\beta_1 x_1+\beta_2 x_2+\varepsilon\).

Thus, \(d_1=2\) and \(d_2=3\).
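
One natural specification of the cross-model move here (a standard construction; the proposal scale \(\tau\) is a tuning choice, not from the notes) takes \(n_1=1\) and \(n_2=0\), so that \(d_1+n_1=d_2+n_2=3\): draw \(w \sim g_{1\to 2}\), e.g. \(w \sim \mathrm{N}(0, \tau^2)\), and set \[\begin{align*} h_{1\to 2}\big((\beta_0, \beta_1), w\big)=(\beta_0, \beta_1, w), \qquad h_{2\to 1}(\beta_0, \beta_1, \beta_2)=\big((\beta_0, \beta_1), \beta_2\big), \end{align*}\] a re-grouping of coordinates whose Jacobian determinant is \(1\) in absolute value.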

Example 2: Birth-Death Move in Mixture Models Suppose \(\mathcal{X}_K\) represents the parameter space for a Gaussian mixture model with \(K\) components. Model parameters can be written as

\[\begin{align*} \boldsymbol{\theta}=\left(\theta_1, \ldots, \theta_K\right) \end{align*}\]

where each \(\theta_k=\left(\mu_k, \sigma_k, \pi_k\right)\) denotes the mean, standard deviation, and mixing proportion of component \(k\). We consider a birth-death pair of moves: a birth move adds a component and a death move removes one; the details are given in Example 2 (continued) below.

Reversible Jump MCMC algorithm:

Let the parameter space be \[ \Theta:=\cup_{k}(\{k\} \times \mathcal{X}_k). \]

We define the set of possible (admissible) move types \(\mathcal{M}\) as the collection of \((k, k^{\prime}) \in K \times K\) for which a valid reversible mapping exists between \(\mathcal{X}_k\) and \(\mathcal{X}_{k^{\prime}}\).

Let \(j_{k \rightarrow k^{\prime}}(x)\) denote the probability of proposing a move from space \(k\) to space \(k^{\prime}\) when the current state is \(x \in \mathcal{X}_k\). (See the remarks below for the requirements on these transition probabilities.)

Specifically, each valid move type \(\left(k, k^{\prime}\right) \in \mathcal{M}\) has:

  • an auxiliary variable generation rule \(g_{k \rightarrow k^{\prime}}(w \mid x)\) (possibly optional; see the remarks below),
  • a deterministic bijective transformation \[\begin{align*} h_{k \rightarrow k^{\prime}}:(x, w) \mapsto\left(x^{\prime}, w^{\prime}\right) \end{align*}\] satisfying dimension matching and invertibility.

Start with \(k^{(0)} \in \mathcal{K}\), some finite or countable set, and \(X^{(0)} \in \mathcal{X}_{k^{(0)}}\). (A runnable sketch of the full loop is given after the remarks below.) For \(s=0, 1, 2, \ldots\):

  1. Choose \(k^{*}\) with probability \(j_{k^{(s)} \to k^{*}}(k^{(s)},X^{(s)})\). If \(k^{*}=k^{(s)}\), perform the usual MH update, that is, \(X^{*} \sim q_{k^{(s)}}(\cdot|k^{(s)}, X^{(s)})\) (some usual proposal transition density in the space \(\mathcal{X}_{k^{(s)}}\)), and compute \(\alpha\) as \[\alpha_{k^{(s)} \to k^{*}}(X^*|X^{(s)}) = \min \left(1, \frac{j_{k^{*}\to k^{(s)}}(k^{*}, X^*)q_{k^{(s)}}(X^{(s)}|k^{*},X^*)f(k^{*},X^*)}{j_{k^{(s)}\to k^{*}}(k^{(s)}, X^{(s)})q_{k^{(s)}}(X^*|k^{(s)},X^{(s)})f(k^{(s)},X^{(s)})} \right);\] then skip to step 4.
  2. If \(k^{*} \neq k^{(s)}\), we continue by generating some \(W^{(s)} \sim g_{k^{(s)} \to k^{*}}(\cdot|k^{(s)}, X^{(s)})\); then apply the deterministic rule \((X^{*}, W^{*})= h_{k^{(s)}\to k^{*}}(X^{(s)}, W^{(s)})\).
  3. Compute \(\alpha\) as \[\begin{align*} &\alpha_{k^{(s)} \to k^{*}}\left(k^{*}, X^*, W^* \mid k^{(s)}, {X}^{(s)}, W^{(s)}\right)\\ &=\min \left\{1, \frac{j_{k^{*}\to k^{(s)}}(k^{*}, X^*) f\left(k^{*}, {X}^{*}\right) g_{k^{*} \to k^{(s)}}\left(W^{*}|k^{*}, X^{*}\right)}{j_{k^{(s)}\to k^{*}}(k^{(s)}, {X}^{(s)}) f\left(k^{(s)}, X^{(s)}\right) g_{k^{(s)} \to k^{*}}\left(W^{(s)}|k^{(s)}, X^{(s)}\right)}\left|\frac{\partial {h}_{k^{(s)}\to k^{*}}\left(X^{(s)}, W^{(s)}\right)}{\partial\left(X^{(s)}, W^{(s)}\right)}\right|\right\},\end{align*}\]
  4. Generate \(U \sim \mathrm{U}(0,1)\), if \(U \leq \alpha\), set \(k^{(s+1)} \leftarrow k^{*}\), \(X^{(s+1)} \leftarrow X^*\); otherwise set \(k^{(s+1)} \leftarrow k^{(s)}\), \(X^{(s+1)} \leftarrow X^{(s)}\).

Proof: provided in class.

Remarks:

  1. Here we allow the state-space transition probability \(j_{k^{(s)} \to k^{*}}(k^{(s)},X^{(s)})\) to depend on \((k^{(s)},X^{(s)})\); in simpler designs it can be independent of the state (a fixed rule).
  2. It is required that \(\sum_{k'}j_{k \to k'}=1\). It is also required that if \(j_{k \to k'}=0\), then \(j_{k' \to k}=0\); otherwise the acceptance probability is not well defined.
  3. For any \(k, k'\) such that \(j_{k \to k'} \neq 0\), we require \(h_{k \to k'}\) to be a diffeomorphism.
  4. It is common practice to set the type transition probability \(j_{k \to k'}=0\) if \(k\) and \(k'\) (or the models they represent) are not “adjacent”.
  5. Either \(W^*\) or \(W^{(s)}\) is allowed to be zero-dimensional (null), as long as the dimension matching is maintained.
  6. Here we allow the proposal density \(g_{k^{(s)} \to k^{*}}(\cdot)\) to depend on the current state, i.e., \(g_{k^{(s)} \to k^{*}}(\cdot|k^{(s)}, X^{(s)})\); in simpler designs this need not be so. Also, for some transition \(k \to k'\), we may not need to sample an auxiliary variable at all; that is, \(g_{k \to k'}\) can be set to \(1\).
  7. For a move type with \(k^{(s)}= k^*\), it is often allowed to draw some sub-type \(\underset{\sim}{m} \in \underset{\sim}{\mathcal{M}}_{k^{(s)}= k^*}(X^{(s)})\) according to some rule \(p(\underset{\sim}{m})=\gamma_{\underset{\sim}{m}, k^{(s)}= k^*}(X^{(s)})\) before sampling \(X^* \sim q_{k^{(s)}, \underset{\sim}{m}}\left(\cdot \mid k^{(s)}, X^{(s)}\right)\), where now \(q\) may depend on the sub-type. To account for this additional choice, the ratio inside the \(\min(1, \cdot)\) of the acceptance probability \(\alpha\) must be multiplied by the additional factor \(\gamma_{\underset{\sim}{m}, k^{(s)}= k^*}(X^*)/\gamma_{\underset{\sim}{m}, k^{(s)}= k^*}(X^{(s)})\).
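
As a concrete instance of the algorithm above, here is a minimal runnable Python sketch for the nested regression setting of Example 1. Everything model-specific (the \(\mathrm{N}(0,10^2)\) priors, the auxiliary scale \(\tau=1\), the jump probability \(0.5\), and the within-model random-walk step) is an illustrative assumption, not part of the notes; since \(j_{1\to 2}=j_{2\to 1}\) and \(|J|=1\) for the identity insertion map, those factors cancel in the acceptance ratio.

```python
import numpy as np

# Toy RJMCMC (a sketch under stated assumptions, not the lab code).
# Model 1: y = b0 + eps;  Model 2: y = b0 + b1*x + eps,  eps ~ N(0, 1).

rng = np.random.default_rng(2)
n = 50
x_cov = rng.uniform(-1, 1, n)
y = 1.0 + 0.8 * x_cov + rng.standard_normal(n)   # data generated from model 2

TAU = 1.0      # scale of the auxiliary proposal g (a tuning choice)
JUMP_P = 0.5   # j_{1->2} = j_{2->1} = 0.5, state-independent, so they cancel

def log_post(k, th):
    """Log posterior f(k, theta) up to one common constant.  The N(0, 10^2)
    prior is kept normalized: its constant matters across dimensions."""
    mean = th[0] + (th[1] * x_cov if k == 2 else 0.0)
    log_lik = -0.5 * np.sum((y - mean)**2)
    log_prior = -np.sum(th**2) / 200.0 - th.size * 0.5 * np.log(200 * np.pi)
    return log_lik + log_prior

def log_g(w):  # auxiliary density g_{1->2}(w) = N(0, TAU^2)
    return -0.5 * (w / TAU)**2 - np.log(TAU * np.sqrt(2 * np.pi))

def rjmcmc(n_iter):
    k, th = 1, np.array([0.0])
    ks = np.empty(n_iter, dtype=int)
    for s in range(n_iter):
        if rng.uniform() < JUMP_P:                # attempt a cross-model move
            if k == 1:                            # birth: insert b1 = w, |J|=1
                w = TAU * rng.standard_normal()
                th_star = np.append(th, w)
                log_a = log_post(2, th_star) - log_post(1, th) - log_g(w)
                if np.log(rng.uniform()) <= log_a:
                    k, th = 2, th_star
            else:                                 # death: remove b1
                w = th[1]
                th_star = th[:1]
                log_a = log_post(1, th_star) + log_g(w) - log_post(2, th)
                if np.log(rng.uniform()) <= log_a:
                    k, th = 1, th_star
        else:                                     # within-model random walk
            th_star = th + 0.1 * rng.standard_normal(th.size)
            if np.log(rng.uniform()) <= log_post(k, th_star) - log_post(k, th):
                th = th_star
        ks[s] = k
    return ks

ks = rjmcmc(20000)
print("posterior P(model 2) ~", np.mean(ks == 2))
```

With data generated from model 2 as above, the chain should spend most of its time in \(k=2\).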

Example 2 (continued): Birth-Death Move in Mixture Models

Let the current model be a finite mixture with \(K\) components. Let the parameter vector be \[\begin{align*} \boldsymbol{\theta}=\left(\theta_1, \ldots, \theta_K\right), \end{align*}\]

where each component parameter \(\theta_k \in \mathcal{\Theta} \subset \mathbb{R}^m\); recall \(\theta_k=\left(\mu_k, \sigma_k, \pi_k\right)\).

The RJMCMC state is \((K, \boldsymbol{\theta})\), with target density (posterior) \[\begin{align*} \pi(K, \boldsymbol{\theta}) \propto p(y \mid K, \boldsymbol{\theta}) p(\boldsymbol{\theta} \mid K) p(K) . \end{align*}\] We allow moves between models \(K\) and \(K+1\) by birth (add a component) and death (remove a component).

Fix an index \(j \in\{1, \ldots, K+1\}\). We consider the bijection pair \(h_{K \rightarrow K+1}^{(j)}\) and \(h_{K+1 \rightarrow K}^{(j)}\): we propose a new component \(w \in \mathcal{\Theta}\) and use \(h_{K \rightarrow K+1}^{(j)}\) to insert \(w\) as the \(j\)-th component; for the reverse move, we use \(h_{K+1 \rightarrow K}^{(j)}\) to remove the \(j\)-th component. (Both maps merely re-order coordinates, so their Jacobian determinants equal \(1\) in absolute value.)

RJMCMC algorithm

Birth proposal: \((K, \boldsymbol{\theta}) \rightarrow\left(K+1, \boldsymbol{\theta}^{\prime}\right)\).

At iteration \(s\), current state \((K, \boldsymbol{\theta}^{(s)})\) :

(1). With probability \(b_K:=j_{K \rightarrow K+1}(\boldsymbol{\theta}^{(s)})\), we attempt a birth move (otherwise we attempt a death move or some within-model move).

(2). Choose insertion index \(j\). Sample \(j \sim J_{K \rightarrow K+1}(\cdot \mid \boldsymbol{\theta}^{(s)})\), typically \[\begin{align*} J_{K \rightarrow K+1}(j \mid \boldsymbol{\theta}^{(s)})=\frac{1}{K+1}, \quad j=1, \ldots, K+1 . \end{align*}\]

(3). Propose new component parameters. Sample \(W^{(s)} \sim g_{K \rightarrow K+1}(\cdot \mid \boldsymbol{\theta}^{(s)})\), e.g. from a prior or a data-informed proposal.

(4). Apply deterministic mapping: set \(\boldsymbol{\theta}^*=h_{K \rightarrow K+1}^{(j)}(\boldsymbol{\theta}^{(s)}, W^{(s)}).\)

The proposed state is \(\left(K+1, \boldsymbol{\theta}^*\right)\). The forward proposal density (including the probability \(b_K\) of attempting a birth) is \[\begin{align*} q_{\text {birth }}\left(\left(K+1, \boldsymbol{\theta}^*\right) \mid (K, \boldsymbol{\theta}^{(s)})\right)=b_K J_{K \rightarrow K+1}(j \mid \boldsymbol{\theta}^{(s)}) g_{K \rightarrow K+1}(W^{(s)} \mid \boldsymbol{\theta}^{(s)}). \end{align*}\]

Death proposal: \(\left(K+1, \boldsymbol{\theta}^{\prime}\right) \rightarrow(K, \boldsymbol{\theta})\)

Now suppose the current state is \((K+1, \boldsymbol{\theta}^{(s)})\).

(1). With probability \(d_{K+1}:=j_{K+1 \rightarrow K}(\boldsymbol{\theta}^{(s)})\), we attempt a death move.

(2). Choose deletion index \(j\). Sample \(j \sim J_{K+1 \rightarrow K}\left(\cdot \mid \boldsymbol{\theta}^{(s)}\right)\), typically \[\begin{align*} J_{K+1 \rightarrow K}(j \mid \boldsymbol{\theta}^{(s)})=\frac{1}{K+1}, \quad j=1, \ldots, K+1 . \end{align*}\]

(3). Apply deterministic mapping: set \((\boldsymbol{\theta}^*, W^*)=h_{K+1 \rightarrow K}^{(j)}(\boldsymbol{\theta}^{(s)})\), where \(\boldsymbol{\theta}^*=\boldsymbol{\theta}_{-j}^{(s)}, \quad W^*=\boldsymbol{\theta}_j^{(s)}\).

The reverse proposal density is \[\begin{align*} q_{\text {death }}\left(\left(K, \boldsymbol{\theta}^*\right) \mid (K+1, \boldsymbol{\theta}^{(s)})\right)=d_{K+1} J_{K+1 \rightarrow K}(j \mid \boldsymbol{\theta}^{(s)}) \end{align*}\]
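
Plugging the forward and reverse proposal densities into the general acceptance formula (the mapping has \(|J|=1\)) gives, for the birth move, the ratio \[\begin{align*} \frac{\pi(K+1, \boldsymbol{\theta}^*)\, d_{K+1}\, J_{K+1 \rightarrow K}(j \mid \boldsymbol{\theta}^*)}{\pi(K, \boldsymbol{\theta}^{(s)})\, b_K\, J_{K \rightarrow K+1}(j \mid \boldsymbol{\theta}^{(s)})\, g_{K \rightarrow K+1}(W^{(s)} \mid \boldsymbol{\theta}^{(s)})}, \end{align*}\] and the index-selection probabilities cancel when both equal \(1/(K+1)\).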

If we use equal probabilities for selecting the index \(j\), and assume \(K\) is large enough (i.e., we do not need to worry about boundary cases), we obtain the acceptance probabilities: \[\begin{align*} \begin{gathered} \alpha_{\text {birth }}=\min \left\{1, \frac{\pi\left(K+1, \boldsymbol{\theta}^*\right) d_{K+1}}{\pi\left(K, \boldsymbol{\theta}^{(s)}\right) b_K g_{K \rightarrow K+1}\left(W^{(s)} \mid \boldsymbol{\theta}^{(s)}\right)}\right\} \\ \alpha_{\text {death }}=\min \left\{1, \frac{\pi\left(K, \boldsymbol{\theta}^*\right) b_K g_{K \rightarrow K+1}\left(W^* \mid \boldsymbol{\theta}^*\right)}{\pi\left(K+1, \boldsymbol{\theta}^{(s)}\right) d_{K+1}}\right\}. \end{gathered} \end{align*}\]

Python Code Examples

Provided and discussed in lab.
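
As a placeholder until the lab code, here is a minimal sketch of the birth/death acceptance computations above. The target \(\pi\) and proposal \(g\) below are dummy stand-ins (assumptions for illustration only), and the new component is appended at the end, since with a uniform index choice the insertion position does not change the ratio.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_alpha_birth(log_pi, log_g, b_K, d_K1, K, theta, w):
    """Log acceptance for (K, theta) -> (K+1, theta with w inserted); |J| = 1."""
    theta_star = theta + [w]
    return min(0.0, log_pi(K + 1, theta_star) + np.log(d_K1)
                    - log_pi(K, theta) - np.log(b_K) - log_g(w, theta))

def log_alpha_death(log_pi, log_g, b_K, d_K1, K1, theta, j):
    """Log acceptance for (K+1, theta) -> (K, theta with component j removed)."""
    w_star = theta[j]
    theta_star = theta[:j] + theta[j + 1:]
    return min(0.0, log_pi(K1 - 1, theta_star) + np.log(b_K)
                    + log_g(w_star, theta_star)
                    - log_pi(K1, theta) - np.log(d_K1))

# --- demo with placeholder densities (illustrative, not a mixture posterior) ---
def log_pi(K, theta):   # dummy target penalizing model size
    return -K - 0.5 * sum(t**2 for t in theta)

def log_g(w, theta):    # auxiliary proposal g: standard normal
    return -0.5 * w**2 - 0.5 * np.log(2 * np.pi)

theta = [0.3, -1.2]     # two "components" (scalars here for simplicity)
w = rng.standard_normal()
print(np.exp(log_alpha_birth(log_pi, log_g, 0.5, 0.5, 2, theta, w)))
print(np.exp(log_alpha_death(log_pi, log_g, 0.5, 0.5, 2, theta, 0)))
```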
