Updated: 2025-12-02 Statistical Simulation, Wei Li
We consider MH algorithms that sample from \(f\) using different move types, which gives a more general framework for MH algorithms.
We first randomly choose a move type \(m\in \mathcal{M}\) and then generate \(X^*\) with a move-dependent transition density \(q_m(\cdot|X^{(s)})\), where \(q_m(\cdot|\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\).
MH algorithm with different move types:

Let \(s=0\). For \(s = 0, 1, 2, \ldots\): (1) choose a move type \(m \in \mathcal{M}\); (2) generate \(X^* \sim q_m(\cdot \mid X^{(s)})\); (3) accept \(X^{(s+1)}=X^*\) with probability \[ \alpha_m(X^* \mid X^{(s)})=\min \left\{ \frac{f(X^{*})\, q_m(X^{(s)} \mid X^{*})}{f(X^{(s)})\, q_m(X^{*} \mid X^{(s)})}, 1 \right\}, \] and otherwise set \(X^{(s+1)}=X^{(s)}\).
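The move-type MH step described above can be sketched in Python as follows (a minimal sketch, assuming a uniform choice over move types and Gaussian random-walk proposals; all names are illustrative):

```python
import numpy as np

def mh_move_types(logf, proposals, n_steps, x0, rng):
    """MH sampler where each step first picks a move type m, then proposes
    from the move-dependent kernel q_m(.|x).  `proposals` is a list of
    (sample, logq) pairs with sample(x, rng) -> x* and
    logq(x_new, x_old) -> log q_m(x_new | x_old) up to a constant."""
    x = x0
    chain = np.empty(n_steps)
    for s in range(n_steps):
        m = rng.integers(len(proposals))      # choose move type uniformly
        sample, logq = proposals[m]
        x_star = sample(x, rng)
        # classical MH ratio with the move-dependent proposal density
        log_alpha = (logf(x_star) - logf(x)
                     + logq(x, x_star) - logq(x_star, x))
        if np.log(rng.uniform()) < min(0.0, log_alpha):
            x = x_star
        chain[s] = x
    return chain

# target: standard normal; two random-walk move types with different scales
logf = lambda x: -0.5 * x**2
def rw(scale):
    return (lambda x, rng: x + scale * rng.standard_normal(),
            lambda x_new, x_old: -0.5 * ((x_new - x_old) / scale) ** 2)

rng = np.random.default_rng(0)
chain = mh_move_types(logf, [rw(0.5), rw(3.0)], 50_000, 0.0, rng)
```

Mixing a small-step and a large-step move type is a common use of this framework: the small moves explore locally while the large moves help the chain traverse the support.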
Remarks
A modification:
To address issue (3) above, we use a more general update rule for the acceptance probability, which requires two assumptions.
Assume for each move type \(m\in \mathcal{M}\): (1) there is a set \(C_m \subset \mathbb{R}^d \times \mathbb{R}^d\) such that \[ (x, x') \in C_m \Longleftrightarrow (x', x) \in C_m , \quad \forall x, x'; \] and (2) there is a density \(\psi_m: C_m \to [0, \infty)\) such that the proposed \(X^*\) transitioning from \(X^{(s)} \sim f\) satisfies \[ P(X^*\in B, X^{(s)}\in A) =\int_{C_m} 1_A(x)1_B(x')\psi_m(x, x')dx'dx \] for all \(A, B\). Then the (modified) acceptance probability is \[ \alpha_m(X^*|X^{(s)})=\min \left\{ \frac{\gamma_m(X^{*}) \psi_m(X^{*}, X^{(s)})}{\gamma_m(X^{(s)}) \psi_m(X^{(s)}, X^{*})}, 1 \right\}. \]
Remark:
Proof: provided in class.
The RJMCMC algorithm is a generalization of the MH algorithm that allows moving across parameter spaces of different dimensions (models). To see how this algorithm works, we consider a general form of the MH algorithm.
Recall the detailed balance condition \[\begin{align} f(x) p\left(x, x^{\prime}\right)=f\left(x^{\prime}\right) p\left(x^{\prime}, x\right) \tag{*} \end{align}\] where \(p\) is the transition density moving from \(x\) to \(x'\). An equivalent form is \[\begin{align*} \int_A \int_{A^{\prime}} f(x) p\left(x, x^{\prime}\right) d x^{\prime} d x=\int_A \int_{A^{\prime}} f\left(x^{\prime}\right) p\left(x^{\prime}, x\right) d x^{\prime} d x, \quad \forall A, A' \end{align*}\] or \[\begin{align} \int_A f(x) P\left(x, A^{\prime}\right) d x=\int_{A^{\prime}} f\left(x^{\prime}\right) P\left(x^{\prime}, A\right) d x^{\prime}, \quad \forall A, A^{\prime} \tag{**} \end{align}\] where \(P(x, A'):=\int_{A'} p(x, x')dx'\). Recall also that the classical MH (effective) transition probability, built from some tentative transition density \(q(x, x')\), takes the form \[\begin{align*} P\left(x, A^{\prime}\right)=\int_{A^{\prime}} q\left(x, x^{\prime}\right) \alpha\left(x, x^{\prime}\right) d x^{\prime}+ r(x)\,1_{A^{\prime}}(x), \end{align*}\] where \(r(x):=1-\int \alpha(x, x') q(x, x')dx'\), and \(\alpha(x, x')\) is some acceptance probability.
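For intuition, detailed balance for the classical MH kernel can be verified exactly on a small discrete state space, where all the integrals above become finite sums (a minimal sketch; the target weights and the neighbor proposal are illustrative):

```python
import numpy as np

# States 0..4, unnormalized target gamma, proposal: move to a uniformly
# chosen neighbor (reflecting at the endpoints).
gamma = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
f = gamma / gamma.sum()
n = len(f)

Q = np.zeros((n, n))
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in nbrs:
        Q[i, j] = 1.0 / len(nbrs)

# effective MH transition matrix: q * alpha off the diagonal,
# rejection mass r(x) on the diagonal
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j and Q[i, j] > 0:
            alpha = min(1.0, (f[j] * Q[j, i]) / (f[i] * Q[i, j]))
            P[i, j] = Q[i, j] * alpha
    P[i, i] = 1.0 - P[i].sum()

# detailed balance (*): f_i P_ij == f_j P_ji for all i, j
M = f[:, None] * P
balance = np.abs(M - M.T).max()
```

Detailed balance then implies stationarity of \(f\), which can be checked as `f @ P ≈ f`.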
We then obtain a useful equation that characterizes the detailed balance condition: \[\begin{align} \int_A f(x) \int_{A^{\prime}} q\left(x, x^{\prime}\right) \alpha\left(x, x^{\prime}\right) d x^{\prime} d x=\int_{A^{\prime}} f\left(x^{\prime}\right) \int_{A} q\left(x^{\prime}, x\right) \alpha\left(x^{\prime}, x\right) d x \, d x^{\prime} \tag{***} \end{align}\]
This characterization remains true if we consider some transition between two different spaces \(\mathcal{X}_1\) and \(\mathcal{X}_2\), where \(\mathcal{X}_1 \subset \mathbb{R}^{d_1}\) and \(\mathcal{X}_2 \subset \mathbb{R}^{d_2}\), \(d_1 \neq d_2\).
But the foremost issue is how to make transitions across spaces in practice. Since the states live in different spaces, \(q(\cdot, \cdot)\) obviously needs to be space-dependent; that is, we need both \(q_{1 \to 2}(x, x')\) and \(q_{2 \to 1}(x', x)\). And once such transitions become available, we can attempt to deduce acceptance probabilities \(\alpha_{1 \to 2}(x, x')\) and \(\alpha_{2 \to 1}(x', x)\) that balance both sides of equation (***): \[ \int_A f(x) \int_{A^{\prime}} q_{1 \to 2}\left(x, x^{\prime}\right) \alpha_{1 \to 2}\left(x, x^{\prime}\right) d x^{\prime} d x=\int_{A^{\prime}} f\left(x^{\prime}\right) \int_{A} q_{2 \to 1}\left(x^{\prime}, x\right) \alpha_{2 \to 1 }\left(x^{\prime}, x\right) d x \, d x^{\prime} \]
Valid reversible mapping:
Suppose we would like to move between two spaces \(\mathcal{X}_1 \subset \mathbb{R}^{d_1}\) and \(\mathcal{X}_2 \subset \mathbb{R}^{d_2}\), \(d_1 \neq d_2\), where the whole space is \(\mathcal{X}=\mathcal{X}_1 \cup \mathcal{X}_2.\) Let \(K=\{1, 2\}\) denote the type space (model space).
Suppose we have some probability according to which to move from one space to another space, for example, \(j_{1\to 2} (x)\) is the probability of moving from \(\mathcal{X}_1\) to \(\mathcal{X}_2\) when the current value is \(x\in \mathcal{X}_1.\) More generally, let \(j_{k \to k'}\) denote the probability of moving from space \(k\) to space \(k'\).
For an MCMC algorithm, we have to determine the rule for the (tentative) transition density and the acceptance probability. If we remain in the same space, the classical proposal for the transition and the acceptance probability would work.
The key is when the move has been determined that it moves between spaces \(k\) and \(k'\) where \(\mathcal{X}_k \neq \mathcal{X}_{k'}.\) The critical step is to establish a valid reversible mapping between \(\mathcal{X}_k\) and \(\mathcal{X}_{k^{\prime}}\).
So we’ll focus for example, on the transition and acceptance rule moving from \(1\) to \(2\).
Remark:
Example 1: Adding or Removing a Parameter (Linear Regression) Suppose we have two nested linear regression models:
Model 1: \[\begin{align*} y=\beta_0+\beta_1 x_1+\beta_2 x_2+\varepsilon, \quad \boldsymbol{\beta}=\left(\beta_1, \beta_2\right) \in \mathbb{R}^2 . \end{align*}\]
Model 2: The same model, but with an additional regressor \(x_3\) : \[\begin{align*} y=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3+\varepsilon, \quad \boldsymbol{\beta}^{\prime}=\left(\beta_1, \beta_2, \beta_3\right) \in \mathbb{R}^3 . \end{align*}\]
Thus, \(d_1=2\) and \(d_2=3\).
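Concretely, a birth move from Model 1 to Model 2 can be completed by dimension matching (a sketch under standard RJMCMC conventions; the proposal density \(g\) and the jump probabilities \(j_{1 \to 2}\), \(j_{2 \to 1}\) are illustrative choices, and \(\pi_1\), \(\pi_2\) denote the two unnormalized posteriors): draw \(u \sim g\) and set \(\boldsymbol{\beta}^{\prime} = (\beta_1, \beta_2, u)\), so the mapping \((\boldsymbol{\beta}, u) \mapsto \boldsymbol{\beta}^{\prime}\) is the identity on each coordinate and its Jacobian determinant equals \(1\) in absolute value. The corresponding acceptance probability takes the form \[\begin{align*} \alpha_{1 \rightarrow 2}=\min \left\{1, \frac{\pi_2(\boldsymbol{\beta}^{\prime})\, j_{2 \rightarrow 1}(\boldsymbol{\beta}^{\prime})}{\pi_1(\boldsymbol{\beta})\, j_{1 \rightarrow 2}(\boldsymbol{\beta})\, g(u)}\right\}. \end{align*}\]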
Example 2: Birth-Death Move in Mixture Models Suppose \(\mathcal{X}_K\) represents the parameter space for a Gaussian mixture model with \(K\) components. Model parameters can be written as
\[\begin{align*} \boldsymbol{\theta}=\left(\theta_1, \ldots, \theta_K\right) \end{align*}\]
where each \(\theta_k=\left(\mu_k, \sigma_k, \pi_k\right)\) denotes the mean, variance, and mixing proportion of component \(k\). We consider a birth-death pair of moves:
Let the parameter space be \[ \Theta:=\cup_{k}(\{k\} \times \mathcal{X}_k). \]
We define the set of possible (admissible) move types \(\mathcal{M}\) as the collection of \((k, k^{\prime}) \in K \times K\) for which a valid reversible mapping exists between \(\mathcal{X}_k\) and \(\mathcal{X}_{k^{\prime}}\).
Let \(j_{k \rightarrow k^{\prime}}(x)\) denote the probability of proposing a move from space \(k\) to space \(k^{\prime}\) when the current state is \(x \in \mathcal{X}_k\). (See the remark for the requirements on these transition probabilities.)
Specifically, each valid move type \(\left(k, k^{\prime}\right) \in \mathcal{M}\) has:
With starting value \(k^{(0)} \in \mathcal{K}\), where \(\mathcal{K}\) is some finite or countable set, and \(X^{(0)} \in \mathcal{X}_{k^{(0)}}\). For \(s=0, 1, 2, \cdots\):
Proof: provided in class.
Remarks:
Let the current model be a finite mixture with \(K\) components. Let the parameter vector be \[\begin{align*} \boldsymbol{\theta}=\left(\theta_1, \ldots, \theta_K\right), \end{align*}\]
where each component parameter \(\theta_k \in \mathcal{\Theta} \subset \mathbb{R}^m\); recall \(\theta_k=\left(\mu_k, \sigma_k, \pi_k\right)\).
The RJMCMC state is \((K, \boldsymbol{\theta})\), with target density (posterior) \[\begin{align*} \pi(K, \boldsymbol{\theta}) \propto p(y \mid K, \boldsymbol{\theta}) p(\boldsymbol{\theta} \mid K) p(K) . \end{align*}\] We allow moves between models \(K\) and \(K+1\) by birth (add a component) and death (remove a component).
Fix an index \(j \in\{1, \ldots, K+1\}\). We consider the pair of bijections \(h_{K \rightarrow K+1}^{(j)}\) and \(h_{K+1 \rightarrow K}^{(j)}\): we propose a new component \(w \in \mathcal{\Theta}\) and use \(h_{K \rightarrow K+1}^{(j)}\) to insert \(w\) as the \(j\)-th component; for the reverse move, we use \(h_{K+1 \rightarrow K}^{(j)}\) to remove the \(j\)-th component. (One can show that both mappings have Jacobians whose determinants equal \(1\) in absolute value.)
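The insertion and removal mappings amount to simple index operations, and are exact inverses of each other, consistent with the unit-Jacobian claim (a minimal sketch; the function names and the placeholder component tuples are illustrative):

```python
def h_birth(theta, w, j):
    """h_{K->K+1}^{(j)}: insert new component w at position j (1-based)."""
    return theta[:j - 1] + [w] + theta[j - 1:]

def h_death(theta, j):
    """h_{K+1->K}^{(j)}: remove the j-th component, returning
    (remaining components, removed component)."""
    return theta[:j - 1] + theta[j:], theta[j - 1]

# round trip: inserting then removing at the same index recovers the state
theta = [("mu1", "s1", "p1"), ("mu2", "s2", "p2")]   # K = 2 components
w = ("mu_new", "s_new", "p_new")                     # proposed component
theta_star = h_birth(theta, w, 2)       # birth: insert as 2nd component
back, removed = h_death(theta_star, 2)  # death: remove it again
```

Because each coordinate of the state passes through unchanged, the Jacobian of either mapping is a permutation matrix, whose determinant is \(\pm 1\).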
Birth proposal: \((K, \boldsymbol{\theta}) \rightarrow\left(K+1, \boldsymbol{\theta}^{\prime}\right)\).
At iteration \(s\), current state \((K, \boldsymbol{\theta}^{(s)})\) :
(1). Decide whether to attempt a birth. With probability \(b_K:=j_{K \rightarrow K+1}(\boldsymbol{\theta}^{(s)})\) we attempt a birth move (otherwise, for simplicity, we might attempt a death move or some within-model move).
(2). Choose insertion index \(j\). Sample \(j \sim J_{K \rightarrow K+1}(\cdot \mid \boldsymbol{\theta}^{(s)})\), typically \[\begin{align*} J_{K \rightarrow K+1}(j \mid \boldsymbol{\theta}^{(s)})=\frac{1}{K+1}, \quad j=1, \ldots, K+1 . \end{align*}\]

(3). Propose new component parameters. Sample \(W^{(s)} \sim g_{K \rightarrow K+1}(\cdot \mid \boldsymbol{\theta}^{(s)})\), e.g., from the prior or a data-informed proposal.
(4). Apply deterministic mapping: set \(\boldsymbol{\theta}^*=h_{K \rightarrow K+1}^{(j)}(\boldsymbol{\theta}^{(s)}, W^{(s)}).\)
The proposed state is \(\left(K+1, \boldsymbol{\theta}^*\right)\). The forward proposal density (conditioned on attempting a birth) is \[\begin{align*} q_{\text {birth }}\left(\left(K+1, \boldsymbol{\theta}^*\right) \mid (K, \boldsymbol{\theta}^{(s)})\right)=b_K J_{K \rightarrow K+1}(j \mid \boldsymbol{\theta}^{(s)}) g_{K \rightarrow K+1}(W^{(s)} \mid \boldsymbol{\theta}^{(s)}). \end{align*}\]
Death proposal: \(\left(K+1, \boldsymbol{\theta}^{\prime}\right) \rightarrow(K, \boldsymbol{\theta})\)
Now suppose the current state is \((K+1, \boldsymbol{\theta}^{(s)})\).
(1). Decide whether to attempt a death. With probability \(d_{K+1}:=j_{K+1 \rightarrow K}(\boldsymbol{\theta}^{(s)})\) we attempt a death move.
(2). Choose deletion index \(j\). Sample \(j \sim J_{K+1 \rightarrow K}\left(\cdot \mid \boldsymbol{\theta}^{(s)}\right)\), typically \[\begin{align*} J_{K+1 \rightarrow K}(j \mid \boldsymbol{\theta}^{(s)})=\frac{1}{K+1}, \quad j=1, \ldots, K+1 . \end{align*}\]
(3). Apply deterministic mapping: set \((\boldsymbol{\theta}^*, W^*)=h_{K+1 \rightarrow K}^{(j)}(\boldsymbol{\theta}^{(s)})\), where \(\boldsymbol{\theta}^*=\boldsymbol{\theta}_{-j}^{(s)}, \quad W^*=\boldsymbol{\theta}_j^{(s)}\).
The reverse proposal density is \[\begin{align*} q_{\text {death }}\left(\left(K, \boldsymbol{\theta}^*\right) \mid (K+1, \boldsymbol{\theta}^{(s)})\right)=d_{K+1} J_{K+1 \rightarrow K}(j \mid \boldsymbol{\theta}^{(s)}) \end{align*}\]
If we use equal probabilities for selecting index \(j\), and assume \(K\) is large enough (i.e., we do not need to worry about the boundary case), we can obtain the acceptance probabilities: \[\begin{align*} \begin{gathered} \alpha_{\text {birth }}=\min \left\{1, \frac{\pi\left(K+1, \boldsymbol{\theta}^*\right) d_{K+1}}{\pi\left(K, \boldsymbol{\theta}^{(s)}\right) b_K g_{K \rightarrow K+1}\left(W^{(s)} \mid \boldsymbol{\theta}^{(s)}\right)}\right\} \\ \alpha_{\text {death }}=\min \left\{1, \frac{\pi\left(K, \boldsymbol{\theta}^*\right) b_K g_{K \rightarrow K+1}\left(W^* \mid \boldsymbol{\theta}^*\right)}{\pi\left(K+1, \boldsymbol{\theta}^{(s)}\right) d_{K+1}}\right\}. \end{gathered} \end{align*}\]
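Note that the ratios inside the two minima are reciprocals of each other, which is exactly the internal consistency that reversibility requires. A quick numerical sanity check with toy values (all numbers are illustrative, not from any fitted model):

```python
# toy, illustrative numbers: unnormalized posterior values and proposal terms
pi_K, pi_K1 = 0.7, 0.2      # pi(K, theta^(s)) and pi(K+1, theta*)
b_K, d_K1 = 0.5, 0.5        # birth / death attempt probabilities
g_w = 0.3                   # g_{K->K+1}(W | theta)

# birth ratio and the death ratio for the matching reverse move
r_birth = (pi_K1 * d_K1) / (pi_K * b_K * g_w)
r_death = (pi_K * b_K * g_w) / (pi_K1 * d_K1)

alpha_birth = min(1.0, r_birth)
alpha_death = min(1.0, r_death)
```

Whichever direction has ratio below one is accepted with that probability, and the reverse direction is then accepted with probability one.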
Provided and discussed in lab.