Statistical Simulation, Wei Li (updated 2024-01-21)

Markov Chain with continuous state space

We consider Markov chains on a continuous state space, that is, \(S\) is an uncountable set (e.g., \(S = \mathbb{R}\)).

Transition probability from each possible state \(x_0\) to each possible set of states \(A\): \[\begin{align*} &P(x_0,A):=P(X^{(n+1)} \in A | X^{(n)}=x_0),\\ \text{CDF}:\ &F(x_1|x_0):=P(X^{(n+1)} \leq x_1 | X^{(n)}=x_0),\\ \text{One-step transition density function}:\ &f(x_0, x_1):= \frac{\partial F(x_1|x_0)}{\partial x_1}= p(X^{(n+1)}=x_1|X^{(n)}=x_0).\\ \end{align*}\]
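
For example (a concrete illustration added here), consider the Gaussian AR(1) chain \(X^{(n+1)} = \rho X^{(n)} + \epsilon^{(n)}\) with \(\epsilon^{(n)} \sim N(0,\sigma^2)\). Then \[F(x_1|x_0) = \Phi\Big(\frac{x_1-\rho x_0}{\sigma}\Big), \qquad f(x_0,x_1) = \frac{1}{\sigma}\,\phi\Big(\frac{x_1-\rho x_0}{\sigma}\Big),\] that is, the one-step transition density is the \(N(\rho x_0, \sigma^2)\) density evaluated at \(x_1\). This chain is reused in the sketches below.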

Transition density

Define one-step transition density:

\(f^{[1]}(x_n,x_{n+1}):=p(x_{n+1}|x_n)=p(X^{(n+1)}=x_{n+1}|X^{(n)}=x_n).\)

Two-step transition density:

\[\begin{align*} f^{[2]}(x_n,x_{n+2}):=p(X^{(n+2)}=x_{n+2}|X^{(n)}=x_n)&= \int p(X^{(n+2)}=x_{n+2}|X^{(n+1)}=x_{n+1},X^{(n)}=x_n) p(X^{(n+1)}=x_{n+1}|X^{(n)}=x_n)d x_{n+1}\\ &=\int p(X^{(n+2)}=x_{n+2}|X^{(n+1)}=x_{n+1}) p(X^{(n+1)}=x_{n+1}|X^{(n)}=x_n)d x_{n+1}\\ &=\int f^{[1]}(x_n,x_{n+1}) f^{[1]}(x_{n+1},x_{n+2})d x_{n+1}. \end{align*}\]

Three-step transition density:

\[\begin{align*} f^{[3]}(x_n,x_{n+3}):=p(X^{(n+3)}=x_{n+3}|X^{(n)}=x_n) &= \int p(X^{(n+3)}=x_{n+3}|X^{(n+2)}=x_{n+2}) \cdot p(X^{(n+2)}=x_{n+2}|X^{(n)}=x_n)d x_{n+2}\\ &=\int f^{[1]}(x_{n+2},x_{n+3}) f^{[2]}(x_n,x_{n+2})d x_{n+2}\\ &=\int \int f^{[1]}(x_{n+2},x_{n+3}) f^{[1]}(x_{n+1},x_{n+2}) f^{[1]}(x_n,x_{n+1})d x_{n+1} d x_{n+2}.\\ \end{align*}\]

Arguing as above, we can obtain the m-step transition density:

\(f^{[m]}(x_n,x_{n+m})= \int \cdots \int \prod_{k=n+1}^{n+m}f^{[1]}(x_{k-1},x_k)dx_{n+1} \cdots dx_{n+m-1}.\)

The corresponding m-step transition probability can be written as \(P(X^{(n+m)} \in A| X^{(n)}=x_n)= \int_A f^{[m]}(x_n,x_{n+m})dx_{n+m}.\) More generally, we have the Chapman-Kolmogorov equation: \[ f^{[n+m]}(x_0, x_{n+m})=\int f^{[m]}(x_0, x_{m})f^{[n]}(x_m, x_{n+m})dx_{m}. \]
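
As a quick numerical check of the Chapman-Kolmogorov equation (a sketch of ours, reusing the AR(1) example above; `scipy` is assumed available), the two-step density computed by integrating two one-step densities should match the known closed form:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rho, sigma = 0.7, 1.0  # AR(1) chain: X^(n+1) = rho * X^(n) + N(0, sigma^2) noise

def f1(x, y):
    """One-step transition density f^[1](x, y)."""
    return norm.pdf(y, loc=rho * x, scale=sigma)

def f2_ck(x, y):
    """Two-step density via Chapman-Kolmogorov: integrate out the middle state."""
    val, _ = quad(lambda w: f1(x, w) * f1(w, y), -np.inf, np.inf)
    return val

def f2_exact(x, y):
    """For AR(1), X^(n+2) | X^(n) = x is N(rho^2 * x, sigma^2 * (1 + rho^2))."""
    return norm.pdf(y, loc=rho**2 * x, scale=sigma * np.sqrt(1 + rho**2))

print(f2_ck(0.5, -1.2), f2_exact(0.5, -1.2))  # agree up to quadrature error
```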

Some important properties

If a regular (ergodic) M.C. possesses a limiting transition density independent of the initial state, that is, \[\lim_{n \to \infty}f^{[n]}(x,v)=g(v),\] then \(g(v)\) is called the “stationary/steady-state density” (long-run probability density) of the M.C., and it is a solution to the steady-state equation:

\[g(v)=\int_{-\infty}^{+\infty}g(w)f(w,v)dw \qquad (*)\] \[(\text{recall discrete space}: \pi_j=\sum_{i=1}^{K} \pi_i p_{ij} \qquad \forall j=1,2,\cdots ,K),\] where \(g(w)\) may be viewed as the starting distribution and \(f(w,v)\) is the transition density function. Note that an equivalent expression to \((*)\) is \[\int_A g(v) dv=\int_{-\infty}^{+\infty}g(w)P(w,A)dw \text{ for all sets } A, \text{ where } P(w, A):=P(X^{(n+1)} \in A|X^{(n)}=w).\]

A sufficient condition for \(g(x)\) to be a steady-state density is the “detailed balance condition” given by \[g(x)f(x,v)=g(v)f(v,x) \qquad \forall x,v, \qquad (**)\] where \(f(x,v)\) is the one-step transition density of the M.C. As in the discrete case, any Markov chain satisfying the “detailed balance condition” \((**)\) has \(g\) as its steady-state density, i.e. \((*)\) holds: integrating both sides of \((**)\) with respect to \(x\) gives \(\int g(x)f(x,v)dx = g(v)\int f(v,x)dx = g(v)\).
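
As a worked example (added here), take \(g = N(0,1)\) together with the Gaussian AR(1) kernel above with \(\sigma^2 = 1-\rho^2\), so \(f(x,v)\) is the \(N(\rho x,\, 1-\rho^2)\) density at \(v\). Then \[g(x)f(x,v) \propto \exp\Big(-\frac{x^2}{2}\Big)\exp\Big(-\frac{(v-\rho x)^2}{2(1-\rho^2)}\Big) = \exp\Big(-\frac{x^2+v^2-2\rho x v}{2(1-\rho^2)}\Big),\] which is symmetric in \(x\) and \(v\), so \((**)\) holds and \(g = N(0,1)\) is the steady-state density of this chain.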

Ergodic Theorem

If \((X^{(1)},X^{(2)},\cdots)\) is an ergodic M.C. whose steady-state density is given by \(g\), then as \(n \to \infty\), \[P(X^{(n)} \in A) \to P(X^{(\infty)} \in A) = \int_A g(x)dx.\] In addition, \[\frac{1}{T}\sum_{n=1}^{T}h(X^{(n)}) \to E[h(X^{(\infty)})]=\int h(x)g(x)dx \text{ as } T\to \infty.\]
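
A minimal simulation sketch (ours) of both statements, using the Gaussian AR(1) chain with noise variance \(1-\rho^2\), for which \(g = N(0,1)\) by the worked example above: with \(h(x)=x^2\), the time average should approach \(E[h(X^{(\infty)})]=1\).

```python
import numpy as np

rng = np.random.default_rng(0)
rho, T = 0.9, 200_000

# AR(1) chain X^(n+1) = rho * X^(n) + N(0, 1 - rho^2) noise; steady state N(0, 1).
x = np.empty(T)
x[0] = 5.0  # start deliberately far from stationarity
for n in range(T - 1):
    x[n + 1] = rho * x[n] + rng.normal(scale=np.sqrt(1 - rho**2))

# Time average of h(x) = x^2 approximates E[h(X^(inf))] = 1 for large T.
print(np.mean(x**2))
```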

Metropolis Algorithm in Continuous Space

We consider a continuous state space from which we would like to sample a target density \(f(\cdot)\), possibly known only up to a normalizing constant through a kernel \(\tilde f \propto f\). Our goal is to find a transition density \(p(\cdot |\cdot)\) satisfying the ‘detailed balance condition’, i.e., for a proposed \(x^*\), \[ p(x^* |x)f(x) =p(x|x^*)f(x^*).\]

If we can find such a \(p\), then the long-run distribution of the ergodic Markov chain with transition density \(p(\cdot | \cdot)\) is \(f(\cdot)\).

Method

Choose an arbitrary transition density \(q(\cdot | \cdot)\), without requiring that the detailed balance condition hold. Then accept proposed samples with acceptance probability \[\alpha(x^*|x) = \min \Big(1, \frac{q(x|x^*)f(x^*)}{q(x^*|x)f(x)} \Big).\] Note that this ratio is unchanged if \(f\) is replaced by an unnormalized kernel \(\tilde f\), since the normalizing constants cancel.

The acceptance probability regulates the flow of probability between the current state and the proposed state. Given that the current state is \(x\), the probability that a particular state \(x^*\) is proposed and accepted is \(\alpha (x^*|x) q(x^*|x)\). Similarly, the probability that some proposal is accepted is \(a(x) :=\int \alpha (x^*|x) q(x^*|x)dx^*\). A proposal is rejected with probability \(r(x) := 1- a(x)\), in which case the chain stays at \(x\). The (effective) transition density is then

\[p(x^*|x) = \alpha (x^*|x) q(x^*|x) + r(x)\delta_x(x^*),\]

where \(\delta_x(x^*) = 1\) if \(x^*=x\) and 0 otherwise (a point mass at \(x\)). Since we are in a continuous state space, the corresponding transition probability over a set \(A\) is

\[P(x, A)=\int_A p(x^*|x)dx^* = \int_A\alpha (x^*|x) q(x^*|x)dx^* + r(x)\delta_A(x),\]

where \(\delta_A(x) = 1\) if \(x\in A\) and 0 otherwise.

We can show that the (effective) transition density indeed satisfies the detailed balance condition: the key observation is that \(\alpha (x^*|x) q(x^*|x)f(x) = \min\{q(x^*|x)f(x),\, q(x|x^*)f(x^*)\}\), which is symmetric in \(x\) and \(x^*\), while the point-mass term balances trivially.

Proof: provided in class.

Metropolis-Hastings algorithm

  1. Given a target \(f(\cdot)\) and a chosen \(q(\cdot|\cdot)\), generate \(x^*\) from \(q(\cdot| x^{(s)})\).
  2. Compute \[\alpha(x^*|x^{(s)}) = \min \left(1, \frac{q(x^{(s)}|x^*)f(x^*)}{q(x^*|x^{(s)})f(x^{(s)})} \right).\]
  3. Generate \(u\sim \mbox{unif}(0,1)\):
    1. if \(u\leq \alpha(x^*|x^{(s)})\), set \(x^{(s+1)} = x^*\);
    2. if \(u> \alpha(x^*|x^{(s)})\), set \(x^{(s+1)} = x^{(s)}\).
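
A minimal implementation sketch of these three steps (function and variable names are ours; note that an unnormalized kernel \(\tilde f\) suffices, since the normalizing constant cancels in the ratio):

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_hastings(f_tilde, q_sample, q_density, x0, n_iter):
    """Generic Metropolis-Hastings sampler.

    f_tilde   : unnormalized target density, f_tilde(x) proportional to f(x)
    q_sample  : draws a proposal x* given the current state x
    q_density : evaluates the proposal density q(x*|x) as q_density(x_star, x)
    """
    x = x0
    chain = np.empty(n_iter)
    for s in range(n_iter):
        x_star = q_sample(x)
        # Acceptance probability alpha(x*|x) = min(1, q(x|x*)f(x*) / (q(x*|x)f(x)))
        ratio = (q_density(x, x_star) * f_tilde(x_star)) / \
                (q_density(x_star, x) * f_tilde(x))
        if rng.uniform() <= min(1.0, ratio):
            x = x_star      # accept: move to the proposal
        chain[s] = x        # on rejection, repeat the current state
    return chain
```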

Remark:

A convenient sufficient condition for the algorithm to work is \(\mbox{supp}(f)\subseteq \mbox{supp}(q(\cdot|x))\) for all \(x\in \mbox{supp}(f)\). A minimal necessary condition is \(\mbox{supp}(f)\subseteq \cup_{x\in \mbox{supp}(f)}[\mbox{supp}(q(\cdot|x))].\)

Choice of Transition Density

  1. Classical Choice (Random Walk): Choose \(q(x^*|x) = \tilde q (x^*-x)\) where \(\tilde q\) is symmetric about 0. For example, \(x^* = x^{(s)}+\epsilon^{(s)}\), where \(\epsilon^{(s)}\sim \mbox{unif}(-1,1)\) or \(\epsilon^{(s)}\sim N(0,\tau^2)\); then \(x^*\sim \mbox{unif}(x^{(s)}-1,x^{(s)}+1)\) or \(x^*\sim N(x^{(s)},\tau^2)\), respectively. Due to the symmetry about 0, \(q(x^*|x) =q(x|x^*)\), so the acceptance probability simplifies to \(\alpha(x^*|x) = \min \big(1, \frac{f(x^*)}{f(x)}\big)\).

  2. Independent Metropolis-Hastings: Choose \(x^*\sim q(\cdot)=q(\cdot|x)\), independent of the current value \(x^{(s)}\). For example, \(q = N(0,1)\) throughout. Both choices are illustrated in the sketch after this list.
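
Both choices plug directly into the `metropolis_hastings` sketch above (continuing that example, with the \(N(0,1)\) target known only up to a constant):

```python
f_tilde = lambda x: np.exp(-x**2 / 2)  # unnormalized N(0,1) target

# 1. Random-walk proposal x* ~ N(x, tau^2); an unnormalized proposal density is
#    fine here too, since constants cancel in the acceptance ratio.
tau = 1.0
rw_chain = metropolis_hastings(
    f_tilde,
    q_sample=lambda x: x + rng.normal(scale=tau),
    q_density=lambda x_star, x: np.exp(-(x_star - x)**2 / (2 * tau**2)),
    x0=0.0, n_iter=50_000,
)

# 2. Independent proposal x* ~ N(0, 1), ignoring the current state.
ind_chain = metropolis_hastings(
    f_tilde,
    q_sample=lambda x: rng.normal(),
    q_density=lambda x_star, x: np.exp(-x_star**2 / 2),
    x0=0.0, n_iter=50_000,
)

# Fraction of iterations on which the chain moved, i.e. the acceptance rate.
print(np.mean(np.diff(rw_chain) != 0))
```

The printed move fraction approximates the acceptance rate discussed under Practicalities below.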

Practicalities

Proposals should not be too bold (standard deviation not too high), yet should be large enough to be consequential (standard deviation not too small). An acceptance rate between \(30\%\) and \(60\%\) is considered ideal, as the sketch below illustrates.
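
To see this trade-off concretely (a sketch of ours, reusing `metropolis_hastings` and `f_tilde` from above):

```python
# Compare acceptance rates across proposal scales: tiny steps are almost always
# accepted but explore slowly; huge steps are mostly rejected.
for tau in (0.1, 1.0, 10.0):
    chain = metropolis_hastings(
        f_tilde,
        q_sample=lambda x, t=tau: x + rng.normal(scale=t),
        q_density=lambda x_star, x, t=tau: np.exp(-(x_star - x)**2 / (2 * t**2)),
        x0=0.0, n_iter=20_000,
    )
    print(tau, np.mean(np.diff(chain) != 0))
```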

  1. Burn-in

    1. Run the algorithm until some iteration \(B\) at which the Markov chain appears to have reached stationarity.
    2. Run the algorithm for \(B'\) more iterations and keep \(\{x^{(B+1)},\ldots,x^{(B+B')} \}.\)
    3. Use the empirical distribution of \(\{x^{(B+1)},\ldots,x^{(B+B')} \}\) to approximate the density \(f\).
  2. Thinning: After burn-in (discarding the first \(B\) observations), thin the chain by keeping every \(k\)th draw.

  3. Trace Plot: The plot of the values generated by the Metropolis algorithm against the iteration number. Check:

    1. The chain doesn’t get stuck in a certain area of space.
    2. The chain can move relatively fast (across the support).
  4. “ACF” (Autocorrelation Function): Calculate the “lag-\(j\) sample autocovariance” as \[\hat\gamma_j = \frac{1}{T}\sum_{t=j+1}^T(x^{(t)}-\bar x)(x^{(t-j)}-\bar x)\] and the “lag-\(j\) sample autocorrelation” as \(\hat{\rho}_j = \hat\gamma_j/\hat\gamma_0\). Plotting the autocorrelation \(\hat{\rho}_j\) against the lag index \(j\), relatively high values indicate a high degree of correlation between draws and therefore slow mixing (see the sketch after this list).
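
These diagnostics can be sketched as follows (ours, continuing with `rw_chain` from the earlier example; `matplotlib` is assumed available):

```python
import matplotlib.pyplot as plt

B, k = 1_000, 10            # burn-in length and thinning interval (illustrative)
kept = rw_chain[B:][::k]    # discard the first B draws, keep every k-th draw

# Trace plot: sampled values against iteration number.
plt.plot(rw_chain)
plt.xlabel("iteration"); plt.ylabel("x")
plt.show()

def acf(x, max_lag):
    """Lag-j sample autocorrelations, following the definitions above."""
    x = x - x.mean()
    T = len(x)
    gamma = np.array([np.sum(x[j:] * x[:T - j]) / T for j in range(max_lag + 1)])
    return gamma / gamma[0]

# High autocorrelation at large lags indicates slow mixing.
plt.bar(range(31), acf(kept, 30))
plt.xlabel("lag j"); plt.ylabel("autocorrelation")
plt.show()
```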

Python Code Examples

Provided and discussed in lab.
