# Importance Sampling

Let \(X\) be a random variable with distribution \(\pi\), which we call the target distribution, and let \(\varphi\) be a function, called the target function. Importance sampling is a method for computing integrals of the form:

\[ \mathbb{E}[\varphi(x)] = \int \varphi(x)\pi(x) \: dx\]

In general, generating samples directly from \(\pi\) is assumed to be hard. Instead, let \(\pi_p\) be a proposal distribution that is easy to sample from and whose support covers that of \(\pi\), i.e. \(\pi_p(x) > 0\) wherever \(\pi(x) > 0\). The key idea is to rewrite the integral as an expectation under \(\pi_p\):
\[
\mathbb{E}[\varphi(x)] = \int \varphi(x)\frac{\pi(x)}{\pi_p(x)}\pi_p(x) \: dx = \int \varphi(x)w(x)\pi_p(x) \: dx \]
Since we can sample from \(\pi_p\), we can approximate this integral with a Monte Carlo average, the **importance sampling estimator**: \(\mathbb{E}[\varphi(x)] \approx \frac{1}{N}\sum_{n=1}^N w(x_n)\varphi(x_n)\), where the \(x_n\) are drawn independently from \(\pi_p\) and \(w(x_n) = \frac{\pi(x_n)}{\pi_p(x_n)}\) are called the **importance weights**.
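A minimal sketch of the importance sampling estimator in Python, using only the standard library. The concrete choices are illustrative assumptions, not part of the method: the target is the standard normal \(\pi = \mathcal{N}(0, 1)\), the target function is \(\varphi(x) = x^2\) (so the true value is 1), and the proposal is the wider normal \(\pi_p = \mathcal{N}(0, 2^2)\). The helper names `normal_pdf` and `importance_sampling` are hypothetical.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_sampling(phi, target_pdf, proposal_sample, proposal_pdf, n, rng):
    """Estimate E_pi[phi(X)] as (1/N) * sum_n w(x_n) * phi(x_n), with x_n ~ pi_p.

    target_pdf must be the *normalized* target density pi.
    """
    total = 0.0
    for _ in range(n):
        x = proposal_sample(rng)                 # draw x_n from the proposal pi_p
        w = target_pdf(x) / proposal_pdf(x)      # importance weight pi(x_n) / pi_p(x_n)
        total += w * phi(x)
    return total / n


# Illustrative setup (assumed): pi = N(0, 1), phi(x) = x^2, pi_p = N(0, 2^2).
rng = random.Random(0)
estimate = importance_sampling(
    phi=lambda x: x * x,
    target_pdf=normal_pdf,
    proposal_sample=lambda r: r.gauss(0.0, 2.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    n=100_000,
    rng=rng,
)
print(estimate)  # should be close to E[X^2] = 1 for X ~ N(0, 1)
```

The proposal is deliberately wider than the target: here \(w(x) = 2e^{-3x^2/8}\), which is bounded by 2. A proposal narrower than the target can make the weights unbounded and the variance of the estimate very large, or even infinite.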

Above, we assumed that we can evaluate the target density \(\pi\) exactly. In practice, we may only be able to evaluate an unnormalized density \(\tilde{\pi}\) such that \(\pi(x) = \frac{\tilde{\pi}(x)}{Z_\pi} = \frac{\tilde{\pi}(x)}{\int \tilde{\pi}(x)\: dx}\), where the normalizing constant \(Z_\pi\) is unknown. This setting leads to the **self-normalized importance sampling estimator**. Substituting \(\pi = \tilde{\pi}/Z_\pi\) into the expectation gives:
\[
\mathbb{E}[\varphi(x)] = \int \varphi(x)\pi(x) \: dx = \int \frac{\varphi(x)\tilde{\pi}(x)}{\int \tilde{\pi}(x)\: dx} \: dx = \frac{\int \varphi(x)\tilde{\pi}(x)\:dx}{\int \tilde{\pi}(x)\: dx} = \frac{\int \varphi(x)\frac{\tilde{\pi}(x)}{\pi_p(x)}\pi_p(x) \:dx}{\int \frac{\tilde{\pi}(x)}{\pi_p(x)}\pi_p(x)\: dx} \]
Replacing both integrals with Monte Carlo averages gives the estimator \(\mathbb{E}[\varphi(x)] \approx \frac{\frac{1}{N}\sum_{n=1}^N \tilde{w}(x_n)\varphi(x_n)}{\frac{1}{N}\sum_{n=1}^N \tilde{w}(x_n)}\), where the \(x_n\) are drawn from \(\pi_p\) and \(\tilde{w}(x_n) = \frac{\tilde{\pi}(x_n)}{\pi_p(x_n)}\) are called the **unnormalized weights**. Unlike the standard importance sampling estimator, which is unbiased, the self-normalized estimator is biased; it is, however, consistent, i.e. it converges to the true value as \(N\rightarrow \infty\). As a byproduct, the denominator provides an estimate of the normalizing constant: \(Z_\pi = \int \tilde{\pi}(x) \: dx \approx \frac{1}{N}\sum_{n=1}^N \tilde{w}(x_n)\).
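A corresponding sketch of the self-normalized estimator, again in Python with only the standard library and under assumptions chosen purely for illustration: the unnormalized target is \(\tilde{\pi}(x) = e^{-x^2/2}\), whose normalizing constant is \(Z_\pi = \sqrt{2\pi}\) (so \(\pi\) is the standard normal), with the same \(\varphi(x) = x^2\) and proposal \(\mathcal{N}(0, 2^2)\). The function name `self_normalized_is` is hypothetical. It returns both the estimate of \(\mathbb{E}[\varphi(x)]\) and the byproduct estimate of \(Z_\pi\).

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x (used for the proposal)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def self_normalized_is(phi, unnorm_target, proposal_sample, proposal_pdf, n, rng):
    """Return (estimate of E_pi[phi(X)], estimate of Z_pi) using x_n ~ pi_p.

    unnorm_target only needs to be known up to a constant factor.
    """
    sum_w = 0.0      # running sum of unnormalized weights
    sum_w_phi = 0.0  # running sum of weight * phi(x_n)
    for _ in range(n):
        x = proposal_sample(rng)                # draw x_n from pi_p
        w = unnorm_target(x) / proposal_pdf(x)  # unnormalized weight
        sum_w += w
        sum_w_phi += w * phi(x)
    # The 1/N factors in numerator and denominator cancel.
    estimate = sum_w_phi / sum_w
    z_estimate = sum_w / n                      # byproduct: Monte Carlo estimate of Z_pi
    return estimate, z_estimate


# Illustrative setup (assumed): unnormalized N(0, 1) target, phi(x) = x^2, pi_p = N(0, 2^2).
rng = random.Random(0)
estimate, z_estimate = self_normalized_is(
    phi=lambda x: x * x,
    unnorm_target=lambda x: math.exp(-0.5 * x * x),  # Z_pi = sqrt(2*pi)
    proposal_sample=lambda r: r.gauss(0.0, 2.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    n=100_000,
    rng=rng,
)
print(estimate)    # should be close to 1
print(z_estimate)  # should be close to sqrt(2*pi) ≈ 2.5066
```

Setting `n=1` makes the bias easy to see: the single weight cancels and the estimator returns \(\varphi(x_1)\) with \(x_1 \sim \pi_p\), whose expectation is \(\mathbb{E}_{\pi_p}[\varphi]\) (here 4, the variance of the proposal) rather than the target value 1. As \(N\) grows the bias vanishes, in line with the consistency of the estimator.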