A Review of Modern Computational Algorithms for Bayesian Optimal Design - 2016
Details
Title: A Review of Modern Computational Algorithms for Bayesian Optimal Design
Author(s): Ryan, Elizabeth G.; Drovandi, Christopher C.; McGree, James M.; Pettitt, Anthony N.
Link(s): https://www.jstor.org/stable/44162464
Rough Notes
Statistical experimental design gives rules for choosing which experiments to run when there is variation in the outcomes that is not under the experimenter's control. The hope is to achieve the experimental goals more quickly, reducing the cost of running experiments.
Designs arising from averaging classical design criteria over the parameter space are not Bayesian Optimal Experimental Designs (BED), but pseudo-Bayesian (or on-average/robust) designs. The authors propose that a fully Bayesian design is one computed from a design criterion that is a functional of the posterior distribution.
Lindley (1972) provides the decision-theoretic approach to experimental design on which BED is based.
BED requires a utility function \(U(d,\theta,y)\) describing the worth of choosing design \(d \in D\) yielding data \(y \in Y\) with model parameters \(\theta\in \Theta\).
The BED design is then \(d^* = \text{argmax}_{d\in D}\,\mathbb{E}_{p(\theta,y|d)}[U(d,\theta,y)] = \text{argmax}_{d\in D}\,\mathbb{E}_{p(y|d)}\big[\mathbb{E}_{p(\theta|d,y)}[U(d,\theta,y)]\big]\). The integration and optimization problems involved in finding \(d^*\) lead to many approximations being used.
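The nested expectation above can be estimated by plain Monte Carlo. Below is a minimal sketch (my own illustration, not from the paper) assuming a toy conjugate model \(\theta \sim N(0,\tau^2)\), \(y\,|\,\theta,d \sim N(\theta d, \sigma^2)\), with the expected Kullback-Leibler information gain \(U(d,\theta,y) = \log p(\theta|y,d) - \log p(\theta)\) as the utility; the design grid and all constants are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model (assumed for illustration): theta ~ N(0, tau2),
# y | theta, d ~ N(theta * d, sigma2). The posterior is Gaussian in closed form.
tau2, sigma2 = 1.0, 0.5
N = 20_000  # Monte Carlo samples per candidate design

def expected_utility(d):
    """MC estimate of E_{p(theta,y|d)}[log p(theta|y,d) - log p(theta)],
    i.e. the expected information-gain utility U(d) for design d."""
    theta = rng.normal(0.0, np.sqrt(tau2), N)       # theta ~ prior
    y = rng.normal(theta * d, np.sqrt(sigma2), N)   # y ~ likelihood given theta, d
    post_var = 1.0 / (1.0 / tau2 + d**2 / sigma2)   # conjugate Gaussian update
    post_mean = post_var * (d * y / sigma2)
    log_post = -0.5 * np.log(2 * np.pi * post_var) \
        - (theta - post_mean)**2 / (2 * post_var)
    log_prior = -0.5 * np.log(2 * np.pi * tau2) - theta**2 / (2 * tau2)
    return np.mean(log_post - log_prior)

designs = np.linspace(0.0, 2.0, 9)                  # candidate design grid
d_star = max(designs, key=expected_utility)
print(d_star)  # larger |d| is more informative in this model, so d_star == 2.0
```

In this conjugate setting the expected information gain is \(\tfrac{1}{2}\log(1 + d^2\tau^2/\sigma^2)\), so the Monte Carlo optimum can be checked against the analytic answer; in realistic models neither the posterior nor the outer expectation is available in closed form, which is what drives the approximations the paper surveys.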
(#DOUBT I am assuming by Bayesian utility functions they mean utilities based on the posterior distribution)
Importance sampling is specifically highlighted: to draw from the posterior \(p(\theta|d,y)\), we instead generate samples from a proposal \(g(\theta)\) to obtain particles \(\{\theta_k,W_k\}_{k=1}^{N_p}\), where the \(W_k\) are normalized versions of the weights \(w(\theta_k)=\frac{p(y|d,\theta_k)p(\theta_k)}{g(\theta_k)}\). This is especially helpful since the samples need only be drawn once and can be re-weighted at each iteration. In BED, importance sampling from the prior distribution is common; this is, however, inefficient when there is a substantial difference between the prior and posterior distributions.
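A short sketch of the prior-as-proposal scheme described above (my own toy example, reusing an assumed conjugate Gaussian model, not code from the paper). With \(g = p(\theta)\), the prior terms in \(w(\theta_k)\) cancel and the unnormalized weight is just the likelihood, so the same particles can be re-weighted for any \((d, y)\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model for illustration: theta ~ N(0, 1), y | theta, d ~ N(theta * d, 0.5).
Np = 50_000
theta = rng.normal(0.0, 1.0, Np)  # particles drawn ONCE from g = prior

def posterior_mean(d, y, sigma2=0.5):
    """IS estimate of E[theta | d, y] by re-weighting the fixed prior particles."""
    # w(theta_k) = p(y|d,theta_k) p(theta_k) / g(theta_k) = p(y|d,theta_k),
    # since g is the prior and the prior terms cancel.
    log_w = -(y - theta * d)**2 / (2 * sigma2)
    W = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    W /= W.sum()                      # normalized weights W_k
    return np.sum(W * theta)

# The same particles serve every (d, y) pair; only the weights change.
est = posterior_mean(d=1.0, y=0.8)
exact = 0.8 * 1.0 / (0.5 + 1.0**2)    # conjugate posterior mean, for comparison
print(est, exact)
```

Here the prior and posterior overlap well, so the estimate is close to the analytic value; when they differ substantially the normalized weights degenerate onto a few particles, which is exactly the inefficiency the note mentions.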