BINOCULARS for Efficient, Nonmyopic Sequential Experimental Design - 2020

Details

Title : BINOCULARS for Efficient, Nonmyopic Sequential Experimental Design Author(s): Jiang, Shali and Chai, Henry and Gonzalez, Javier and Garnett, Roman Link(s) : https://proceedings.mlr.press/v119/jiang20b.html

Rough Notes

An approximate solution to finite-horizon Sequential Experimental Design (SED), whose optimal policy requires solving the Bellman Equations, which is in general intractable.

The proposed method, BINOCULARS, is non-myopic. The general idea to compute a 1-step optimal batch (i.e. non-adaptive) of experiments then select a single point from this batch. The optimal batch expected utility is shown to be a lower bound on the optimal sequential expected utility.

Let \(T\) be the finite horizon, \(\mathcal{X}\) the design space and \(\mathcal{Y}\) the response space, \(p(y|x,\mathcal{D})\) a probability model, \(u(\mathcal{D})\) some utility, with \(\mathcal{D}\) being the observed data. \(u(y|x,\mathcal{D})=u(\mathcal{D}\cup (x,y))-u(\mathcal{D})\) is the marginal utility gain after observing \(y\) from experiment \(x\).

\(Q_k(x|\mathcal{D})\) is the expected utility of designing experiment \(x\) after observing \(\mathcal{D}\) with \(k\) steps remaining, assuming all later decisions are optimal, which in Bellman equation terms is: \[ Q_k(x|\mathcal{D}) = \mathbb{E}_y[u(y|x,\mathcal{D})] + \mathbb{E}_y[\max_{x'}Q_{k-1}(x'|\mathcal{D}\cup \{(x,y)\})] \] Where the expectation is with respect to \(p(y|x,\mathcal{D})\).

The optimal expected-case policy is then \(x^*=\text{argmax}_x Q_{T-x}(x|\mathcal{D}_i)\) where \(\mathcal{D}_i\) is the observed data at iteration \(i\). This is intractable.

To approximate this, suppose we do \(T\) experiments \(X=\{x_1,\cdots,x_T\}\) at the same time, the expected marginal utility of the observations \(Y=\{y_1,\cdots,y_T\}\) is then \(Q(X|\mathcal{D})=\mathbb{E}_Y[u(Y|X,\mathcal{D})]\), the expectation taken w.r.t \(p(Y|X,\mathcal{D})\).

\(Q(X|\mathcal{D})\) can be rewritten as \(\mathbb{E}_{y_j}[u(y_j|x_j,\mathcal{D})]+\mathbb{E}_{y_j}[Q(X_{-j}|\mathcal{D}\cup\{(x_j,y_j)\})]\).

If we denote \(X^* \in \text{argmax}_X Q(X|\mathcal{D})\) to be an optimal batch, then for any \(x_j^*\in X^*\) the last term in the above decomposition given \(x^*_j\) is \(\max_{X_{-j}}\mathbb{E}_{y^*_j}[Q(X_{-j}|\mathcal{D}\cup \{(x^*_j,y^*_j)\})]\) meaning we can decompose \(Q(X|\mathcal{D})\) i.e. the expected marginal utility (or reward) of the batch as a whole.

Choosing \(x*\in X^*\) is then equivalent to find \(x^*\in \text{argmax}_x B(x|\mathcal{D})\) where \[ B(x|\mathcal{D})=\mathbb{E}_y[u(y|x,\mathcal{D})]+\max_{X'||X'|=T-1}\mathbb{E}_y[Q(X'|\mathcal{D}}\cup\{(x,y)\})] \]

The second term in \(B(x|\mathcal{D})\) is a lower bound on the true expected utility \(\mathbb{E}_y[\max_{x'}Q_{T-1}(x'|\mathcal{D}\cup \{(x,y)\})]\).

The BINOCLARS algorithm is then: Input: Design and response space \(\mathcal{X,Y}\), model \(p(y|x,\mathcal{D})\), utility \(u(y|x,\mathcal{D})\), budget \(T\). Output: \(\mathcal{D}\), a sequence of experiments and observations

for i in 0...T-1:
    Compute optimal batch X_opt of size T-i
    Pick experiment x_opt from X_opt, observed y_opt
    Add x_opt, y_opt to current dataset

Experiments are performed on Bayesian Optimization and Bayesian Quadrature.