# Adaptive Design in Real Time

## Details

- Title: Adaptive Design in Real Time
- Author(s): Desi Ivanova
- Link(s):

## Rough Notes

Talk at the Seminar in Advances in Probabilistic Machine Learning, Aalto University.

A new framework for experimental design called Deep Adaptive Design (DAD), the first to enable real-time adaptive experimentation.

Bayesian Experimental Design (BED) states that an optimal design is the one that maximizes the Expected Information Gain (EIG). For example, in delay discounting: "Do you prefer \(\$R\) now or \(\$100\) in \(D\) days?" Here the design is \((R, D)\) and the outcome is Yes/No.

- The Bayesian model consists of a prior \(p(\theta)\) and a likelihood of the outcome \(p(y_t|\theta,\xi_t,h_{t-1})\), which can be explicit or implicit.
- We want to learn about the parameters of interest \(\theta\) by performing maximally informative experiments \(\xi\).
- The information gain equals the prior entropy minus the posterior entropy, i.e. \(IG(\xi, y) = H(p(\theta))-H(p(\theta|y,\xi))\). But the posterior depends on the outcome \(y\), which is unknown before the experiment is run, so we take the expectation over outcomes: EIG = \(\mathbb{E}_{y|\xi}[IG(\xi,y)] = I(\theta;y|\xi)\).
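The EIG above can be estimated by nested Monte Carlo: sample \(\theta\) and \(y\) from the model, then average \(\log p(y|\theta,\xi) - \log p(y|\xi)\), with the marginal \(p(y|\xi)\) itself approximated by an inner Monte Carlo average over fresh prior samples. A minimal sketch on a toy linear-Gaussian model (the model, function names, and sample sizes here are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(y, theta, xi, sigma=1.0):
    # Gaussian log-likelihood log p(y | theta, xi) for the toy model y ~ N(theta * xi, sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (y - theta * xi) ** 2 / sigma**2

def eig_nmc(xi, n_outer=2000, n_inner=2000, sigma=1.0):
    """Nested Monte Carlo estimate of EIG(xi) with prior theta ~ N(0, 1)."""
    theta = rng.standard_normal(n_outer)                   # outer prior samples
    y = theta * xi + sigma * rng.standard_normal(n_outer)  # simulated outcomes
    theta_inner = rng.standard_normal((n_outer, n_inner))  # inner prior samples
    log_p_cond = log_lik(y, theta, xi, sigma)              # log p(y | theta, xi)
    # Inner Monte Carlo average approximating log p(y | xi)
    # (fine for this toy model; use logsumexp for numerical stability in general)
    log_p_marg = np.log(np.mean(np.exp(log_lik(y[:, None], theta_inner, xi, sigma)), axis=1))
    return np.mean(log_p_cond - log_p_marg)

# For this linear-Gaussian model the EIG has a closed form: 0.5 * log(1 + xi^2 / sigma^2)
print(eig_nmc(2.0), 0.5 * np.log(5.0))
```

The nesting is exactly why sequential BED is expensive: this whole estimator sits inside an optimization over \(\xi\), which itself sits inside a loop over experiment iterations.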

The experimental design life cycle starts from prior beliefs; we then optimize the EIG, choose the design, perform the experiment, observe the outcome, and update our beliefs. This is done \(T\) times. This process, however, is extremely time-consuming: the EIG objective is doubly intractable (both the posterior and the marginal likelihood inside it are intractable), making it prohibitively expensive to run in real time.

The solution is policy-based BED: more abstractly, instead of optimizing the design at each step, we learn a policy that maps past data to the next design. Policy-based BED has the following life cycle: offline training produces a policy from prior beliefs, which is then deployed in the live experiment setting, where a design is chosen and the outcome measured \(T\) times. The questions now are what the policy is and how to model it.
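The deployment loop above is cheap because it contains no optimization, only policy evaluations. A minimal sketch, where both the policy rule and the simulated experiment are hypothetical stand-ins (a trained network and a real measurement, respectively):

```python
import numpy as np

rng = np.random.default_rng(1)

def policy(history):
    # Stand-in for a trained policy network: maps past (design, outcome)
    # pairs to the next design. The rule below is purely illustrative.
    if not history:
        return 1.0
    last_xi, last_y = history[-1]
    return last_xi + 0.5 * last_y

def run_experiment(xi, theta_true=0.7):
    # Stand-in for the real experiment: simulate y ~ N(theta * xi, 1)
    return theta_true * xi + rng.standard_normal()

# Live deployment: T cheap forward passes, no optimization inside the loop
history = []
for t in range(5):
    xi_t = policy(history)      # design chosen by the policy
    y_t = run_experiment(xi_t)  # outcome observed
    history.append((xi_t, y_t))

print(len(history))  # 5 (design, outcome) pairs collected
```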

First, traditional BED is generalized in a way that allows end-to-end training, by introducing the notion of total information: the total EIG of a \(T\)-step experiment is the sum of the EIGs of the individual iterations. This objective is itself still doubly intractable. DAD optimizes a lower bound on it, assuming an explicit likelihood; Implicit DAD (iDAD) optimizes a lower bound using samples only.
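In the notation above, writing \(\pi\) for the policy and \(h_T = ((\xi_1,y_1),\dots,(\xi_T,y_T))\) for the full history, the total EIG can be written as

\[ \mathcal{I}_T(\pi) = \mathbb{E}_{p(\theta)\,p(h_T|\theta,\pi)}\left[\log p(h_T|\theta,\pi) - \log p(h_T|\pi)\right]. \]

If I recall correctly, the explicit-likelihood lower bound that DAD optimizes (the sequential Prior Contrastive Estimation, sPCE, bound) replaces the intractable marginal with an average over contrastive prior samples:

\[ \mathcal{I}_T(\pi) \ge \mathbb{E}\left[\log \frac{p(h_T|\theta_0,\pi)}{\frac{1}{L+1}\sum_{l=0}^{L} p(h_T|\theta_l,\pi)}\right], \]

where \(\theta_0\) is the sample that generated \(h_T\) and \(\theta_1,\dots,\theta_L\) are independent draws from the prior. This requires only joint samples and likelihood evaluations, so it can be maximized with stochastic gradients end-to-end through the policy.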

Both DAD and iDAD amortize the cost of BED by using a neural network for the policy, called the policy network, which takes as input the designs and observations from previous stages and outputs the design suggestion for the next experiment.
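One natural property for such a network is permutation invariance over the history (encode each (design, outcome) pair, pool by summation, then emit the next design). A minimal numpy sketch of that idea, with random weights standing in for parameters that would in practice be trained offline; the architecture details here are illustrative, not the exact ones from the papers:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy weights standing in for trained parameters (in DAD these are learned
# offline by maximizing an EIG lower bound).
W_enc = rng.standard_normal((2, 8)) * 0.1  # encodes one (design, outcome) pair
w_out = rng.standard_normal(8) * 0.1       # maps pooled encoding to next design

def policy_network(history):
    """Encode each (xi, y) pair, sum-pool, emit the next design."""
    if not history:
        pooled = np.zeros(8)  # fixed representation for the empty history
    else:
        pairs = np.array(history)                    # shape (t, 2)
        pooled = np.tanh(pairs @ W_enc).sum(axis=0)  # sum-pooling => order invariance
    return float(pooled @ w_out)

h = [(1.0, 0.3), (0.5, -1.2)]
# Sum-pooling makes the output invariant to the order of past observations:
print(policy_network(h) == policy_network(h[::-1]))  # True
```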

In the reported experiments, DAD is around a million times faster at deployment time and achieves roughly 20% higher information gain.