Sequential Experimental Design (SED)
Rough Notes
The optimal policy can be computed via Dynamic Programming (DP), which requires solving the Bellman equations. Approximations include rollout algorithms, Monte-Carlo Tree Search (MCTS), and myopic approximations.
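For reference, one common way to write the Bellman recursion for this setting is sketched below. The notation is an assumption of this sketch, not taken from the notes: b_k is the belief (posterior) after k experiments, d_k the design, y_k the observed outcome, N the number of experiments, and u a terminal utility that depends only on the final belief (no per-stage rewards).

```latex
% Terminal condition: value of the final belief state
V_N(b_N) = u(b_N)

% Backward recursion for k = N-1, \dots, 0
V_k(b_k) = \max_{d_k} \;
  \mathbb{E}_{y_k \mid b_k, d_k} \big[ V_{k+1}(b_{k+1}) \big],
\qquad
b_{k+1}(\theta) \propto b_k(\theta) \, p(y_k \mid \theta, d_k)

% Myopic approximation: replace V_{k+1} by the terminal utility
d_k^{\text{myopic}} \in \arg\max_{d_k} \;
  \mathbb{E}_{y_k \mid b_k, d_k} \big[ u(b_{k+1}) \big]
```

Under these assumptions the optimal sequential expected utility is V_0(b_0), and the myopic rule is the special case that looks only one experiment ahead.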
An optimal batch (non-adaptive) design is an approximation to the optimal sequential (adaptive) design. Adaptive and non-adaptive designs are the same in some cases, when the data utility does not depend on the observed samples. (#DOUBT Need to check why I wrote data utility instead of utility)
The optimal batch expected utility is a lower bound on the optimal sequential expected utility, and is as tight as the 1-step optimal policy's implied expected utility.
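The lower-bound relation can be checked numerically on a toy problem. The sketch below is illustrative only: the four hypotheses, the three binary tests, their likelihood values, and the choice of "probability of guessing the hypothesis correctly" as terminal utility are all made up for this example. It computes the optimal sequential value by exact DP over belief states and the optimal batch value by enumerating every fixed design tuple.

```python
import itertools

# Hypothetical toy problem: 4 hypotheses with a uniform prior.
PRIOR = [0.25, 0.25, 0.25, 0.25]
# LIK[d][theta] = p(y = 1 | theta, d) for 3 candidate binary tests (made-up values).
LIK = [
    [0.9, 0.9, 0.1, 0.1],  # test 0 separates {0, 1} from {2, 3}
    [0.9, 0.1, 0.5, 0.5],  # test 1 separates hypothesis 0 from 1
    [0.5, 0.5, 0.9, 0.1],  # test 2 separates hypothesis 2 from 3
]
OUTCOMES = (0, 1)


def update(belief, d, y):
    """Bayes update; returns (p(y | belief, d), posterior belief)."""
    joint = [b * (p if y == 1 else 1.0 - p) for b, p in zip(belief, LIK[d])]
    evidence = sum(joint)
    return evidence, [j / evidence for j in joint]


def terminal_utility(belief):
    """Assumed utility: probability that the MAP guess is correct."""
    return max(belief)


def sequential_value(belief, steps_left):
    """Exact DP: each design may depend on all outcomes seen so far."""
    if steps_left == 0:
        return terminal_utility(belief)
    return max(
        sum(p * sequential_value(post, steps_left - 1)
            for p, post in (update(belief, d, y) for y in OUTCOMES))
        for d in range(len(LIK))
    )


def fixed_design_value(belief, designs):
    """Expected utility when the whole design tuple is fixed up front."""
    if not designs:
        return terminal_utility(belief)
    return sum(p * fixed_design_value(post, designs[1:])
               for p, post in (update(belief, designs[0], y) for y in OUTCOMES))


def batch_value(belief, n_steps):
    """Best non-adaptive design, found by brute-force enumeration."""
    return max(fixed_design_value(belief, designs)
               for designs in itertools.product(range(len(LIK)), repeat=n_steps))


seq = sequential_value(PRIOR, 2)
bat = batch_value(PRIOR, 2)
print(f"batch: {bat:.4f}  sequential: {seq:.4f}")
```

In this toy setup the gap is strict because the most useful second test depends on the outcome of the first, which a batch design cannot exploit. Brute-force enumeration and exact DP are only feasible here because the outcome and design spaces are tiny.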