Better Training of GFlowNets with Local Credit and Incomplete Trajectories - 2023

Details

Title: Better Training of GFlowNets with Local Credit and Incomplete Trajectories
Author(s): Pan, Ling and Malkin, Nikolay and Zhang, Dinghuai and Bengio, Yoshua
Link(s):

Rough Notes

This work addresses a limitation of GFlowNets: standard learning objectives only receive a reward signal from the terminal state, at the end of a complete trajectory.

Recall that GFlowNets learn a stochastic forward policy \(P_F\), specified as a distribution over the children of each nonterminal state in the flow network \(G\). To sample from a target distribution (e.g., a posterior), we want \(P_F^T(x)\propto R(x)\), where \(P_F^T(x)\) is the marginal probability that a complete trajectory \(s_0\to s_1 \to \cdots \to s_n\) sampled from \(P_F\) has terminal state \(s_n=x\in \mathcal{X}\).

\(P_F^T(x) = \sum_{\tau=(s_0\to s_1\to \cdots \to s_n=x)} P_F(\tau) = \sum_{\tau=(s_0\to \cdots \to s_n=x)} \prod_{i=1}^{n} P_F(s_i \mid s_{i-1})\)
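
As a sanity check, here is a minimal Python sketch (not from the paper) that computes \(P_F^T\) by brute-force enumeration of complete trajectories in a toy DAG; the graph and the forward-policy probabilities below are made up for illustration.

#+begin_src python
from collections import defaultdict

# children[s] maps a state to {child: P_F(child | s)}; states with no
# entry are terminal states x in X. Toy graph, invented for illustration.
children = {
    "s0": {"a": 0.6, "b": 0.4},
    "a":  {"x1": 0.5, "x2": 0.5},
    "b":  {"x2": 1.0},
}

def terminating_distribution(s0="s0"):
    """Accumulate P_F(tau) = prod_i P_F(s_i | s_{i-1}) over all complete
    trajectories, grouped by terminal state."""
    p_T = defaultdict(float)
    stack = [(s0, 1.0)]  # (current state, probability of the prefix so far)
    while stack:
        s, p = stack.pop()
        if s not in children:          # terminal: the trajectory is complete
            p_T[s] += p
            continue
        for child, p_f in children[s].items():
            stack.append((child, p * p_f))
    return dict(p_T)

print(terminating_distribution())  # {'x1': 0.3, 'x2': 0.7}
#+end_src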

Existing learning objectives (a sketch of the detailed-balance residual as a loss follows this list):

  • Detailed balance: \(F(s)P_F(s'\mid s) = F(s')P_B(s\mid s')\) for every edge \(s\to s'\), with a learned state flow \(F\) and backward policy \(P_B\).
  • Trajectory balance: \(Z\prod_{i=1}^{n}P_F(s_i\mid s_{i-1}) = R(x)\prod_{i=1}^{n}P_B(s_{i-1}\mid s_i)\) over complete trajectories.
  • Subtrajectory balance: the analogous constraint applied to subtrajectories, with state flows at both endpoints.
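
A hedged sketch (not the authors' code) of the detailed-balance constraint as a squared residual in log space; the scalar inputs stand in for outputs of learned state-flow and policy networks.

#+begin_src python
def db_loss(log_F_s, log_PF_sp_given_s, log_F_sp, log_PB_s_given_sp):
    """Squared violation of log F(s) + log P_F(s'|s) = log F(s') + log P_B(s|s')."""
    residual = log_F_s + log_PF_sp_given_s - log_F_sp - log_PB_s_given_sp
    return residual ** 2

# At a perfect solution, every edge s -> s' has zero loss:
print(db_loss(0.0, -0.7, -0.5, -0.2))  # 0.0, since the balance holds exactly
#+end_src

Trajectory and subtrajectory balance yield analogous squared log-ratio losses, taken over complete trajectories and subtrajectories respectively.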

Proposed method: first, assume the terminal-state energy (reward) function \(\mathcal{ E }\) can be extended to the set of all states, not just the terminal ones. Assuming an additive decomposition, for any state \(s_t\) reached via \(s_0\to s_1\to\cdots\to s_t\) we have \(\mathcal{ E }(s_t)=\sum_{i=1}^{t}\mathcal{ E }(s_{i-1}\to s_i)\), i.e., each transition contributes a local credit \(\mathcal{ E }(s_{i-1}\to s_i)\).
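
A toy check of this telescoping decomposition; the quadratic `energy` below is a made-up stand-in for \(\mathcal{ E }\), not anything from the paper.

#+begin_src python
def energy(s):
    return 0.5 * s * s          # toy stand-in for E(s); E(s_0) = 0 here

def local_energies(trajectory):
    """Per-transition terms E(s_{i-1} -> s_i) = E(s_i) - E(s_{i-1})."""
    return [energy(b) - energy(a) for a, b in zip(trajectory, trajectory[1:])]

traj = [0, 1, 3, 4]             # s_0 -> s_1 -> s_2 -> s_3, integer toy states
local = local_energies(traj)    # [0.5, 4.0, 3.5]
# The telescoping sum recovers the state energy: E(s_t) = sum_i E(s_{i-1} -> s_i)
assert abs(sum(local) - energy(traj[-1])) < 1e-9
#+end_src

The point of the decomposition is that each transition now carries its own learning signal, so credit can be assigned locally and from incomplete trajectories rather than only at the terminal state.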

#TODO Continue from Section 4.2.
