Generative Flow Networks (GFlowNets)

Notes from GFlowNet tutorial

A generative model/stochastic policy that can be trained to sample objects \(x\in \mathcal{X}\), constructed compositionally, in proportion to some non-negative reward \(R(x)\). This sampling property is a result of a special training objective.
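For example, if \(\mathcal{X}=\{x_1,x_2,x_3\}\) with \(R(x_1)=2\) and \(R(x_2)=R(x_3)=1\), a perfectly trained GFlowNet samples \(x_1\) with probability \(1/2\) and each of the other two with probability \(1/4\).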

The stochastic-policy view comes from looking at the sequence of steps that construct \(x\) as internal actions, which can be thought of as plans, explanations, thoughts, etc.

An object \(x\) is sampled by consecutive actions \((a_0,a_1,\cdots)\) chosen by a neural network; the actions terminate according to some deterministic function of \(x\), e.g. \(x\) is a set with exactly \(n\) elements. The partially constructed objects \((s_0,s_1,\cdots)\) are the states and form a trajectory \(\tau\). Note that many trajectories can lead to the same state.
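A minimal sketch of this forward sampling loop for the set example, assuming a hypothetical policy_net that maps the current partial set to unnormalised scores over candidate elements:

#+begin_src python
import torch

def sample_forward(policy_net, candidates, n):
    """Roll out one trajectory by adding elements to a set until it has n elements.

    policy_net : hypothetical network mapping (partial set, candidates) to one
                 unnormalised score (logit) per candidate element.
    candidates : pool of elements that actions can add.
    n          : deterministic termination rule: stop once the set has n elements.
    """
    state = []                      # s_0: the empty set
    trajectory = [tuple(state)]     # record of states s_0, s_1, ...
    while len(state) < n:
        logits = policy_net(state, candidates)
        # mask elements already in the set so x stays a valid set (sketch detail)
        mask = torch.tensor([c in state for c in candidates])
        logits = logits.masked_fill(mask, float("-inf"))
        probs = torch.softmax(logits, dim=-1)
        a = torch.multinomial(probs, 1).item()   # a_t ~ P_F(. | s_t)
        state = state + [candidates[a]]          # s_{t+1} = T(s_t, a_t)
        trajectory.append(tuple(state))
    return trajectory                            # tau = (s_0, ..., s_T), with x = s_T
#+end_src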

The reward \(R\) is an internal quantity (e.g. an energy function from a world model), so the quality of a trained GFlowNet depends on the compute available to query \(R\) on many trajectories rather than on some external dataset. Hence, underfitting is a possible problem.

GFlowNets can also define a backward sampling procedure, i.e. given some \(x\), sample a plausible trajectory that could have constructed it.
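A matching sketch of backward sampling for the same set example, assuming a hypothetical backward_net that scores which element of the current set was plausibly added last:

#+begin_src python
import torch

def sample_backward(backward_net, x):
    """Given a terminal object x (a set), sample a trajectory that could have built it."""
    state = list(x)
    trajectory = [tuple(state)]
    while state:                                  # walk back until the empty set s_0
        logits = backward_net(state)              # one score per element currently in the set
        probs = torch.softmax(logits, dim=-1)
        i = torch.multinomial(probs, 1).item()    # s_t ~ P_B(. | s_{t+1})
        state = state[:i] + state[i + 1:]         # undo one constructive action
        trajectory.append(tuple(state))
    return trajectory[::-1]                       # forward order: s_0, ..., s_T = x
#+end_src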

Denote \(P_F(s_{t+1}|s_t)\) as the forward sampling policy and \(P_B(s_t|s_{t+1})\) as the backward sampling policy.
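One widely used choice for the training objective relating these two policies (trajectory balance; the tutorial may use a different one) requires, for every complete trajectory \(\tau=(s_0,s_1,\cdots,s_T=x)\),

\[ Z \prod_{t=0}^{T-1} P_F(s_{t+1}|s_t) = R(x) \prod_{t=0}^{T-1} P_B(s_t|s_{t+1}), \]

where \(Z\) is a learned scalar; the loss is the squared difference between the logarithms of the two sides. When this constraint holds for all trajectories, the probability of terminating at \(x\) is proportional to \(R(x)\), with \(Z=\sum_x R(x)\).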

The transitions lead to new states, i.e. \(s_{t+1}=T(s_t,a_t)\), where \(T\) depends on the application. Since the state has varying size (e.g. graphs with a growing number of nodes), the neural network used should be able to accommodate this (e.g. an RNN, GNN, or Transformer).
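A sketch of what \(T\) and a size-agnostic policy network could look like for the set example, using a simple permutation-invariant (sum-pooling) encoder; the names and architecture are illustrative assumptions, not the tutorial's:

#+begin_src python
import torch
import torch.nn as nn

def T(state, element):
    """Transition for the set example: adding an element yields the next state."""
    return state + [element]

class SetPolicy(nn.Module):
    """Scores candidate elements given a variable-size partial set (sum-pooling sketch)."""
    def __init__(self, num_elements, dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_elements, dim)   # one embedding per possible element
        self.state_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(2 * dim, 1)

    def forward(self, state, candidates):
        # Permutation-invariant summary of the variable-size state: sum of element embeddings.
        if state:
            s = self.embed(torch.tensor(state)).sum(dim=0)
        else:
            s = torch.zeros(self.embed.embedding_dim)
        s = self.state_mlp(s)
        c = self.embed(torch.tensor(candidates))       # embeddings of candidate actions
        pairs = torch.cat([s.expand(len(candidates), -1), c], dim=-1)
        return self.out(pairs).squeeze(-1)             # one logit per candidate action
#+end_src

With candidates taken to be integer element ids, this matches the policy_net interface assumed in the forward-sampling sketch above.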
