# Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search - 2020

## Details

Title: Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search
Author(s): Machine Learning Center at Georgia Tech
Link(s): https://www.youtube.com/watch?v=nyP5V_LHUvw

## Rough Notes

The paper presents the **optimal retrosynthetic planning problem** and an A*-like search algorithm that learns from previous planning experience. The work achieves SOTA performance on a real-world benchmark dataset.

The retrosynthesis problem is to plan a sequence of reactions that lead to the synthesis of a target molecule from a set of building blocks. The combinatorial nature of this problem makes it hard, and thus it is broken down into 2 subproblems:

- One-step retrosynthesis.
- Multi-step retrosynthetic planning.

In one-step retrosynthesis, given a target molecule, the task is to predict the reactants that are the precursors for synthesizing the product. This work assumes that a model \(B\) already exists for this: \(B\) takes a target molecule and returns \(k\) candidate reactions, each with predicted reactants and a predicted cost.

This leads to the problem of retrosynthetic planning, where we plan, i.e. choose actions from the reaction candidates produced by \(B\).
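As a concrete picture of the assumed interface, here is a minimal sketch of what \(B\) provides. The class and function names are illustrative placeholders, not the paper's API, and the toy model just fabricates candidates for demonstration.

```python
from dataclasses import dataclass

@dataclass
class Reaction:
    product: str          # molecule being synthesized
    reactants: list[str]  # predicted precursor molecules
    cost: float           # predicted reaction cost

def one_step_model_B(target: str, k: int = 3) -> list[Reaction]:
    """Toy stand-in for B: returns k fabricated candidate reactions."""
    return [Reaction(target, [f"{target}_p{i}a", f"{target}_p{i}b"], cost=1.0 + i)
            for i in range(k)]

candidates = one_step_model_B("t")
```

The planner only ever interacts with \(B\) through such a list of (reactants, cost) candidates, which is what makes the search layer model-agnostic.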

Previous approaches use methods such as Monte-Carlo Tree Search (MCTS) and Proof Number Search. However, multiple solutions can exist and these methods do not distinguish between them, so this work takes solution quality into account. Some criteria for better synthesis routes:

- Shorter routes with higher yields.
- More chemically sound.
- More economically efficient.
- More environmentally friendly.
- etc.

This work assumes that any such criterion can be summarized by a predicted cost value for each reaction.

The **optimal retrosynthesis planning problem** is now defined as:

- Given a target molecule \(t\), a set of building block molecules \(M\), and a one-step model \(B\), find the sequence of reactions \((R_1,\cdots,R_n)^*\) such that \((R_1,\cdots,R_n)^* = \text{argmin}_{(R_1,\cdots,R_n),\, n\in \mathbb{N}} \sum_{i=1}^{n}\text{Cost}(R_i)\), where the reactions are predicted by \(B\), start from molecules in \(M\), and lead to the synthesis of \(t\). A practical constraint here is that calls to \(B\) should be limited.
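The objective above can be illustrated with a tiny worked example (the routes and costs below are made up for demonstration): each candidate route is scored by the sum of its reaction costs, and the planner should prefer the route minimizing that sum.

```python
# Candidate routes as lists of reaction costs (illustrative data).
routes = {
    "route_a": [1.0, 2.5],        # two reactions, total 3.5
    "route_b": [0.5, 0.5, 0.5],   # three cheaper reactions, total 1.5
}

# Pick the route minimizing sum_i Cost(R_i).
best = min(routes, key=lambda r: sum(routes[r]))
```

Note that the cheapest route is not necessarily the shortest one, which is exactly why a pure step-count objective is insufficient.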

The method presented in this paper, Retro*, is the first algorithm to learn from previous planning experience. It depends on the AND-OR tree representation, where:

- OR nodes are molecules, requiring at least 1 of their children to be ready (i.e. synthesizable).
- AND nodes are reactions, requiring all of their children to be ready.

The root of the tree is the target molecule.
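The AND-OR semantics above can be sketched directly in code. This is a minimal, hedged rendering (class names are mine, not the paper's): an OR/molecule node is ready if it is a building block or any child reaction is ready; an AND/reaction node is ready only if all child molecules are ready.

```python
from dataclasses import dataclass, field

@dataclass
class MoleculeNode:  # OR node
    mol: str
    children: list["ReactionNode"] = field(default_factory=list)

    def ready(self, building_blocks: set[str]) -> bool:
        if self.mol in building_blocks:
            return True
        return any(r.ready(building_blocks) for r in self.children)

@dataclass
class ReactionNode:  # AND node
    cost: float
    children: list["MoleculeNode"] = field(default_factory=list)

    def ready(self, building_blocks: set[str]) -> bool:
        return all(m.ready(building_blocks) for m in self.children)

# Target t synthesized by one reaction from precursors a and b.
root = MoleculeNode("t", [ReactionNode(1.0, [MoleculeNode("a"), MoleculeNode("b")])])
```

With building blocks `{"a", "b"}` the root is ready; with only `{"a"}` it is not, since the AND node requires all precursors.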

Retro* involves:

- Selection: pick the frontier node \(m\) with the best value \(V_t(m|T)\) under the current search tree \(T\), i.e. the cost of the current best plan that contains \(m\) and synthesizes the target molecule \(t\).
- Expansion: expand the node with an AND-OR stump produced by \(B\).
- Update: propagate the new values to the related nodes.
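The three steps above can be sketched as a loop. This is a greatly simplified, greedy stand-in that keeps only the select/expand/update skeleton, not the paper's actual value function \(V_t(m|T)\): the priority here is just an accumulated cost, and expansion naively takes the cheapest candidate. All names and the toy model `B` are illustrative.

```python
import heapq

# Toy one-step model: molecule -> candidate (cost, reactants) pairs.
B = {"t": [(1.0, ["a", "x"]), (2.0, ["b"])],
     "x": [(0.5, ["b", "c"])]}
building_blocks = {"a", "b", "c"}

def plan(target, max_calls=10):
    frontier = [(0.0, target)]          # (estimated cost, molecule)
    route, calls = [], 0
    while frontier and calls < max_calls:
        _, m = heapq.heappop(frontier)  # selection: best-valued frontier node
        if m in building_blocks:
            continue                    # already synthesizable
        calls += 1
        cost, reactants = min(B[m])     # expansion: attach cheapest AND stump
        route.append((m, reactants))
        for r in reactants:             # update: new molecules join the frontier
            heapq.heappush(frontier, (cost, r))
    return route

plan("t")  # -> [('t', ['a', 'x']), ('x', ['b', 'c'])]
```

Retro* differs from this sketch precisely in the selection step: it scores each frontier node by the cost of the best *global* plan through it, rather than a local accumulated cost.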

To compute \(V_t(m|T)\) efficiently, we can decompose it into simpler components recursively.

- Boundary case: the cost of synthesizing a frontier node \(m\), \(V_m = V_m(m|\emptyset)\), is assumed to be known. It can later be learned from a planning dataset.
- Define the reaction number \(rn(\cdot|T)\) as the minimum estimated cost needed for a molecule to be synthesized (or a reaction to happen) in the current tree. Two recursive formulas for this:
- \(rn(R|T) = c(R) + \sum_{m\in ch(R)}rn(m|T)\)
- \(rn(m|T) = V_m(m|\emptyset)\) if \(m\) is in the frontier (leaf) set; otherwise \(rn(m|T) = \min_{R\in ch(m)}rn(R|T)\).
- Compute \(V_t(m|T)\) from \(rn(\cdot|T)\) by summing the costs of all ancestor reactions of \(m\) and the reaction numbers of all sibling molecules at every level of the search tree.
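The \(rn(\cdot|T)\) recursion above can be computed directly over a small hand-built AND-OR tree. The dictionary encoding and numeric values below are illustrative, not from the paper: a frontier molecule contributes its estimated cost \(V_m\), a reaction adds its own cost to its children's values, and a non-frontier molecule takes the minimum over its child reactions.

```python
# tree: molecule -> list of (reaction_cost, child_molecules); absent keys are frontier leaves.
tree = {"t": [(1.0, ["a", "b"]), (2.0, ["c"])]}
V_m = {"a": 0.5, "b": 0.2, "c": 0.1}   # estimated synthesis cost of frontier molecules

def rn_molecule(m):
    if m not in tree:                  # frontier (leaf) molecule: rn(m|T) = V_m(m|empty)
        return V_m[m]
    return min(rn_reaction(c, ch) for c, ch in tree[m])   # min over child reactions

def rn_reaction(cost, children):
    return cost + sum(rn_molecule(m) for m in children)   # c(R) + sum of children

rn_molecule("t")  # min(1.0 + 0.5 + 0.2, 2.0 + 0.1) = 1.7
```

The recursion bottoms out at the frontier, which is why a learned estimate \(V_m\) for unexpanded molecules is the key ingredient.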

\(V_t(m|T)\) has the same form as the evaluation function in the A* algorithm; thus, by A*'s admissibility property, Retro* is guaranteed to find an optimal solution if \(V_m\) is a lower bound on the true cost.

The dataset contains tuples of:

- Target molecule \(t_i\).
- Best synthesis cost \(v_i\).
- Expert reaction.
- One-step retrosynthesis candidates \(B(t_i)\).

The training objective consists of:

- Fitting \(V_m(m|\emptyset)\) to the best synthesis cost \(v_i\) via least squares.
- A consistency loss to maintain the partial-order relationship between the best one-step solution and the other solutions in \(B(t_i)\).

The final objective balances these two losses.
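The two losses can be sketched in plain Python. These are hedged stand-ins, not the paper's exact formulation: a squared error fitting the value estimate to the best synthesis cost \(v_i\), plus a margin-style hinge term (the margin and weighting `lam` are my illustrative choices) that penalizes the best one-step candidate being scored no better than the alternatives.

```python
def value_loss(pred, v_i):
    # Least-squares fit of the value estimate to the best synthesis cost.
    return (pred - v_i) ** 2

def consistency_loss(best_cost, other_costs, margin=0.1):
    # Penalize any alternative that is not at least `margin` worse
    # than the best candidate (preserves the partial order).
    return sum(max(0.0, best_cost + margin - c) for c in other_costs)

def total_loss(pred, v_i, best_cost, other_costs, lam=1.0):
    # Final objective: a weighted balance of the two terms.
    return value_loss(pred, v_i) + lam * consistency_loss(best_cost, other_costs)
```

When the best candidate already beats every alternative by the margin, the consistency term vanishes and only the regression term drives training.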

To create the retrosynthesis dataset, reactions from the USPTO dataset are used.