GitHub - hankook/RetCL

RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning

Accepted to IJCAI 2021

thumbnail

Prepare dependencies

We use Pytorch for ML framework, RDKit for cheminformatics tool, and DGL for easy implementation of graph neural networks. We highly recommend to use Anaconda for managing package dependencies.

conda create -n RetCL python=3.7
conda activate RetCL
conda install pytorch=1.5.0 cudatoolkit=10.1 -c pytorch
conda install rdkit=2019.09 -c rdkit
conda install dgl=0.4.3 dgllife=0.2.1 -c dglteam

Preprocessing data

python preprocess.py --dataset uspto_50k --datadir data/uspto_50k/
python preprocess.py --dataset uspto_50k --datadir data/uspto_50k_modified/
python preprocess.py --dataset uspto_candidates --datadir data/uspto_candidates/

You can download the USPTO-full dataset from the GLN repository.

Evaluation

We provide pretrained models in checkpoints/.

You can obtain the same results reported in Table 1 using the following scripts:

python evaluate.py --num-layers 5 --use-sum --best 50 --beam 50 --ckpt checkpoints/uspto50k_unknown.pth
python evaluate.py --num-layers 5 --use-sum --best 50 --beam 50 --ckpt checkpoints/uspto50k_given.pth --use-label

To obtain the results in Table 2, replace --best 50 --beam 50 with --best 200 --beam 200.

For the generalization experiment, run the following script:

python evaluate.py --num-layers 5 --use-sum --best 50 --beam 50 --ckpt checkpoints/uspto50k_modified_unknown.pth --datadir data/uspto_50k_modified/ --classwise

Training

You can train our RetCL framework using train.py. For example, one can learn with 4 nearest neighbors and sum pooling using the following script:

python train.py --use-sum --num-neighbors 4 --logdir logs/uspto_50k_unknown_N4