2024

Machine learning-aided generative molecular design

Yuanqi Du, Arian R. Jamasb, Jeff Guo, Tianfan Fu, Charles Harris, Yingheng Wang, Chenru Duan, Pietro Liò, Philippe Schwaller, Tom L. Blundell

Nature Machine Intelligence 2024

Here we provide a comprehensive overview of the current state of the art in molecular design using machine learning models as well as important design decisions, such as the choice of molecular representations, generative methods and optimization strategies. Subsequently, we present a collection of practical applications in which the reviewed methodologies have been experimentally validated, encompassing both academic and industrial efforts. Finally, we draw attention to the theoretical, computational and empirical challenges in deploying generative machine learning and highlight future opportunities to better align such approaches to achieve realistic drug discovery end points.

RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

Rishabh Anand, Chaitanya Joshi, Alex Morehead, Arian R. Jamasb, Charles Harris, Simon Mathis, Kieran Didi, Bryan Hooi, Pietro Liò

AI for Science Workshop @ ICML 2024 Spotlight

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon $SE(3)$ flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling.

SynFlowNet: Towards Molecule Design with Guaranteed Synthesis Pathways

Miruna Cretu, Charles Harris, Julien Roy, Emmanuel Bengio, Pietro Liò

GEM Bio Workshop @ ICLR 2024

This work introduces SynFlowNet, a GFlowNet model whose action space uses chemically validated reactions and reactants to sequentially build new molecules. We evaluate our approach using synthetic accessibility scores and an independent retrosynthesis tool. SynFlowNet consistently samples synthetically feasible molecules, while still being able to find diverse and high-utility candidates.

Evaluating Representation Learning on the Protein Structure Universe

Arian Jamasb, Alex Morehead, Chaitanya Joshi, Zuobai Zhang, Kieran Didi, Simon Mathis, Charles Harris, Tian Tang, Jianlin Cheng, Pietro Liò, Tom Blundell

ICLR 2024

In the benchmark, we implement numerous featurisation schemes, datasets for self-supervised pre-training and downstream evaluation, pre-training tasks, and auxiliary tasks. The benchmark can be used as a working template for a protein representation learning research project, a library of drop-in components for use in your projects, or as a CLI tool for quickly running protein representation learning evaluation and pre-training configurations.

2023

PoseCheck: Generative Models for 3D Structure-based Drug Design Produce Unrealistic Poses

Charles Harris, Kieran Didi, Arian Jamasb, Chaitanya Joshi, Simon Mathis, Pietro Liò, Tom Blundell

MLSB Workshop @ NeurIPS 2023 Spotlight

This work introduces PoseCheck, an extensive analysis of multiple state-of-the-art methods. We find that generated molecules have significantly more physical violations and fewer key interactions than baselines, calling into question the implicit assumption that providing rich 3D structure information improves molecule complementarity. We make recommendations for future research tackling the identified failure modes and hope our benchmark will serve as a springboard for future SBDD generative modelling work to have real-world impact.

Multi-State RNA Design with Geometric Multi-Graph Neural Networks

Chaitanya Joshi, Arian Jamasb, Ramon Vinas, Charles Harris, Simon Mathis, Pietro Liò

CompBio Workshop @ ICML 2023

This work introduces gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. Under the hood, gRNAde is a multi-state Graph Neural Network that generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown.

DiffHopp

Jos Torge, Charles Harris, Simon Mathis, Pietro Liò

CompBio Workshop @ ICLR 2023 Spotlight

DiffHopp is an equivariant diffusion model for scaffold hopping.

Flexible Small-Molecule Design and Optimization with Equivariant Diffusion Models

Charles Harris, Kieran Didi, Arne Schneuing, Yuanqi Du, Arian Jamasb, Michael Bronstein, Bruno Correia, Tom Blundell, Pietro Liò

MLDD @ ICLR 2023

This work extended the use of DiffSBDD to various subtasks in SBDD, such as fragment linking and compound optimisation.

2022

DiffSBDD: Structure-based Drug Design with Equivariant Diffusion Models

Arne Schneuing*, Charles Harris*, Yuanqi Du*, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Liò, Carla Gomes, Max Welling, Michael Bronstein, Bruno Correia (* equal contribution)

Accepted at Nature Computational Science (not yet in print); MLSB Workshop @ NeurIPS 2024

DiffSBDD was one of the first equivariant diffusion models for structure-based drug design.

Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures and Interaction Networks

Arian Jamasb, Ramon Vinas Torne, Eric Ma, Charles Harris, Ilia Igashov, Kevin Huang, Dominic Hall, Pietro Liò, Tom Blundell

NeurIPS 2022

Graphein is an easy-to-use package for turning structural biology data into machine-learning-ready data formats, notably for graph neural networks.
