Computation of ADME/Tox properties using machine learning can focus and prioritize the use of expensive assays very early in the drug discovery process and minimize late-stage failures of potent lead compounds. Recently, large multi-task networks have been trained using transfer learning, in which related tasks are trained alongside the primary task(s) so that the model gains predictive performance on the primary task. As an example, we have built a deep learning Convolutional Long Short-Term Memory (ConvLSTM) model and fine-tuned a pre-trained Transformer model (MolBART) on additional datasets (e.g., 42 toxicity datasets). We then created an aggregate dataset of 2,400 targets from an extract of ChEMBL to determine whether toxicology targets with few data points (fewer than 250) would benefit from transfer learning from the MolBART model pre-trained on many different targets. We will describe our benchmarking with these datasets.
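The small-data transfer-learning setup described above can be sketched with a toy example. This is a minimal illustration, not the method used in the work: a tiny NumPy network and synthetic regression tasks stand in for MolBART and the ChEMBL extract, and every dataset size, layer width, and learning rate here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(n, w_true, noise=0.05):
    """Synthetic regression task: y = tanh(X @ w_true) + noise."""
    X = rng.normal(size=(n, 10))
    y = np.tanh(X @ w_true) + noise * rng.normal(size=n)
    return X, y

# A large "related" task (stand-in for the big multi-target extract) and a
# small target task (stand-in for a toxicity endpoint with <250 points).
w_shared = rng.normal(size=10)
X_big, y_big = make_task(500, w_shared)
X_small, y_small = make_task(50, w_shared + 0.1 * rng.normal(size=10))

def relu(z):
    return np.maximum(z, 0.0)

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

# "Pre-train" a tiny two-layer network on the large related task.
W1 = 0.1 * rng.normal(size=(10, 16))
b1 = np.zeros(16)
w2 = 0.1 * rng.normal(size=16)
lr = 0.05
for _ in range(300):
    H = relu(X_big @ W1 + b1)              # hidden features
    err = H @ w2 - y_big
    grad_w2 = H.T @ err / len(y_big)       # backprop through both layers
    dH = np.outer(err, w2) * (H > 0)
    grad_W1 = X_big.T @ dH / len(y_big)
    grad_b1 = dH.mean(axis=0)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1

# Transfer: freeze the pre-trained layer, fine-tune only the output
# weights on the small target task.
H_small = relu(X_small @ W1 + b1)
w2_ft = w2.copy()
loss_before = mse(H_small @ w2_ft, y_small)
for _ in range(200):
    err = H_small @ w2_ft - y_small
    w2_ft -= lr * (H_small.T @ err / len(y_small))
loss_after = mse(H_small @ w2_ft, y_small)

print(f"small-task loss before fine-tuning: {loss_before:.4f}")
print(f"small-task loss after fine-tuning:  {loss_after:.4f}")
```

The benchmarking question in the abstract is whether this fine-tuned model beats one trained from scratch on the small dataset alone; in practice the frozen layer would be a pre-trained chemical language model rather than a toy MLP.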
Learning Objectives:
Understand what a large language model is and how it can be applied to ADME/Tox datasets to improve model predictions.
Understand the pros and cons of large language models.
Understand where these technologies may be applied in the future.
Understand how to implement these models.
Have an idea of how large language models can be used for ADME modeling.