ChatGPT can predict molecular properties and design new molecules

ChatGPT has already demonstrated mind-blowing capabilities in natural language understanding and processing. Its power can be further enhanced to perform more complex tasks through prompt engineering, domain-specific fine-tuning, retrieval-augmented generation (RAG), and collaborative agents.

Can ChatGPT understand chemistry? A paper in Nature Machine Intelligence demonstrates that a GPT-3 model fine-tuned on domain-specific data can perform classification and regression tasks for predicting molecular properties, such as solid-solution formation and Henry coefficients. It can even design new molecules from instructions and specified properties. Traditionally, these tasks have been handled by purpose-built machine learning or QSAR models. If this new approach holds up, it could fundamentally change how predictive modeling is done in chemical and materials science, and perhaps in other branches of science.

The fine-tuned model performs comparably to, or even outperforms, conventional machine learning techniques on several of the datasets tested in the paper, particularly when the training set is small. It would be interesting to see how broadly this approach generalizes across predictive chemistry, and how accurate it remains when it does.
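To make the framing concrete, here is a minimal sketch of how a regression target such as a Henry coefficient might be cast as prompt–completion text pairs for fine-tuning a GPT-style model. This is not the paper's code; the records, file name, and prompt template below are hypothetical placeholders.

```python
import json

# Hypothetical records: (material identifier, property value).
# In practice the inputs are text descriptions, compositions, or SMILES strings,
# and the targets are measured or simulated properties.
records = [
    {"material": "MOF-5", "henry_coefficient": -3.2},
    {"material": "ZIF-8", "henry_coefficient": -1.7},
]

# Cast each record as a prompt/completion pair, the usual format for
# fine-tuning GPT-style models on classification or regression posed as text.
with open("henry_finetune.jsonl", "w") as f:
    for r in records:
        example = {
            "prompt": f"What is the Henry coefficient of {r['material']}?###",
            "completion": f" {r['henry_coefficient']}@@@",
        }
        f.write(json.dumps(example) + "\n")
```

The model is then fine-tuned on such pairs and queried with the same prompt template; a classification task works the same way, with a class label as the completion instead of a number.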

Structure prediction of protein–ligand complexes

Iambic Therapeutics published its deep generative model, NeuralPLexer, in Nature Machine Intelligence; it predicts protein–ligand complex structures directly from a protein sequence and a ligand SMILES string. The code is available on GitHub.
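As a rough illustration of what those two inputs look like (this is not the NeuralPLexer API; the sequence and SMILES below are arbitrary examples), one might sanity-check them with RDKit before handing them to any sequence-plus-SMILES model:

```python
# Placeholder inputs: a protein sequence and a ligand SMILES string.
from rdkit import Chem

protein_sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up fragment
ligand_smiles = "CC(=O)Oc1ccccc1C(=O)O"                  # aspirin, as an example

mol = Chem.MolFromSmiles(ligand_smiles)
assert mol is not None, "invalid SMILES"
assert set(protein_sequence) <= set("ACDEFGHIKLMNPQRSTVWY"), "invalid residues"

print(f"Ligand: {mol.GetNumAtoms()} heavy atoms; "
      f"protein: {len(protein_sequence)} residues.")
```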

At the same time, they released a white paper on the next generation of the model, NeuralPLexer2, which appears to be substantially more accurate. Its built-in confidence estimation and higher prediction speed are also valuable for large-scale ligand screening.

Google DeepMind is also working on protein–ligand complex structure prediction with the next generation of AlphaFold. Rather than comparing the two models directly, the NeuralPLexer2 white paper simply quotes the 73.6% accuracy figure from Google's blog post. NeuralPLexer2's accuracy is lower without pLDDT filtering but higher when an estimated binding site is provided, a comparison that could be considered cherry-picked and questionable.

Featured papers

  • Umol: structure prediction of protein-ligand complexes from sequence information (code)

  • CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2 (code)

  • Multiple structural biology papers in Cell

    • Structure is beauty, but not always truth

    • De novo protein design—From new structures to programmable functions

    • Enabling structure-based drug discovery utilizing predicted models

    • Understanding the cell: Future views of structural biology

  • Digital health

If you find the newsletter helpful, please consider:

  • 🔊 Sharing the newsletter with other people

  • 👍 Upvoting on Product Hunt

  • 📧 Sending feedback, suggestions, or questions by replying directly to this email or writing a review on Product Hunt

  • 🙏 Supporting us with a cup of coffee.

Thanks, and see you next time!
