Discover how AI models used in drug discovery may rely more on memorization than true understanding, a finding that could reshape the future of AI in the life sciences.
In a groundbreaking development for the world of artificial intelligence (AI), Prof. Dr. Jürgen Bajorath and his team of cheminformatics experts at the University of Bonn have unveiled a method that peels back the layers of AI’s “black box” nature, particularly in its application to pharmaceutical research. Their findings, recently published in Nature Machine Intelligence, challenge conventional notions of how AI systems work in drug discovery.
Traditionally, drug discovery relies on scientific models to predict which molecules can effectively interact with target proteins, often employing AI techniques such as Graph Neural Networks (GNNs). These GNNs use graph representations of protein-ligand complexes to predict binding strengths. However, the actual decision-making process within these models has remained enigmatic, resembling a “black box.” Prof. Dr. Bajorath’s team examined six different GNN architectures using “EdgeSHAPer,” an explanation method developed in their own lab, and compared its results with those of a conceptually different analysis method. The goal was to determine whether the GNNs truly learned the essential interactions between compounds and proteins or arrived at their predictions by other means.
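To make the setup concrete, here is a minimal sketch, assuming PyTorch Geometric, of the kind of model under discussion: a small graph neural network that reads a graph representation of a protein-ligand complex and regresses a binding affinity. The architecture, feature dimension, and pooling choice are illustrative placeholders, not the six architectures benchmarked in the paper.

```python
import torch
from torch import nn
from torch_geometric.nn import GCNConv, global_mean_pool

class AffinityGNN(nn.Module):
    """Toy GNN regressing binding affinity from a protein-ligand graph."""

    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)  # message passing over interaction edges
        self.conv2 = GCNConv(hidden, hidden)
        self.readout = nn.Linear(hidden, 1)   # graph-level regression head

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        g = global_mean_pool(h, batch)        # one embedding per complex
        return self.readout(g).squeeze(-1)    # predicted affinity (e.g., pKi)

# Toy usage: 5 nodes with 16 features each, 4 edges, a single graph.
x = torch.randn(5, 16)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
batch = torch.zeros(5, dtype=torch.long)
print(AffinityGNN(in_dim=16)(x, edge_index, batch))
```

An explanation method such as EdgeSHAPer then asks which edges of the input graph a trained model of this kind actually relies on, assigning each edge an approximate Shapley value as an importance score.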
Their findings were surprising. The GNNs appeared to rely heavily on the data they were trained with, prioritizing chemically similar molecules encountered during training rather than specific protein-drug interactions. In essence, they “remembered” these similarities and based their predictions on them, largely regardless of the target protein, a phenomenon reminiscent of the “Clever Hans effect,” named after the horse that appeared to do arithmetic but was in fact reading subtle cues from its handler. This discovery has significant implications for drug discovery research: it suggests that GNNs may be less effective at learning chemical interactions than initially thought, and that their predictions may be overrated, since simpler methods combined with chemical knowledge can yield similar results. However, the researchers see a potential silver lining, as some GNN models displayed an increased ability to learn interactions when dealing with more potent test compounds.
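If the GNNs are in effect performing similarity look-ups, a ligand-only baseline that ignores the protein entirely should come close to their performance. Below is a minimal sketch, assuming RDKit, of such a nearest-neighbor baseline: predict a compound’s affinity as that of its most Tanimoto-similar training compound. The fingerprint settings and the toy SMILES/affinity values are illustrative assumptions, not the paper’s benchmark protocol.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_fp(smiles: str):
    """2048-bit Morgan (ECFP4-like) fingerprint of a molecule."""
    return AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), 2, nBits=2048)

def nearest_neighbor_affinity(query_smiles, train_smiles, train_affinities):
    """Predict affinity as that of the Tanimoto-nearest training compound."""
    query_fp = morgan_fp(query_smiles)
    sims = [DataStructs.TanimotoSimilarity(query_fp, morgan_fp(s))
            for s in train_smiles]
    best = max(range(len(sims)), key=sims.__getitem__)
    return train_affinities[best]  # "memorized" answer: no protein used at all

# Toy training set of (SMILES, affinity) pairs and a phenol-like query.
train = [("CCO", 5.2), ("c1ccccc1O", 6.8), ("CCN(CC)CC", 4.9)]
smiles, affinities = zip(*train)
print(nearest_neighbor_affinity("c1ccccc1OC", smiles, affinities))  # -> 6.8
```

If such a baseline matches a GNN’s accuracy, the GNN has little statistical incentive to learn anything about protein-ligand interactions, which is precisely the Clever Hans-like behavior the analysis revealed.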
Prof. Bajorath emphasizes that AI is not “black magic” and that the true workings of AI models must be understood before they can be applied reliably. He believes their approach, which includes open-access tools such as EdgeSHAPer, holds promise for shedding light on the inner workings of AI models, and the team is exploring similar analysis methods for other network architectures, such as language models. Beyond challenging the existing understanding of AI’s role in drug discovery, the research paves the way for more transparent and explainable AI, benefiting the many fields, including the life sciences, that rely on machine learning.
Reference: “Learning characteristics of graph neural networks predicting protein–ligand affinities” by Andrea Mastropietro, Giuseppe Pasculli and Jürgen Bajorath, 13 November 2023, Nature Machine Intelligence.