    Researchers Improve AI Models Used in Drug Discovery

New Computational Benchmark Helps the Design and Selection of Drug Discovery AI Models

Drug repurposing to expand treatment options for patients leaps forward with an advance in machine learning from Mayo Clinic researchers and collaborators. Led by the lab of Nansu Zong, Ph.D., the team has created a gold-standard computational benchmark against which other drug-target testing models can be evaluated, compared and standardized.

Drugs go through rigorous testing before they can be used to treat patients. But once in use, clinicians may notice other effects of the drug, or the way the drug works may provide a theory for use in other diseases. For example, a drug first used to treat cancer is now used at a far lower dose to treat autoimmune diseases, such as rheumatoid arthritis. With the development of drug information databases, more substances than ever can be evaluated for expanded use.

But the more extensive and diverse the databases of drugs, human responses and potential drug targets, the more complex the model needed to handle them.

The Zong Lab, from left: Shaika Chowdhury, Ph.D., Nansu Zong, Ph.D., and Kritib Bhattarai.

Too Many Options

With millions of symptoms, cellular and molecular cascades, gene information, and drug options, scientists can't experimentally screen every option. Instead, they must focus on particular targets with evidence to show they'll likely work or on certain drugs that seem promising. To widen this screening process, researchers use computers to identify potential links. In most cases, computational laboratories validate their findings using datasets of known interactions or sets that have been curated from multiple databases.

But that is less than ideal. These databases are large and diverse, so using an internal validation set could introduce bias into the findings and lead to inaccuracies that translate to years of cell and animal testing down the tubes. Also, the researchers found that current models can struggle when new data are added or when trying to repurpose a drug for a specific protein.
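The difference between an internal validation set and a test on genuinely new drugs can be sketched in a few lines of Python. This is a minimal illustration only, not BETA's procedure: the function names and the toy drug and protein identifiers are hypothetical. The first split shuffles known drug-target pairs at random, so the same drug usually appears on both sides; the second holds out whole drugs, so every test pair involves a drug the model has never seen.

```python
import random


def random_pair_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle (drug, target) pairs and cut off a test slice.

    A drug can end up in both train and test, which tends to flatter a model.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def held_out_drug_split(pairs, test_fraction=0.2, seed=0):
    """Hold out entire drugs so no test drug appears in the training pairs."""
    rng = random.Random(seed)
    drugs = sorted({drug for drug, _ in pairs})
    rng.shuffle(drugs)
    n_test = max(1, int(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


def shared_drugs(train, test):
    """Drugs that occur on both sides of a split."""
    return {d for d, _ in train} & {d for d, _ in test}


# Toy data (hypothetical identifiers): 10 drugs, 3 known targets each.
toy_pairs = [(f"drug{i}", f"protein{j}") for i in range(10) for j in range(3)]

train, test = held_out_drug_split(toy_pairs)
print("drugs shared after held-out-drug split:", len(shared_drugs(train, test)))

train_r, test_r = random_pair_split(toy_pairs)
print("drugs shared after random split:", len(shared_drugs(train_r, test_r)))
```

A model that scores well on the random split but poorly on the held-out-drug split is leaning on drugs it has already seen, which is the kind of weakness a shared benchmark is meant to expose.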

Model to Test the Model

To solve the problem, researchers led by Dr. Zong, an early career scientist in the Department of Artificial Intelligence and Informatics Research, developed a computational model to test the "fitness" of other drug repurposing models. It incorporates evaluation strategies for models commonly used in this work, paired with almost a million biomedical concepts, 8.5 million associations and 62 million drug-drug and protein-protein similarities. Named "BEnchmark for computational drug TArget prediction," which they shortened to the acronym BETA, the model allows researchers to evaluate different data types and, as a result, select the best strategy.

BETA takes into account a range of ways researchers parse data and, through a series of tests, can provide a benchmark for models incorporating drugs and drug targets as the initial input. Image created in BioRender.

BETA takes into account a range of ways researchers parse data: general; screening for nontarget activity (side effects/new interactions); categorical target and drug screening; specific drug and target search; and drug evaluation for particular diseases. Through a series of tests, BETA can provide a benchmark for models incorporating drugs and drug targets as the initial input.

In an evaluation of the nine baseline models provided, BETA was able to clarify which scored better or worse at identifying drug repurposing options for a range of diseases, including Alzheimer's disease, cancer (breast, colorectal, leukemia), diabetes, HIV, heart attack and obesity.
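To make the idea of scoring several models against one shared benchmark concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not BETA's code: the article does not say which metrics BETA uses, so the area under the ROC curve (AUROC) stands in as a common choice, and the model names, labels and scores are invented for the example.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen true drug-target pair is
    scored higher than a randomly chosen non-interacting pair (ties count half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_models(labels, model_scores):
    """Score every model on the same labeled pairs and sort best first."""
    results = {name: auroc(labels, s) for name, s in model_scores.items()}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


# Hypothetical benchmark: 1 = known interaction, 0 = no known interaction.
benchmark_labels = [1, 0, 1, 0, 1, 0]

# Hypothetical predictions from three made-up models on the same six pairs.
predictions = {
    "model_a": [0.9, 0.2, 0.8, 0.1, 0.7, 0.3],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.8, 0.2],
    "model_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}

for name, score in rank_models(benchmark_labels, predictions):
    print(f"{name}: AUROC = {score:.3f}")
```

Because every model is scored on identical pairs with an identical metric, the resulting ranking reflects the models rather than differences in how each lab assembled its own validation set.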

Next, the team plans to study how BETA can facilitate drug development for specific diseases; its current focuses are Alzheimer's disease and cancers. More data sources, such as genetic data and patients’ electronic health records, will be leveraged in future work.

Authors, Affiliations, And to Read More

In addition to Dr. Zong, authors from Mayo Clinic are Hongfang Liu, Ph.D., the Dr. Richard F. Emslander Professor II; Andrew Wen; Yue Yu, Ph.D.; Ming Huang, Ph.D.; Shaika Chowdhury, Ph.D.; Sunyang Fu, Ph.D.; Richard Weinshilboum, M.D., the Mary Lou and John H. Dasburg Professor of Cancer Genomics Research; and Guoqian Jiang, M.D., Ph.D.

For a complete list of authors — including from the National Cancer Institute, University of California Davis, Stanford School of Medicine, Auburn University, and the University of Colorado Denver — as well as funding sources and disclosures, see the paper in Briefings in Bioinformatics.

- Sara Tiner, June 3, 2022