Science Saturday: New standards and open access can help natural language processing


Clinical notes in medical records are rich sources of data about human health. But tapping them for medical research can be challenging because the data come from many sources, and each source records them differently.

"There's no standardization in how data is organized and classified across medical records systems," says Sunyang Fu, Ph.D., a Mayo Clinic biomedical informatics researcher.

Even the language people use to talk about health can introduce discrepancies in how data are recorded. "If a patient had a recent fall, they might say they 'went down,' 'flipped backwards,' 'tripped on a rug,' or 'hit the back of their head,'" says Dr. Fu.

Scientists are learning how to make sense of such widely varied data using natural language processing, known as NLP. NLP is a discipline related to artificial intelligence (AI) that teaches computers how to understand human language. Scientists design NLP algorithms to transform disparate information into structured data in a standardized format that can be analyzed.
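To make the idea concrete, here is a minimal sketch of how varied descriptions of a fall might be mapped to one standardized record. It is an illustration only, not the method used by the Mayo Clinic researchers: the phrase patterns are drawn from the examples Dr. Fu gives, and the names `FALL_PATTERNS`, `Mention` and `extract_fall_mentions` are hypothetical. Real clinical NLP systems also have to handle negation, context and far larger vocabularies.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase patterns, based only on the examples quoted in the article.
FALL_PATTERNS = [
    r"\bwent down\b",
    r"\bflipped backwards?\b",
    r"\btripped on (?:a|the) \w+\b",
    r"\bhit the back of (?:his|her|their) head\b",
]


@dataclass
class Mention:
    """One structured record for a phrase found in free text."""
    concept: str  # standardized label, e.g. "fall"
    text: str     # the wording that appeared in the note
    start: int    # character offset where the phrase begins
    end: int      # character offset where the phrase ends


def extract_fall_mentions(note: str) -> list[Mention]:
    """Return every fall-related phrase in the note as a structured record."""
    mentions = []
    for pattern in FALL_PATTERNS:
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            mentions.append(
                Mention("fall", match.group(0), match.start(), match.end())
            )
    # Order the records by where they appear in the note.
    return sorted(mentions, key=lambda m: m.start)


if __name__ == "__main__":
    note = "Patient tripped on a rug last week and hit the back of her head."
    for mention in extract_fall_mentions(note):
        print(mention)
```

Each differently worded phrase ends up with the same `concept` label, which is what lets notes from different sources be counted and compared in a single analysis.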

"Open collaboration ensures access to these resources for everyone. This fosters research that equitably advances health for all people." ~Hongfang Liu, Ph.D.

Studies that use NLP have demonstrated promise to benefit patients, says Dr. Fu, but there’s a problem. When publishing their NLP research, scientists don’t always share all the "how to" instructions, sometimes because algorithms are protected as intellectual property. This makes it difficult for other scientists to validate or reproduce a study, and reproducibility is one of the hallmarks of good science.

Read the rest of the article on the Discovery's Edge blog.

