May 15, 2023 — No matter where you look, machine learning applications in artificial intelligence are being harnessed to change the status quo. This is especially true in health care, where technological advances are accelerating drug discovery and identifying potential new cures.
But these advances don’t come without red flags. They’ve also placed a magnifying glass on preventable differences in disease burden, injury, violence, and opportunities to achieve optimal health, all of which disproportionately affect people of color and other underserved communities.
The question at hand is whether AI applications will further widen or help narrow health disparities, especially when it comes to the development of clinical algorithms that doctors use to detect and diagnose disease, predict outcomes, and guide treatment strategies.
“One of the problems that’s been shown in AI in general and in particular for medicine is that these algorithms can be biased, meaning that they perform differently on different groups of people,” said Paul Yi, MD, assistant professor of diagnostic radiology and nuclear medicine at the University of Maryland School of Medicine, and director of the University of Maryland Medical Intelligent Imaging (UM2ii) Center.
“For medicine, to get the wrong diagnosis is literally life or death depending on the situation,” Yi said.
Yi is co-author of a study published last month in the journal Nature Medicine in which he and his colleagues tried to discover if medical imaging datasets used in data science competitions help or hinder the ability to recognize biases in AI models. These contests involve computer scientists and doctors who crowdsource data from around the world, with teams competing to create the best clinical algorithms, many of which are adopted into practice.
The researchers used a popular data science competition site called Kaggle for medical imaging competitions that were held between 2010 and 2022. They then evaluated the datasets to learn whether demographic variables were reported. Finally, they looked at whether the competition included demographic-based performance as part of the evaluation criteria for the algorithms.
Yi said that of the 23 datasets included in the study, “the majority – 61% – did not report any demographic data at all.” Nine competitions reported demographic data (mostly age and sex), and one reported race and ethnicity.
“None of these data science competitions, regardless of whether or not they reported demographics, evaluated these biases, that is, answer accuracy in males vs females, or white vs Black vs Asian patients,” said Yi. The implication? “If we don’t have the demographics then we can’t measure for biases,” he explained.
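The subgroup evaluation Yi describes — comparing a model's accuracy across demographic groups — is straightforward once demographic labels exist, which is exactly why their absence blocks bias measurement. As a minimal sketch (all labels, predictions, and group names below are invented for illustration):

```python
# Hypothetical illustration of per-group accuracy: the kind of bias check
# that becomes impossible when datasets omit demographic labels.
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return model accuracy computed separately for each demographic group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}

# Toy data: the model is right 3/4 of the time for group A, 1/2 for group B.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(accuracy_by_group(y_true, y_pred, groups))  # {'A': 0.75, 'B': 0.5}
```

A gap like the one in the toy output is the signal of a biased model — and without the `groups` column, it is invisible.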
Algorithmic Hygiene, Checks, and Balances
“To reduce bias in AI, developers, inventors, and researchers of AI-based medical technologies need to consciously prepare for avoiding it by proactively improving the representation of certain populations in their dataset,” said Bertalan Meskó, MD, PhD, director of the Medical Futurist Institute in Budapest, Hungary.
One approach, which Meskó referred to as “algorithmic hygiene,” is similar to one that a group of researchers at Emory University in Atlanta took when they created a racially diverse, granular dataset – the EMory BrEast Imaging Dataset (EMBED) — that consists of 3.4 million screening and diagnostic breast cancer mammography images. Forty-two percent of the 11,910 unique patients represented were self-reported African-American women.
One approach, which Meskó referred to as “algorithmic hygiene,” is similar to one that a group of researchers at Emory University in Atlanta took when they created a racially diverse, granular dataset – the EMory BrEast Imaging Dataset (EMBED) – that consists of 3.4 million screening and diagnostic breast cancer mammography images. Forty-two percent of the 11,910 unique patients represented were self-reported African-American women.
“The fact that our database is diverse is kind of a direct byproduct of our patient population,” said Hari Trivedi, MD, assistant professor in the departments of Radiology and Imaging Sciences and of Biomedical Informatics at Emory University School of Medicine and co-director of the Health Innovation and Translational Informatics (HITI) lab.
“Even now, the vast majority of datasets that are used in deep learning model development don’t have that demographic information included,” said Trivedi. “But it was really important in EMBED and all future datasets we develop to make that information available because without it, it’s impossible to know how and when your model might be biased or that the model that you’re testing may be biased.”
“You can’t just turn a blind eye to it,” he said.
Importantly, bias can be introduced at any point in the AI’s development cycle, not just at the onset.
“Developers could use statistical tests that allow them to detect if the data used to train the algorithm is significantly different from the actual data they encounter in real-life settings,” Meskó said. “This could indicate biases due to the training data.”
Another approach is “de-biasing,” which helps eliminate differences across groups or individuals based on individual attributes. Meskó referenced IBM’s open source AI Fairness 360 toolkit, a comprehensive set of metrics and algorithms that researchers and developers can use to reduce bias in their own datasets and AI models.
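One fairness metric of the kind AI Fairness 360 provides is disparate impact: the ratio of favorable-outcome rates between an unprivileged and a privileged group. Rather than invoking the toolkit itself, this standalone sketch computes the metric directly; the screening decisions below are invented for illustration:

```python
# Disparate impact: ratio of favorable-outcome rates between groups.
# A value near 1.0 suggests parity; a common rule of thumb flags < 0.8.
# All data below is hypothetical.

def favorable_rate(outcomes):
    """Fraction of individuals receiving the favorable outcome (1)."""
    return sum(outcomes) / len(outcomes)

def disparate_impact(unprivileged_outcomes, privileged_outcomes):
    """Unprivileged group's favorable rate divided by the privileged group's."""
    return favorable_rate(unprivileged_outcomes) / favorable_rate(privileged_outcomes)

# Toy screening decisions (1 = referred for follow-up care)
group_unpriv = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]  # 30% favorable
group_priv = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]    # 70% favorable

di = disparate_impact(group_unpriv, group_priv)
print(f"Disparate impact: {di:.2f}")  # 0.43 for these toy groups
```

De-biasing algorithms in such toolkits (for example, reweighing the training data) then try to push metrics like this one back toward parity.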
Checks and balances are likewise important. For example, that could include “cross-checking the decisions of the algorithms by humans and vice versa. In this way, they can hold each other accountable and help mitigate bias,” Meskó said.
Keeping Humans in the Loop
Speaking of checks and balances, should patients be worried that a machine is replacing a doctor’s judgment or driving possibly dangerous decisions because a critical piece of data is missing?
Trivedi mentioned that AI research guidelines are in development that focus specifically on rules to consider when testing and evaluating models, especially those that are open source. Also, the FDA and Department of Health and Human Services are trying to regulate algorithm development and validation with the goal of improving accuracy, transparency, and fairness.
Like medicine itself, AI is not a one-size-fits-all solution, and perhaps checks and balances, consistent evaluation, and concerted efforts to build diverse, inclusive datasets can address and ultimately help to overcome pervasive health disparities.
At the same time, “I think that we are a long way from entirely removing the human element and not having clinicians involved in the process,” said Kelly Michelson, MD, MPH, director of the Center for Bioethics and Medical Humanities at Northwestern University Feinberg School of Medicine and attending physician at Ann & Robert H. Lurie Children’s Hospital of Chicago.
“There are actually some great opportunities for AI to reduce disparities,” she said, also noting that AI is not simply “this one big thing.”
“AI means a lot of different things in a lot of different places,” said Michelson. “And the way that it is used is different. It’s important to acknowledge that issues around bias and the impact on health disparities are going to be different depending on what kind of AI you’re talking about.”