Researchers found large language models provided answers reflecting ‘racist tropes’ or inaccurate information. They stress caution in using AI tools for research and for medical decisions.
Artificial intelligence is increasingly touted as a game-changer in healthcare, but researchers have warned that AI tools can perpetuate racial bias, and a new study only amplifies those concerns.
Researchers with the Stanford School of Medicine tested large language models to see if they were providing inaccurate and biased information that advanced stereotypes. The findings, published Oct. 20 in npj Digital Medicine, indicated that chatbots such as OpenAI’s ChatGPT and Google’s Bard were indeed offering information reflecting racial bias.
“Our results illustrate that every LLM model had instances of promoting race-based medicine/racist tropes or repeating unsubstantiated claims around race,” the authors wrote.
The researchers evaluated four chatbots by asking nine questions on five different occasions, generating 45 answers for each model. “All models had examples of perpetuating race-based medicine in their responses,” the researchers wrote.
The researchers tested the chatbots on questions regarding kidney function and lung capacity, which yielded disappointing results.
“All the models have failures when asked questions regarding kidney function and lung capacity - areas where longstanding race-based medicine practices have been scientifically refuted,” the authors wrote.
Two chatbots provided inaccurate assertions about Black people having different muscle mass and therefore higher creatinine levels.
Stanford University’s Roxana Daneshjou, an assistant professor of biomedical data science and dermatology and faculty adviser for the study, told the Associated Press that the chatbot responses perpetuating bias are “deeply concerning.”
“There are very real-world consequences to getting this wrong that can impact health disparities,” Daneshjou told the AP.
While the findings were disturbing, she said that they were hardly shocking. In a post on X, formerly known as Twitter, Daneshjou wrote, “Our team didn't think this was a bombshell -- given our previous work on bias in AI, we were totally unsurprised.”
OpenAI and Google told the AP that they are working to reduce bias in their models.
As patients and clinicians increasingly turn to chatbots to answer questions, researchers warn that these tools could reinforce bias by repeating inaccurate information about patients.
The authors of the study note that the chatbots are using older race-based equations for kidney and lung function, which is potentially harmful, since Black patients suffered worse outcomes with those equations.
In addition, the models are offering other outdated information on race, researchers found.
“Models also perpetuate false conclusions about racial differences on such topics such as skin thickness and pain threshold. Since all physicians may not be familiar with the latest guidance and have their own biases, these models have the potential to steer physicians toward biased decision-making,” they wrote.
The researchers suggest that large language models need to be adjusted to eliminate “inaccurate, race-based themes” to reduce the potential for harm. They urged medical centers and clinicians to use caution in turning to chatbots for making treatment decisions.
Researchers have stressed the importance of removing racism from clinical algorithms.
The Organ Procurement and Transplantation Network has prohibited use of a flawed test of kidney function in determining eligibility for transplants. Critics said the test improperly measures kidney function in Black patients, who have been delayed in getting proper treatment as a result, including placement on transplant waiting lists.
The authors noted that some chatbots have fared well in providing accurate answers to questions in cardiology and oncology. A study published in JAMA Network Open found ChatGPT offered generally accurate answers to questions regarding eye care.
Still, other researchers have found concerning results regarding race with chatbots. In a research letter published by JAMA Network Open on Oct. 17, the authors found that AI chatbots offered different recommendations on treatment based on race and ethnicity.
The World Health Organization has urged caution in the use of AI tools in medical decisions. The WHO said in May that it is enthusiastic about the potential of large language models to help clinicians and patients, but that they must be used ethically and carefully.
“Precipitous adoption of untested systems could lead to errors by health-care workers, cause harm to patients, erode trust in AI and thereby undermine (or delay) the potential long-term benefits and uses of such technologies around the world,” the WHO said.