
AI chatbot answers to health questions indicate racial bias
Researchers found large language models provided answers reflecting ‘racist tropes’ or inaccurate information. They urge caution in using AI tools for research and for medical decisions.
Artificial intelligence is increasingly touted as a game-changer in healthcare, but researchers have warned that AI tools can perpetuate racial bias, and a new study only amplifies those concerns.
Researchers with the Stanford School of Medicine tested large language models to see whether they provided inaccurate, biased information that advanced stereotypes.
“Our results illustrate that every LLM model had instances of promoting race-based medicine/racist tropes or repeating unsubstantiated claims around race,” the authors wrote.
The researchers evaluated four chatbots by asking nine questions on five different occasions, generating 45 answers for each model. “All models had examples of perpetuating race-based medicine in their responses,” the researchers wrote.
The researchers tested the chatbots with questions about kidney function and lung capacity, which yielded disappointing results.
“All the models have failures when asked questions regarding kidney function and lung capacity - areas where longstanding race-based medicine practices have been scientifically refuted,” the authors wrote.
Two of the chatbots repeated the inaccurate assertion that Black people have different muscle mass and therefore higher creatinine levels.
Stanford University’s Roxana Daneshjou, an assistant professor of biomedical data science and dermatology and faculty adviser for the study, warned that such errors carry real risks for patients.
“There are very real-world consequences to getting this wrong that can impact health disparities,” Daneshjou told The Associated Press.
While the findings were disturbing, she said they were hardly shocking.
OpenAI and Google told the AP that they are working to reduce bias in their models.
As patients and clinicians increasingly turn to chatbots for answers, researchers warn that these biases could spread when the tools repeat inaccurate information about patients.
The authors of the study note that the chatbots rely on older, race-based equations for kidney and lung function, which is potentially harmful because Black patients have suffered worse outcomes under those equations.
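The article does not name the specific formulas, but the best-known example is the CKD-EPI creatinine equation for estimated kidney function (eGFR): the 2009 version multiplied the result by a fixed coefficient for patients recorded as Black, while the 2021 refit dropped the race term. As an illustrative sketch of the published forms, the two equations look roughly like this:

\[
\mathrm{eGFR}_{2009} = 141 \cdot \min\!\left(\tfrac{S_{cr}}{\kappa},1\right)^{\alpha} \cdot \max\!\left(\tfrac{S_{cr}}{\kappa},1\right)^{-1.209} \cdot 0.993^{\mathrm{Age}} \cdot 1.018\,[\text{if female}] \cdot 1.159\,[\text{if Black}]
\]
\[
\mathrm{eGFR}_{2021} = 142 \cdot \min\!\left(\tfrac{S_{cr}}{\kappa},1\right)^{\alpha} \cdot \max\!\left(\tfrac{S_{cr}}{\kappa},1\right)^{-1.200} \cdot 0.9938^{\mathrm{Age}} \cdot 1.012\,[\text{if female}]
\]

Here \(S_{cr}\) is serum creatinine in mg/dL, and \(\kappa\) and \(\alpha\) are sex-specific constants that also differ slightly between the two versions. The 1.159 multiplier in the 2009 equation systematically inflated estimated kidney function for Black patients, and it is this kind of race-based adjustment that the study found chatbots still repeating.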
In addition, the models are offering other outdated information on race, researchers found.
“Models also perpetuate false conclusions about racial differences on such topics as skin thickness and pain threshold. Since all physicians may not be familiar with the latest guidance and have their own biases, these models have the potential to steer physicians toward biased decision-making,” they wrote.
The researchers suggest that large language models need to be adjusted to eliminate “inaccurate, race-based themes” and reduce the potential for harm. They urged medical centers and clinicians to exercise caution before relying on chatbots to make treatment decisions.
Researchers have stressed the importance of removing race-based calculations from clinical practice, and some institutions have already moved to do so.
The Organ Procurement and Transplantation Network has prohibited use of a flawed test of kidney function in determining eligibility for transplants. Critics said the test improperly measured kidney function in Black patients, delaying their access to proper treatment, including placement on transplant waiting lists.
The authors noted that some chatbots have fared well in providing accurate answers to questions in cardiology and oncology, including in a study published in JAMA Network Open that evaluated ChatGPT.
Still, other researchers have found concerning results regarding how chatbots handle race.
The World Health Organization has urged caution in adopting AI language tools in healthcare.
“Precipitous adoption of untested systems could lead to errors by health-care workers, cause harm to patients, erode trust in AI and thereby undermine (or delay) the potential long-term benefits and uses of such technologies around the world,” the WHO said.