But few studies compare the models and professionals using the same sample.
Artificial intelligence (AI) detects diseases from medical images with accuracy similar to that of healthcare professionals, according to a systematic review and meta-analysis published in the journal The Lancet Digital Health.
But few deep-learning studies compare the performance of deep-learning models and healthcare professionals using the same sample, researchers found.
“If researchers cannot all agree on what it means to agree, how can we know if model A is better than human B?” Tessa Cook, M.D., Ph.D., assistant professor of radiology at the Perelman School of Medicine at the University of Pennsylvania, wrote in an accompanying commentary.
Whether AI can be effectively compared to a human physician working in the real world is questionable, Cook wrote.
Researchers reviewed more than 20,500 articles, but less than 1% were robust in their design and reported their findings in a way that gave independent reviewers high confidence, said Alastair Denniston, Ph.D., professor at University Hospitals Birmingham National Health Service Foundation Trust in the U.K.
Only 25 studies validated the AI models externally, using medical images from a different population. Just 14 studies actually compared the performance of deep-learning models and healthcare professionals on the same sample, Denniston added.
“Within those handful of high-quality studies, we found that deep learning could indeed detect diseases ranging from cancers to eye diseases as accurately as health professionals,” he said. “But it’s important to note that AI did not substantially out-perform human diagnosis.”
The research team conducted a systematic review and meta-analysis of all the studies comparing the performance of AI models and providers in detecting diseases from medical imaging published between Jan. 2012 and June 2019. Researchers included 82 articles in the review.
Investigators analyzed 69 articles that contained enough data to accurately calculate performance.
The meta-analysis included 25 articles that validated the results in an independent subset of images.
After analyzing data from the 14 studies that compared performance on the same sample, researchers found that, at best, deep-learning algorithms correctly detected disease in 87% of cases. In comparison, healthcare professionals correctly detected disease in 86% of cases.
Deep-learning algorithms also slightly outperformed healthcare professionals in accurately excluding patients who don’t have diseases. The technology had a 93% specificity, while providers had a specificity of 91%.
The accuracy of deep learning is similar to that of healthcare professionals, the researchers concluded.
But Cook believes this result could be misconstrued as showing that machine diagnosis is better than that of a professional.
“Why have a human doctor when a digital one would be just as good, maybe better?” she questioned. “Perhaps the better conclusion is that, in the narrow public body of work comparing AI with human physicians, AI is no worse than humans, but the data are sparse and it might be too soon to tell.”