Why a machine-learning scientist says the trouble stems from a lack of quality data.
Anyone who regularly reads the major medical journals or attends healthcare’s biggest conferences is likely aware of one almost universally accepted premise: Healthcare is on the cusp of a big data revolution.
As a postdoctoral fellow specializing in healthcare at Toronto’s Vector Institute for Artificial Intelligence, Kiret Dhindsa, Ph.D., has a clear understanding of the potential of machine learning to transform healthcare. But he also knows the on-the-ground reality.
“As a machine-learning scientist wanting to work with health data, the picture that is painted in all of these editorials is nothing like what I experience,” he said.
The problem? Big data isn’t good enough. It also must be good data.
Right now, Dhindsa told me, the massive amount of healthcare data being generated is riddled with inconsistencies and inaccuracies. Sometimes, even if the data are good, they’re not the right kind of information.
Take, for instance, medical imaging. Many doctors write diagnostic notes directly on the images they take. Given a set of such images, a machine-learning algorithm might learn to be highly accurate in analyzing the images. But what happens in the real world where not every doctor makes such annotations and there’s little consistency in annotations among different physicians? The short answer: failure.
“One of the first things you discover when you work with multi-institutional data sets is that it is usually easier for a machine-learning algorithm to identify which hospital a sample came from than whether the sample is from a healthy or sick individual,” he said. “That should make clear the extent of the problem.”
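A toy illustration of the effect Dhindsa describes is sketched below. It is not taken from the editorial: the data are entirely synthetic, the hospital offset and disease signal sizes are made-up assumptions, and the nearest-centroid classifier is a stand-in for whatever algorithm a real study would use. It only shows how a strong site-specific quirk in the data can be far easier to learn than a weak disease signal.

```python
import random

def make_samples(n, rng):
    """Build synthetic (features, site, sick) samples. All numbers are invented."""
    samples = []
    for _ in range(n):
        site = rng.randrange(2)  # which hypothetical hospital produced the sample
        sick = rng.randrange(2)  # hypothetical diagnosis label
        # Feature 0 carries a large site-specific offset (assumed, e.g. scanner settings).
        x0 = rng.gauss(5.0 * site, 1.0)
        # Feature 1 carries only a weak disease signal (assumed).
        x1 = rng.gauss(0.3 * sick, 1.0)
        samples.append(((x0, x1), site, sick))
    return samples

def centroid(points):
    """Mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def fit(train, label_index):
    """Compute one centroid per label value (nearest-centroid classifier)."""
    by_label = {}
    for sample in train:
        by_label.setdefault(sample[label_index], []).append(sample[0])
    return {label: centroid(points) for label, points in by_label.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )

def accuracy(train, test, label_index):
    """Fit on train, then report the fraction of test samples labeled correctly."""
    centroids = fit(train, label_index)
    correct = sum(predict(centroids, s[0]) == s[label_index] for s in test)
    return correct / len(test)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train = make_samples(500, rng)
test = make_samples(500, rng)

site_acc = accuracy(train, test, label_index=1)  # predict the hospital
sick_acc = accuracy(train, test, label_index=2)  # predict the diagnosis

print(f"hospital accuracy:  {site_acc:.2f}")
print(f"diagnosis accuracy: {sick_acc:.2f}")
```

Under these assumptions the model should pick out the hospital far more reliably than the diagnosis. A check along these lines on real multi-institutional data would be one rough way to gauge how much site-specific noise a data set contains.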
Dhindsa, along with colleagues Mohit Bhandari, M.D., Ph.D., and Ranil Sonnadara, Ph.D., M.S., both of McMaster University, recently published an editorial in The BMJ making the case that the big data “revolution” won’t happen in healthcare until we find ways to standardize health data collection across institutions and improve the overall quality of the data.
This is, of course, a problem that many in healthcare acknowledge — but one that the industry has yet to solve.
Dhindsa said he was moved to write the piece when he began focusing his research on healthcare. He kept finding articles written five and 10 years ago announcing the advent of the revolution, yet no such transformation had taken place. And while there have been some exciting demonstrations of the power of machine learning and artificial intelligence in healthcare, those are the exception, not the rule, Dhindsa said.
It’s not just about bad data. It’s also a lack of the right kind of data. Dhindsa said many people expect machine-learning algorithms to be able to read an MRI and determine whether the patient is sick. The problem: Such a determination would require a large number of MRIs of healthy people, so that the algorithm could understand what a healthy MRI looks like. For obvious reasons, healthy people don’t often undergo MRIs.
“So what data should we use to tell an ML algorithm what being ‘healthy’ looks like?” Dhindsa said.
Dhindsa said the premise of his editorial is widely accepted within the machine-learning community. Many of his colleagues have said they’re happy someone finally came out and said what they had all been thinking.
“What’s happening is that our voices are being drowned out by researchers who are not actually at the intersection of machine learning and healthcare, because they paint an exciting vision for the future and don’t really talk about why the challenges in the field don’t simply boil down to overcoming technical problems,” he said.
Dhindsa’s editorial doesn’t dispute that machine learning can transform healthcare. Rather, it’s a call to arms, asking those in the healthcare and machine-learning communities to work together to devise and adopt consistent standards for healthcare data. Artificial intelligence might also help solve the problem of bad data by analyzing existing data and transforming them into a standardized format. That won’t be a holistic solution, Dhindsa said, but rather a temporary fix that will buy healthcare institutions time to restructure their data management systems.
“I suspect that in the meantime, data scientists will end up throwing out huge amounts of data due to quality issues,” he said.
The editorial has received mixed reactions. Some readers have been happy to see these concerns out in the open. Others have been in touch to report successes with ML and AI. Still others have pointed to the piece as evidence that AI will never play a major role in healthcare.
Dhindsa said he disagrees with that position. He’s still confident that ML and AI have a major role in the future of healthcare. He just wants people to have a realistic expectation of what needs to happen for that to finally become a reality.
“I think a lot of people, especially the clinical community, are just not aware of why this big data revolution is going to be much slower than expected,” he said.