Why a new debugger called DeepXplore could prepare artificial intelligence for the real world.
DeepXplore, a new debugging tool, found that the darker photo fooled a set of neurons into commanding a self-driving car to turn right. DeepXplore then retrained the network. Credit: Columbia Engineering
When Suman Jana, PhD, and his colleagues saw companies trumpeting the capabilities of their deep learning neural networks, they noticed something was missing. The artificial intelligence (AI) was reaping strong results but only in curated data sets. How would their machine learning fare against messier data?
The question was important because curated data isn’t as common in the real world—where deep learning neural networks matter most, fueling self-driving cars, predictive policing, and, of course, medical diagnostics. “If the system makes a mistake,” Jana told Healthcare Analytics News™, “the effects can be deadly.” So the Columbia University assistant professor and computer scientist resolved, along with several researchers from his department and Lehigh University, to attack the problem.
Together, they developed a debugging tool called DeepXplore, which feeds real-world inputs into a set of similar neural networks to spotlight “rare” but potentially consequential mistakes. Essentially, Jana said, the instrument compares the decisions made by different networks on the same input. If they match, everything probably works smoothly. If one decision deviates from the rest, it’s probably an error. And outside the lab, that error could kill.
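The cross-checking idea Jana describes can be sketched as differential testing: run the same input through several models built for the same task and flag any model whose decision deviates from the majority. The tiny "models" below are hypothetical stand-ins for illustration, not DeepXplore's actual code.

```python
# Differential testing sketch: flag inputs where one model's decision
# deviates from the majority of models trained for the same task.
from collections import Counter

def majority_disagreement(models, x):
    """Return (majority_label, outlier_indices) for input x across all models."""
    decisions = [m(x) for m in models]
    majority, _ = Counter(decisions).most_common(1)[0]
    outliers = [i for i, d in enumerate(decisions) if d != majority]
    return majority, outliers

# Three toy steering "classifiers" that agree on most inputs but split
# on a darkened image (here reduced to a single brightness value).
model_a = lambda brightness: "left" if brightness < 100 else "right"
model_b = lambda brightness: "left" if brightness < 100 else "right"
model_c = lambda brightness: "left" if brightness < 120 else "right"  # buggy boundary

majority, outliers = majority_disagreement([model_a, model_b, model_c], 110)
# model_c deviates from the majority on this input, signaling a likely bug
```

An input that triggers such a disagreement is exactly the kind of “rare” case worth retraining on.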
“Over the last 10 years, machine learning has made an almost magical jump. Nothing worked. Now, everything seems to work,” Jana said. “But it still requires some more work before it can be deployed and trusted.”
Other researchers have created deep learning neural network debuggers. DeepXplore, however, hits a “sweet spot” that has eluded others, Jana said. The tool is systematic, meaning it checks everything. It can also be scaled to test sprawling networks comprising millions of neurons. To this point, similar debuggers have captured only part of that formula, he said.
DeepXplore, set to be unveiled next week in Shanghai, came about with self-driving cars in mind. But its uses and implications go far beyond that—and into healthcare.
Take deep learning neural networks’ role in medical diagnostics. Google made waves last year when it published a study in the Journal of the American Medical Association showing how a deep learning algorithm could detect diabetic retinopathy in patient photographs. Jana praised the breakthrough but cautioned that the data set included only high-quality images. In the future, the technology could encounter photos that are blurry or flawed in some other way. That, in turn, could trip up the AI.
“It’s unclear how those kinds of deep neural networks will be able to handle those images,” Jana added. “So DeepXplore will test and see if it’s making the right diagnostics.” Correctly calling those shots is critical. “The side effect of such a mistake,” he warned, “is pretty bad.”
Unlike manually generated random or adversarial testing, DeepXplore can pinpoint a wide variety of bugs because it examines the conflicting decisions that networks make when faced with an imperfect image. According to Columbia, the researchers simulate “real-world conditions” by altering the photos to imitate, say, dust on a camera lens or a shot partially blocked by a person. In the case of self-driving cars, the shade of a photo might cause the AI to turn left when it should have veered right.
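The perturbations Columbia describes can be illustrated with two simple transforms: darkening a photo and occluding a patch (a stand-in for dust or an obstruction). The toy grayscale grid and function names below are assumptions for illustration, not DeepXplore's API.

```python
# Minimal sketch of real-world image perturbations on a toy grayscale
# image represented as a list of rows of 0-255 pixel values.

def darken(image, factor):
    """Scale every pixel toward black by the given factor (0..1)."""
    return [[int(p * factor) for p in row] for row in image]

def occlude(image, top, left, size, value=0):
    """Overwrite a size x size patch, simulating dust or an obstruction."""
    out = [row[:] for row in image]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[0]))):
            out[r][c] = value
    return out

image = [[200, 200, 200],
         [200, 200, 200],
         [200, 200, 200]]

dark = darken(image, 0.5)          # every pixel drops from 200 to 100
blocked = occlude(image, 0, 0, 2)  # top-left 2x2 patch zeroed out
```

Feeding such perturbed variants to several networks at once, and watching for disagreement, is the combination that lets the tool surface bugs like the left-instead-of-right steering error.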
So far, according to the study, Jana and his team have tested their software on 15 neural networks, including Nvidia’s Dave 2 network, which powers self-driving cars. The technology has found “thousands of bugs missed by previous techniques,” according to an announcement from Columbia. Inputs generated by DeepXplore activated as much as 100% of network neurons, on average 30% more than random and adversarial testing achieved. On some networks, DeepXplore boosted overall accuracy to 99%, an average improvement of 3%.
The software is one mechanism in a broader set that stands to foster public trust in the technology. AI has been shown to be as good as, if not better than, its human counterparts at some tasks, Jana said. But because machine learning lives in a “black box,” most people remain skeptical. “Together, with some sort of explanation, I think this tool can go a long way in terms of convincing people to trust the systems, at least as well as you trust your own doctor,” he added.
One thing that DeepXplore can’t do? Certify that a neural network is bug-free, a process that requires isolating and scrutinizing the rules learned by a given network, according to Columbia. ReluPlex, a new technology from Stanford University, can accomplish this on small networks. A co-developer of that technology praised DeepXplore, which can complement ReluPlex.
As of now, DeepXplore focuses only on images. Jana said his group hopes to expand its capabilities to include machine translation, chat bots, speech recognition, and more. They also want to provide better guarantees and even locate ethical violations in AI.
“We plan to keep improving DeepXplore to open the black box and make machine learning systems more reliable and transparent,” Kexin Pei, a Columbia grad student and co-developer, said. “As more decision-making is turned over to machines, we need to make sure we can test their logic so that outcomes are accurate and fair.”
And, as co-developer and Lehigh computer scientist Yinzhi Cao said, their ultimate goal is to test a system and be able to firmly tell its creators whether it’s safe. That, Jana said, is what will sow greater trust in AI’s ability to perform substantial tasks, like delivering diagnoses.