Demystifying Big Data and Machine Learning for Healthcare

In a new commentary, a pair of Harvard informaticists argue that there's "never a specific threshold wherein a model suddenly becomes machine learning."

In the hype-happy world of health tech, a lot of terms are bandied about pretty loosely, and "machine learning" is certainly one of them. A new JAMA commentary by two Harvard informaticists tries to demystify the concept by placing various health-related data tools on a spectrum.

“Machine learning is not a magic device that can spin data into gold, though many news releases would imply that it can,” Andrew L. Beam, PhD, and Isaac S. Kohane, MD, PhD, write slyly (and correctly). “Instead, it is a natural extension to traditional statistical models.”

In that light, they consider the oft-invoked Framingham Risk Score to be a simple sort of machine learning system. Because it is heavily guided by human input, it falls on the primitive side of that spectrum, a mere “risk calculator.” Still, since the score was derived from a proportional hazards model based on more than 5,300 patients, “the rule was in fact learned entirely from the data.”

The authors write that greater volumes of data and less human supervision push a tool toward the more complex side of their spectrum, and closer to “black box” status. Algorithms like those used to detect diabetic retinopathy from images sit on the right of the authors' graph, classified as “deep learning,” a term that sometimes gets mixed up with AI.

Kohane and Beam even include heuristics and rules of thumb in their machine learning spectrum, albeit as rudimentary versions. “There is never a specific threshold wherein a model suddenly becomes machine learning,” they write.

The commentary comes as neither warning nor endorsement. The authors stress how important machine learning will become for healthcare, especially given the explosion of health data sources. But they also note the real possibility of “black box” algorithms that learn so well that humans don’t know quite how they’re doing it, and that the technology carries “no guarantees of fairness, equitability, or even veracity.”

The complexity shouldn’t scare clinicians away from machine learning, they seem to say. Instead, as the algorithms take more and more control in medicine, healthcare professionals must be careful to ensure quality where they can: in the data.

“We are reluctant to repeat the cliché,” they write. But still, they do: “Garbage in…”
