Stanford's Jonathan H. Chen on Predictive AI, Medicine, and Hype

One of the co-authors of an NEJM article on inflated machine learning expectations spoke to HCA News about what inspired his commentary.

In June, a pair of Stanford doctors and researchers wrote a well-read piece for the New England Journal of Medicine entitled “Machine Learning and Prediction in Medicine — Beyond the Peak of Inflated Expectations.” Jonathan H. Chen, MD, PhD, and Steven M. Asch, MD, MPH, argued that “whether such artificial-intelligence systems are “smarter” than human practitioners makes for a stimulating debate — but is largely irrelevant,” and that healthcare should “move past the hype cycle and on to the “slope of enlightenment,” where we use every information and data resource to consistently improve our collective health.”

Healthcare Analytics News recently reached out to Dr. Chen to speak further on the motivations for the commentary and the nature of AI’s potential in the healthcare space. Chen has a unique perspective, as both a physician and an incoming Assistant Professor of Medicine in the Stanford Center for Biomedical Informatics Research. Despite the cautionary tone of the article, he is by no means down on that potential: rather, he worries that such hype could prove a detriment to the field’s bright future.

Can you briefly, in your own terms, define what you mean when you say “AI”? So many people have different understandings of it.

In the broader philosophical sense, AI could be machines or computers that exhibit behavior simulating what we consider intelligent behavior, like a human’s.

Nowadays what’s really absorbed that definition, more or less, are machine learning algorithms. That’s what people are calling “AI” these days, even though I think it has a bigger meaning than just that. I’m OK with that for now, but knowing that that’s what people are saying about essentially predictive algorithms…you make the algorithm better by just feeding it data, and then it just absorbs the data to improve what it knows. Nowadays people are just equating that with AI, even though I think AI is a much bigger superset of that concept.

What inspired you to pen that NEJM article?

What inspired me to write it was just observing people around me. I’m at Stanford, an academic institution and one of the centers for AI in medicine ideas, and I’m hearing very smart people buying into ideas when they don’t quite know what they’re buying into, basically. They’re hearing promises that they’re not going to follow up on, and I don’t think they quite understand what those mean.

What I fear, because it’s happened before in history, is that when overhyped promises don’t get realized, there’s a backlash afterwards. It happened with AI in the ’80s, if you look at the AI Winter, and even with stem cells and the Human Genome Project.

I actually think AI has huge potential for medicine; that’s exactly the type of stuff that I’m working on. If there’s a backlash because people overhype what’s possible too early on, it becomes harder to invest in the longer-term work because people shy away if they feel like they didn’t get what was promised at first. People just don’t have the right expectations at times.

A line that stuck out was “Even a perfectly calibrated prediction model may not translate into better clinical care.” Can you elaborate a bit on that?

I think one of the first inspirations for the article was actually an email exchange I was having about a recent paper. Others were like ‘oh look, they put up a new risk score where they can predict which kids are going to die in the ICU, isn’t that so cool and valuable.’ I was like, ‘it’s cool, but is that actually going to save any kids’ lives?’ That’s a very different question.

Machine learning, data analytics, they’re very good at predicting things. But just because you can predict something doesn’t mean you know how to change it. Presumably, you predicted somebody dying because you wanted to find out how to avoid them dying. The prediction doesn’t tell you if it’s even possible to change that outcome. I predict that the sun is going to come up tomorrow. I’m pretty sure that’s accurate, but there’s nothing I can do about it. Many of the predictions that come out of these systems can be highly accurate, but their accuracy may be based on things that were dumb obvious to a doctor anyway, and it still doesn’t do anything to tell you how to change the outcome.

If you wanted a trite way to say it, really, correlation is not causation, and everybody knows that. I don’t think they knew how to translate that idiom as it relates to clinical prediction.

Beyond the hype and also beyond the very near future, what actionable contributions do you see ultimately coming from AI?

Well, “ultimately” is a long time. Within the foreseeable future, there’s clinical prediction. We already do that, maybe not as specifically, but we have prediction scores for deaths in the ICU, or who is likely to have a blood clot in their lungs, or whose liver is likely to fail, for example. What’s cool about AI and machine learning algorithms nowadays is that they can generate these kinds of things. It’s almost a commodity, and those scores are almost trivial to make, whereas before it took years to pull all these things together for a very basic risk score. Now you can put these things together very rapidly with incremental effort, given all the automation that’s possible with large data sets.

But even with risk triage of patients, it still doesn’t tell you at a very individual level ‘this patient is going to have cancer 7 years from now.’ That doesn’t make sense to me; people are misinterpreting how to do that. I’ve heard other people say ‘this algorithm can tell you exactly what day you’re going to die.’ Like, definitely not, the math just won’t work. It’s not that we haven’t tried hard enough; it’s that it’s mathematically not possible.

If you need to risk triage, as in, ‘this top 10% of patients have a very high risk of having a problem, and the bottom 10% have a very low risk so don’t worry about them…’ that’s very doable at the broad system level and we can make better use of scarce healthcare resources. That’s very foreseeable and achievable, we can do a better job than leaving humans to guess on their own. We’re not bad, but we’re better when backed up by these data-driven systems.
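Editor's illustration: the population-level triage described here can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a method from the interview or the NEJM article. The function name `triage_by_risk`, the patient IDs, and the randomly generated risk scores are all hypothetical, and the 10% cutoffs simply mirror the figures quoted above.

```python
import random


def triage_by_risk(risk_scores, top_fraction=0.10, bottom_fraction=0.10):
    """Split patient IDs into high / middle / low groups by predicted-risk rank.

    risk_scores: dict mapping a (hypothetical) patient ID to a predicted risk.
    The groups describe relative risk across a population; they make no claim
    about what will happen to any individual patient.
    """
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    n = len(ranked)
    n_high = round(n * top_fraction)
    n_low = round(n * bottom_fraction)
    return {
        "high": ranked[:n_high],
        "middle": ranked[n_high:n - n_low],
        "low": ranked[n - n_low:] if n_low else [],
    }


# Synthetic scores for illustration only (assumption: no real patient data).
rng = random.Random(0)
scores = {f"patient_{i:03d}": rng.random() for i in range(100)}

groups = triage_by_risk(scores)
for name, members in groups.items():
    print(name, len(members))
```

The sketch only ranks and buckets patients. Deciding what to do for the high-risk group is a separate question that the ranking itself does not answer.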

In the nearer term, image recognition is one example of an application; that’s the hottest area right now and is showing promise already. I could easily see that within 10 years it will be almost standard practice to incorporate some automated image recognition in routine practice.

Do things like the MD Anderson/IBM Watson falling out, and IBM's relatively underwhelming growth numbers, create a backlash on the technology?

I actually like what IBM Watson is doing, the vision of where they’re going is a great vision and I really like what they’re trying to do.

In some sense, this backlash is already starting to happen if we think of the MD Anderson fallout, which is another part of why I wrote the piece I did. Call it a market correction or something, where we’re seeing that there’s great potential and the vision is awesome, but it doesn’t mean it’s easy, and there will be missteps along the way, of course. It really exposed what’s true in most of these cases.

The analytics, the data, that’s important, but that’s almost the easy step once you’ve got it. The real challenge is the last mile of implementation. That’s where all the hard work is in actually making a difference. We can crunch data all day, and we do, we find the next algorithm that can predict this or predict that, and I can publish all sorts of research papers on it and I’ll keep my job going. If we want to improve patient outcomes and affect clinical care, that’s a whole institution, a culture, it’s a complicated health system. Those are the changes that are much more complex to address, and have a lot more to do with people and societal changes, rather than simply technical ones.

Having said that, if we want to make a difference in patients’ lives, that is where a lot of attention needs to go to.

What’s in that last mile?

There’s a million things, but even if we break it down simply: let’s say you come up with an algorithm that predicts what’s the best treatment for a patient, or who is at higher risk of death or something. Who is looking at that information and what are they supposed to do with it? The algorithm predicts for me, the doctor, that my patient has a high risk of death. OK, so what do I do? I think the patient needs to get some other type of treatment option, so how do I tell the patient that, how do I tell the nurse that, how do I get them to actually take the medication, how do I follow up and make sure our intervention had the result we intended, how do we monitor the patients afterwards and get the outcomes to continue to optimize the system?

Depending on who you talk to, these are a lot less sexy and a lot less fun, but if we actually care about making a difference, that’s where it happens. The analytics are a cool new technology that gives us a capability, but that’s a relatively small piece in a much more complex system.

Take predicting high risk of readmission to a hospital as it applies to people who are homeless or have schizophrenia: sure, you can correctly predict that they’ll be expensive when they show up to the hospital, but what do you do about it?

If you were prioritizing factors that might end up holding AI back from its full medical potential, what would be at the top?

The main holdup is that it’s hard to share the data. I’m fortunate that I have access to most of Stanford’s clinical data, but what are they doing at Kaiser? What are they doing at Columbia? For good privacy reasons I can’t get ahold of that data, and that makes it hard to think broadly. Data access and sharing, I think, is the major holdup in terms of making the advances that we potentially could.

But as with the UK and Google DeepMind example, there are reasons why it’s hard to share this data too.