The Underrated Challenges of Building a Learning Model

January 19, 2018

Nothing about machine learning is necessarily simple, but some aspects may be harder than many in healthcare expect.

Mark Michalski, executive director of the MGH & BWH Center for Clinical Data Science (CCDS), a joint project between Massachusetts General Hospital and Brigham & Women’s Hospital, walked the crowd at the AI in Healthcare Summit today in Boston through the more grueling parts of the artificial intelligence (AI) model design process.

As applied statistics gives way to machine learning, and then to deep learning and neural networks, the data demands become greater, Michalski said. Neural networks require extensive annotated data, with the operative word being “annotated.” Many outside data scientists might see the sheer volume of data that the healthcare industry possesses and think, “If only I could get my hands on that, I could…” the speaker said, but what they don’t realize is that the majority of it is unstructured and poorly annotated.

“You need a mandate from the institutional leadership down that we’re going to make this data accessible and usable to make progress,” Michalski said, if healthcare is to fully leverage new innovations in computing.

CCDS has an enormous amount of data, including over 5 billion images and records from millions of examinations. Its executive director laid out the process of crunching all of that down to a usable training set for a learning system.

Once the theoretical underpinnings of the model are laid out, which can be 10-20% of the total work and “a pain that no one wants to do,” software has to be built. From there, a cohort must be assembled and whittled down into a usable form.

First, a team must find all the patients in its repository who are representative of the cohort, using data like clinical information and radiology reports. Then all the reports must be curated, using data like exam and billing codes, before machine learning can be applied to find patterns and identify the usable portion. The process can start with millions of records and produce a training set of 1,000.
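The funnel Michalski described can be sketched in a few lines of Python. This is only an illustration, not CCDS's actual pipeline: the `Exam` record, the `curate` function, the keyword match, and the sample codes and reports are all hypothetical, and the final machine learning pass is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exam:
    patient_id: str
    exam_code: str      # hypothetical exam code
    billing_code: str   # hypothetical billing code
    report_text: str    # free-text radiology report


def curate(exams, keywords, exam_codes, billing_codes):
    """Return (curated exams, record counts after each stage)."""
    counts = {"repository": len(exams)}
    # Stage 1: keep exams whose report mentions any cohort keyword.
    cohort = [e for e in exams
              if any(k in e.report_text.lower() for k in keywords)]
    counts["cohort"] = len(cohort)
    # Stage 2: keep exams with accepted exam and billing codes.
    curated = [e for e in cohort
               if e.exam_code in exam_codes and e.billing_code in billing_codes]
    counts["curated"] = len(curated)
    return curated, counts


# Made-up sample records, for illustration only.
exams = [
    Exam("p1", "CT-HEAD", "B100", "Acute ischemic stroke in left MCA territory."),
    Exam("p2", "CT-HEAD", "B999", "Findings consistent with stroke."),
    Exam("p3", "XR-CHEST", "B100", "No acute cardiopulmonary process."),
    Exam("p4", "MR-BRAIN", "B100", "Chronic stroke, no new infarct."),
]
curated, counts = curate(
    exams,
    keywords={"stroke"},
    exam_codes={"CT-HEAD", "MR-BRAIN"},
    billing_codes={"B100"},
)
print(counts)
```

Each stage only narrows the previous one, which is why a repository of millions can shrink to a training set in the low thousands.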

And once the model exists, it needs to be put into a form that the end user, say a radiologist, can easily use. That requires a testing environment highly representative of their typical workflow, so they can deliver numerous free screenings to adjust and validate the model. This, Michalski said, is an “underappreciated and often ignored part of the process.”

The bright side is that once an organization becomes adept at curation and validation design, building the algorithmic architecture can be a much easier process. The actual model can be swapped out if the infrastructure behind it is well-built.

In a large stroke mapping study, the model was one of the easier parts of the process. “The architecture was actually 80% commodity,” Michalski said. “We were able to just pull it from the internet and tweak it.”

Another underappreciated aspect of AI for healthcare, he said, is how the definitions have blurred as hype has built.

“AI means a lot of things to a lot of people. Data science might be more specific: it’s applied statistics at scale. Machine learning is not the same as deep learning, and there’s a lot of tools within it,” he said. “The term has reached this level of fervor that you really have to understand what they’re actually talking about.”

That’s important for CCDS, which has to have that conversation often. Boston is teeming with AI startups and other leading health systems glad to collaborate; early in his speech, Michalski joked that a new AI company would have been started before he finished talking.


