The National Institutes of Health says it’s one of the largest data sets ever made public.
In the 24 hours since it went live this week, one of the largest-ever publicly available data sets had racked up dozens of downloads. Each click represented not only a curious researcher, but also a chance to unlock the data’s potential.
Comprising more than 100,000 chest X-ray images, the collection could improve artificial intelligence, diagnoses, and global health, Ronald M. Summers, who led the effort, told Healthcare Analytics News™. The project came out of a radiology department at the National Institutes of Health’s Clinical Center, where Summers works as a senior investigator in an imaging and computer-aided diagnosis lab.
“Those folks who are trying to use AI for healthcare, they are starved for data sets,” Summers said. “We need really big data sets to train these latest deep-learning systems that are all the rage.”
A decade or two ago, impressive data sets consisted of several thousand images, he said. While valuable, collections of that size offer little to AI compared with sets of more than 100,000.
With that sort of bulk, AI can both learn and teach. Summers pointed to two similar endeavors—one on retinal photographs and the other on skin lesions—that broke ground on preventing blindness in people with diabetes and identifying skin cancer, respectively.
Hope for similarly lofty goals exists here.
As academic and research institutions get their hands on the data, they will teach computers to read and process the data, according to the NIH.
The trained systems may then be used to pinpoint slow changes over a series of X-rays, which could otherwise go unnoticed, Summers said. AI may also help patients in developing countries, where the technology is available, but the radiologists who know how to read these images aren’t, he added. The effort could even spur the establishment of a “virtual radiology resident” that might be taught to read other types of images down the road.
It took more than a year for Summers and his team to get to this point.
They compiled the X-ray images from more than 30,000 patients, including many with advanced lung disease, at the NIH Clinical Center. Then Summers used natural language processing to extract information from corresponding radiology reports, he said.
Privacy was a big concern. The researchers removed each header, which contains patient information, and then two people manually reviewed every image, Summers said.
“I really needed to feel confident that the data were properly scrubbed,” he said.