At the ARVO meeting in Hawaii, Australian researchers presented impressive findings from a massive dataset.
Artificial intelligence (AI) is speeding ahead unevenly in healthcare: In some use cases, it has proven surprisingly potent, while other applications haven’t yet lived up to lofty promises. One major area where AI has broken through has been diagnostic image recognition, particularly for diabetic retinopathy (DR).
At this year’s Association for Research in Vision and Ophthalmology (ARVO) meeting in Hawaii, a team from the Centre for Eye Research Australia (CERA) presented a study of a deep learning system for detecting diabetic retinopathy that achieved surprisingly high diagnostic accuracy.
Stuart Keel, PhD, a postdoctoral research fellow at CERA, explained the project. First, the team recruited 21 ophthalmologists who met strict inclusion criteria, which required them to grade a subset of 200 images according to the NHS DR screening classification system. Each image was first assigned to one grader and then passed sequentially to other graders until 3 consistent grades were obtained.
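The consensus step can be illustrated with a short Python sketch. It assumes that "3 consistent grading outcomes" means three graders assigning the same grade, and that grading stops as soon as any grade reaches that count; the function name `consensus_grade` and the NHS-style labels such as "R0" and "R1" are hypothetical and used only for illustration, not taken from the study's software.

```python
from collections import Counter
from typing import Iterable, Optional


def consensus_grade(grades: Iterable[str], required: int = 3) -> Optional[str]:
    """Return the first grade assigned by `required` graders.

    Hypothetical sketch: grades arrive one per successive grader, and the
    image stops being passed on as soon as any single grade has been given
    `required` times. Returns None if the supplied graders never agree.
    """
    counts = Counter()
    for grade in grades:  # one grade per successive grader
        counts[grade] += 1
        if counts[grade] == required:
            return grade  # this grade becomes the reference ("gold standard") label
    return None


# Example: the fourth grader supplies the third matching "R1" grade.
print(consensus_grade(["R1", "R0", "R1", "R1"]))
```

Under these assumptions, an image that splits graders simply keeps circulating, which is why the number of gradings per image can vary.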
“This was assigned as the gold standard grading for each particular image,” Keel said. The results were used to create a training dataset for the deep learning system and a smaller internal validation set. The researchers also validated the algorithm on an independent external dataset to check for overfitting and assess generalizability.
Then, they collected over 35,000 images from 3 population-based studies: the National Indigenous Eye Health Survey of Australia, the Singapore-Malay Eye Study, and the AusDiab Study. The datasets spanned a range of ethnicities, a diversity that Keel said matters because ethnic variation can often lead to false positives. “We know there’s quite a strong variation in fundus pigmentation, which is a potential source of error for these deep learning algorithms.”
The team developed 4 deep learning models: 1 for referable DR, 1 for diabetic macular edema (DME), 1 to classify image quality as gradable or ungradable, and 1 to identify whether an image is macula-centered or disc-centered. Images entered into the system are first preprocessed for normalization and then filtered sequentially through the models, which estimate the probability of DR.
The area under the receiver operating characteristic curve (AUC) exceeded 0.9 for both DR and DME, and on the total combined dataset the system achieved specificity above 90% and an AUC of 0.95. Those results gave Keel and his colleagues “confidence that the area under curve was quite generalizable to other datasets, different ethnicities and different imaging protocols,” the researcher said. The regions traditionally associated with disease were highlighted in 97% of cases, which Keel said spoke to the power and accuracy of the deep learning algorithm.
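For readers less familiar with these metrics, the following Python sketch shows how specificity and AUC are typically computed from a classifier's outputs. It is a generic illustration, not the CERA team's evaluation code: the function names, the 0.5 decision threshold, and the four-image toy data are all hypothetical. AUC is computed here as the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, with ties counted as half.

```python
from typing import Sequence


def specificity(labels: Sequence[int], preds: Sequence[int]) -> float:
    """True negatives divided by all actual negatives (label 0 = no referable disease)."""
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    negatives = sum(1 for y in labels if y == 0)
    return tn / negatives


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = referable DR, scores are model probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(auc(labels, scores))         # 0.75
print(specificity(labels, preds))  # 0.5
```

The pairwise loop is quadratic in dataset size and is meant only to make the definition concrete; evaluations on tens of thousands of images would normally use a rank-based or library implementation.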
“Also, there’s great potential to provide greater accessibility of DR screening,” Keel added. “Particularly in those low resource areas such as developing nations, regional remote areas, and in particular countries and minority populations.”