The algorithm can be adopted by other hospitals and health systems to identify incident stroke.
A machine-learning algorithm performs well for identifying incident stroke and for determining type of stroke.
The algorithm’s performance in a general population sample demonstrated its generalizability and potential to be adopted by other hospitals and health systems.
Nicholas Larson, Ph.D., and colleagues developed a machine learning-based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. The predictive modeling study used observational cohort data for training and validation. An atrial fibrillation (A-fib) cohort was used to train and test the phenotyping algorithm for the date of incident stroke events. The generalizability of the algorithm was evaluated in a general population cohort.
A patient population from Minnesota made up the A-fib cohort. All healthcare-related events were extracted through the Rochester Epidemiology Project. Data included demographic information, diagnostic and procedure codes, healthcare utilization data, outpatient drug prescriptions, results of laboratory tests, and information about smoking, height, weight, and body mass index.
The algorithm aimed to identify first stroke events within a certain time frame. The team used three major data elements: clinical concepts, ICD-9 codes, and CPT codes. They constructed different models by varying the inclusion of CPT codes and symptom-related clinical concepts in the feature set, then compared the models' performance. Clinical concepts were identified from the major and secondary problem list in the Mayo Clinic EHR and, using a natural language processing system, from clinical notes from other Rochester Epidemiology Project sites.
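The model comparison described above can be pictured as a small grid of feature sets, with CPT codes and symptom-related concepts switched on or off. The sketch below is illustrative only. The group names (`icd9_codes`, `disease_concepts`, `cpt_codes`, `symptom_concepts`) are hypothetical labels rather than identifiers from the study, and treating ICD-9 codes and the remaining clinical concepts as always present is an assumption.

```python
from itertools import product

# Hypothetical feature-group labels; the study does not name them this way.
ALWAYS_INCLUDED = ("icd9_codes", "disease_concepts")
OPTIONAL_GROUPS = ("cpt_codes", "symptom_concepts")


def feature_set_variants():
    """Enumerate every on/off combination of the optional feature groups."""
    variants = []
    for flags in product((False, True), repeat=len(OPTIONAL_GROUPS)):
        chosen = [group for group, on in zip(OPTIONAL_GROUPS, flags) if on]
        variants.append(ALWAYS_INCLUDED + tuple(chosen))
    return variants


for variant in feature_set_variants():
    print(variant)
```

Each tuple would correspond to one candidate model, trained and scored on the same data so that the contribution of each optional group can be compared.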
Larson and colleagues created a data set of 9,130 confirmed visits with stroke and nonstroke labels among 1,773 patients: 746 stroke visits and 8,384 nonstroke visits. Data from a randomly selected 79.98% of screened patients formed the training set, and the remaining 20.02% were retained as an independent testing set.
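Because the split is described at the patient level, all visits from a given patient would end up on the same side of the train/test boundary. A minimal sketch of such a split follows. The function name `split_by_patient`, the fixed seed, and the placeholder integer IDs are assumptions for illustration, not details from the study.

```python
import random


def split_by_patient(patient_ids, train_fraction=0.8, seed=0):
    """Randomly assign whole patients (not individual visits) to train or test."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique_ids)
    n_train = int(len(unique_ids) * train_fraction)
    return set(unique_ids[:n_train]), set(unique_ids[n_train:])


# Placeholder IDs standing in for 1,773 patients; not real data.
train_ids, test_ids = split_by_patient(range(1773))
print(len(train_ids), len(test_ids))
print(round(100 * len(train_ids) / 1773, 2))
```

With 1,773 patients, a nominal 80/20 split gives 1,418 and 355 patients, or 79.98% and 20.02%. That matches the reported percentages, though the article does not say the split was computed this way.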
Phenotype models were trained using logistic regression and random forest. The team evaluated the generalizability of the model on a sample from a general population cohort of more than 71,000 patients, all at least 30 years old with no prior history of cardiovascular disease. The best-performing model was applied to the entire population cohort to generate incident stroke predictions. Then, for evaluation, 50 patients were randomly selected from those who had no stroke-related features, 50 from those with negative stroke predictions, and 50 from those with positive stroke predictions and a predicted incident stroke.
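The evaluation sample described above amounts to drawing a fixed number of patients from each of three groups. The sketch below shows one way to do that. The stratum names, their sizes, and the placeholder IDs are all made up for illustration, and simple random sampling within every group is an assumption.

```python
import random


def sample_for_review(strata, per_stratum=50, seed=0):
    """Draw a fixed-size random sample of patient IDs from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {
        name: rng.sample(sorted(ids), min(per_stratum, len(ids)))
        for name, ids in strata.items()
    }


# Hypothetical strata with made-up sizes and placeholder IDs.
strata = {
    "no_stroke_features": range(0, 5000),
    "predicted_negative": range(5000, 6000),
    "predicted_positive": range(6000, 6400),
}
review = sample_for_review(strata)
for name, ids in review.items():
    print(name, len(ids))
```

Sampling equal numbers from each group keeps the review workload fixed, at the cost of the sample no longer reflecting how common each group is in the full cohort.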
Overall, of 4,914 patients with A-fib, 740 had validated incident stroke events. The best-performing algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier.
Among those with stroke codes in the general population sample, the best-performing model had a positive predictive value of 86% (95% CI, 74%-93%) and a negative predictive value of 96%. For subtype identification, the team achieved an accuracy of 83% in the A-fib cohort and 80% in the general population sample.
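For readers less familiar with these metrics, positive predictive value is the share of predicted strokes that were confirmed on review, and negative predictive value is the share of predicted non-strokes that were confirmed. The sketch below computes both, with a Wilson score interval for the uncertainty. The counts (43 of 50 and 48 of 50) are illustrative values chosen to reproduce 86% and 96%. The article does not give the underlying counts or say which interval method the authors used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Illustrative counts only, not taken from the study.
confirmed_positive, predicted_positive = 43, 50
confirmed_negative, predicted_negative = 48, 50

ppv = confirmed_positive / predicted_positive
npv = confirmed_negative / predicted_negative
ppv_low, ppv_high = wilson_interval(confirmed_positive, predicted_positive)

print(f"PPV {ppv:.2f} (95% CI {ppv_low:.2f}-{ppv_high:.2f}), NPV {npv:.2f}")
```

With small review samples the interval is wide, which is why a point estimate such as 86% is best read alongside its confidence bounds.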
The findings demonstrated that incorporating structured EHR data can effectively distinguish incident stroke mentions from historical events in the clinical notes. Based on the algorithm's performance in the general population cohort, it could be adopted by other institutions.
The study, “Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation,” was published online in the Journal of Medical Internet Research.