How Colossal Data Sets (and a Billion Tweets) Can Help Track Flu Outbreaks

Ryan Black

Researchers at UChicago liken an influenza outbreak to a wildfire, and say that social connectivity is the wind that stokes it.

Researchers at the University of Chicago (UChicago) say that big data and social media patterns enabled them to develop a high-resolution picture of how influenza outbreaks travel through the United States, “perhaps even higher than what the Centers for Disease Control and Prevention can see,” according to Andrey Rzhetsky, PhD.

Using data from Truven MarketScan’s trove of more than 40 million de-identified family health records, Rzhetsky and colleagues at UChicago first analyzed each flu outbreak from 2003 to 2011. They found that outbreaks tend to originate in the American South, particularly near the Gulf of Mexico or the Atlantic Ocean, and that the disease “streams” northward and outward from there.

The study likens a flu outbreak to a wildfire, with social connections as the tinder by which it spreads. To learn about those connections, the researchers also analyzed 1.7 billion geo-located Twitter messages to better understand how people travel each week. Patterns of movement between counties act as the wind that helps the fire spread, and understanding them could help inform public health messaging.

"For example, if flu-like symptoms are being reported in one county, you could tell people in neighboring counties to stay away from crowds, or you could focus vaccination efforts in certain places in advance," Rzhetsky said in a statement. "It could be used essentially as a weather forecast for the flu."

Across their information sources, the researchers said, they used longitudinal data from more than 150 million people. The data allowed them to recreate 3 years’ worth of flu outbreaks “fairly accurately” and to determine and rank the strongest predictors of how influenza will spread. In order from most to least important, they were:

  • The host population’s socio- and ethno-demographic properties
  • Weather variables pertaining to specific humidity, temperature, and solar radiation
  • The virus' antigenic drift over time
  • The host population’s land-based travel habits
  • Recent spatio-temporal dynamics

The team’s work was partially funded by the Defense Advanced Research Projects Agency (DARPA) and the National Institutes of Health. The government has good reason to seek a higher-definition picture of how the flu spreads: The disease contributes to tens of thousands of deaths and hundreds of thousands of hospitalizations each year, according to the CDC, and costs the government and broader economy billions.

One of the many promises that data analysis could hold for healthcare lies in its ability to predict disease outbreaks. Another study, published in August 2017, used Centers for Medicare & Medicaid Services data to develop an algorithm for predicting weekly flu test and diagnosis volumes, achieving close to 90% predictive accuracy. Other recent work has focused on how Internet of Things devices could be used to track disease outbreaks worldwide.

This season, the flu outbreak has received press coverage for its severity. But some experts believe it’s in line with other recent years.

“The 2014-15 season was just as bad, if not worse,” National Institute of Allergy and Infectious Diseases Director Anthony Fauci, MD, told MD Magazine in January. “Though the perception has been ‘Wow, this is unprecedented,’ in no way is it unprecedented.”

The new UChicago study, “Conjunction of factors triggering waves of seasonal influenza,” was published this week in eLife.
