Two years after the ResearchKit launch, studies using the platform seem to suffer from the same issues it was meant to address.
Apple’s Spring 2015 Special Event heralded not just Apple Watch, but also ResearchKit. Open source, integrated with the company’s HealthKit suite, and capable of leveraging both the iPhone’s sensory capabilities and its sheer ubiquity, ResearchKit was presented by Apple as a potential revolution.
In its introductory video, figures from prominent research institutions spoke of its capabilities. Kathryn Schmitz, PhD, of the University of Pennsylvania, talked about previous recruitment difficulties, noting that her group had sent out over 60,000 letters and netted just 305 respondents for a study. “The concept that I could kick out a survey to patients every day, every week, that would show up on their phone and would actually improve their health and our ability to care for them...that is a game changer. That is awesome.”
Recruitment was one of 4 major issues with traditional clinical research that Jeff Williams, head of operations for Apple, said ResearchKit was designed to address. Given small sample sizes and a process that frequently requires financial incentives, traditional centralized cohorts can be unrepresentative. The next 2 issues were subjective data and infrequent data: physicians can see patients with fluctuating conditions only for a “snapshot” in time, and often must assess symptom severity for conditions such as Parkinson’s disease based on their own observations.
“But perhaps the most significant challenge is the communication flow,” he said. “When you participate in a study you often don’t hear back until the very end of the study, if at all.” He positioned ResearchKit as a potential antidote to all 4 of these concerns.
Coinciding with its official launch were 5 studies using the platform: mPower, a Parkinson’s study, and Share the Journey, a breast cancer study, both through Sage Bionetworks; MyHeart Counts, a heart disease study through Stanford University School of Medicine; Asthma Health, an asthma study through the Icahn School of Medicine at Mount Sinai in New York; and GlucoSuccess, a diabetes study via Massachusetts General Hospital. Two years later, more institutions are using the platform, including a new pregnancy study from WebMD, an LGBTQ health study through the University of California San Francisco, a concussion tracker from New York University, and pharmaceutical giant GlaxoSmithKline’s ongoing PARADE rheumatoid arthritis study.
Given the fanfare surrounding the launch, all 5 launch studies immediately saw eye-popping download numbers for their associated apps. Cumulatively, those apps had 15,000 downloads in the first day alone. Asthma Health had over 40,000 downloads in its first month on the App Store, and 50,000 people consented to participate in MyHeart Counts between March and October 2015. If app downloads and stated intent to participate translated directly into traditional recruitment and consent, these would be dream numbers for any clinical researcher.
Early results have begun to be published in major medical journals. Scientific Data published observations from mPower in March 2016, JAMA Cardiology published the first readout from MyHeart Counts online in December 2016, and first results from Asthma Health came out in Nature Biotechnology in March of this year. Responding to the latter, a host of articles on tech blogs declared victory for the technology, with the reverence that Apple often receives. The word “revolutionizing” has appeared in more than 1 headline; a post on Fortune declared that “Apple’s Medical Research App Just Proved That It Really Works”; and Yahoo published a Mashable story titled “A major study just validated the iPhone’s potential for medical research.”
Did it, though?
The first reports were published in extremely well-respected clinical journals, but they don’t bring much revolutionary clinical data to the table. The Asthma Health report, the most recently released, covers the 7593 people who completed the initial enrollment process out of the tens of thousands who downloaded the app.
The app asked users to complete daily and weekly surveys logging their treatment adherence, symptoms, and environmental triggers for those symptoms. The researchers split the enrolled cohort based on engagement with the application: 6470 users completed at least 1 of the study’s surveys and comprised the “Baseline” group, within which the “Robust” group included 2317 respondents who smoked fewer than 200 cigarettes per year, had no other lung disease, and completed at least 5 of the surveys. There was overlap with the smallest group in the study, as 131 of the 175 users deemed “Milestone” participants also fell into the Robust group. Milestone users were those who stayed with the study long enough to complete a survey of the same name 6 months after enrolling.
In essence, the 131 overlapping Milestone/Robust users were those who genuinely engaged with the full process of the study, completing a week’s worth of surveys and the 6-month follow-up. One of the exceptions to the tech sphere’s glowing reaction was a critical article in Ars Technica (“Out of the gate, health and research apps face-plant”), which pointed out that this was only 1.7% of those enrolled in the study. If one considers that slice of the Milestone group as those contributing actual, viable research data, it is worth recalling the Kathryn Schmitz example from ResearchKit’s early promotional video: 305 participants recruited from 60,000 mailed letters is a dismal conversion rate of 0.5%. Measured against the initial download pool of 49,936, though, those 131 users equate to an even bleaker 0.26%.
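For readers who want to check the comparison, the conversion rates above reduce to simple division; this short sketch (all figures taken from the published reports cited here) reproduces them:

```python
# Conversion rates cited above, computed from the studies' own figures.
def conversion_pct(converted, pool):
    """Percentage of a recruitment pool that converted to real participation."""
    return 100 * converted / pool

# Traditional mail recruitment (Schmitz): 305 respondents from 60,000 letters.
mail_rate = conversion_pct(305, 60_000)      # ~0.5%

# Asthma Health: 131 Milestone/Robust users out of 7593 enrollees...
enrolled_rate = conversion_pct(131, 7_593)   # ~1.7%

# ...and out of 49,936 total app downloads.
download_rate = conversion_pct(131, 49_936)  # ~0.26%

print(f"{mail_rate:.1f}% {enrolled_rate:.1f}% {download_rate:.2f}%")
```

By the enrolled-cohort measure the app outperforms mail recruitment threefold; by the download-pool measure it underperforms it.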
The limited nature of the study (self-selected, nationwide, and available only to iPhone owners) produced a different demographic from what the Centers for Disease Control and Prevention (CDC) reports as the actual cross-section of asthma patients in the United States. “AHA users tended to be younger, wealthier, more educated, and were more often male than asthma patients in the CDC asthma population,” the study authors wrote. Among the portion of the Baseline group whose demographic data were successfully recorded, majorities reported being male and having a personal income above $60,000 per year. Once the dust settled from the app’s initial publicity, enrollment began to round into a more realistic distribution: “The gender distribution of the cohort recruited later more closely approximated the CDC asthma population statistics,” the authors noted.
The MyHeart Counts study suffered an equally extreme overrepresentation of males, a whopping 82.2% of the 40,000-plus who consented to the research, though it featured much better retention and participation metrics. The app included questionnaires about health, exercise, and background, as well as a 6-minute walk test after 1 week. Of the 48,968 people who consented to participate, more than 81% contributed some form of data. The study passively used the smartphone’s accelerometers to track movement, and 41.5% of participants allowed the app to do so for 4 of the 7 days in the study. Fewer than 10% of individuals participated for the whole week, and only slightly more took part in the survey at the end of the study.
Addressing Williams’ “communication flow” point, the MyHeart study did include an element of feedback for users. Participants aged 40 to 79, of whom there were 17,245, could receive a risk assessment based on American College of Cardiology and American Heart Association guidelines, provided they submitted their lipid values and participated adequately. Ultimately, only 1334 participants received that feedback: 7.7% of those eligible and 2.7% of all consenting participants.
The takeaway from both reports is that each was a test of how feasible it is to apply the technology to research on a given condition, rather than a study of the condition itself. Both were honest about that, and about the shortcomings that must be addressed to take ResearchKit from novelty to necessity.
Interestingly, however, the challenges they cite are almost exactly the ones Williams first trumpeted ResearchKit as the solution to: recruitment, participation, representation, and reliability, just expanded and given a new wrinkle. There is a real improvement in reach and mobility: studies can be conducted nationwide rather than at 1 institution, and they can use the phone’s various sensors to collect novel, accurate, objective data remotely.
But “nationwide” doesn’t necessarily mean a good and accurate cohort. Williams said that financially incentivized studies don’t produce the best cross-section of a disease population, but neither does self-selection among users of a premium smartphone. US smartphone ownership breaks down along economic and racial lines: wealthy and white Americans are far more likely to own an iPhone than lower-income and nonwhite Americans, and although the 2 groups have similar rates of smartphone ownership overall, a drastically higher percentage of lower-income and nonwhite people use Android devices. For an asthma study, that is an inherent skew, as Asthma Health demonstrated with participation statistics that bore no resemblance to the distribution of asthma in the United States.
Apple has shown no intention of offering a ResearchKit framework for Android phones that would make it easier for researchers to port those studies and bridge that demographic divide. Independently, Open mHealth created ResearchStack to do just that, but only 1 program has made the leap so far: Mole Mapper, which gathers data for melanoma research.
Even if retention and demographic issues were solved, the decentralized nature of a national mobile study exposes it to problems of validation and subjectivity. One theory holds that the personal influence of researchers can skew reporting, as participants tailor their symptom descriptions to what they think a study is looking for. Removing that face-to-face connection removes only one source of influence; it does not remove unreliable self-reporting itself.
None of this is to negate the incredible possibilities the technology presents. One of the major areas of intrigue to emerge from the studies is the ability to track unique regional events as they relate to health. In the Asthma Health study, the researchers observed that “there was a marked increase of participants reporting air quality triggers in regions affected by the summer 2015 Washington state wildfires during the corresponding time periods.” Still, these observations aren’t new, and the researchers acknowledge as much: “the consistent trending of variables that we expect to be interrelated based on our knowledge of the disease, the tracking of symptoms with known environmental triggers...strongly suggest the validity of this new research-data collecting method.”
Before declaring a ResearchKit revolution, those looking to provide real, viable data with the technology might return to the same solutions that traditional, institution-based studies have had to use to ensure consistent engagement and reliable validation.