September 10, 2018

In the US, Healthcare Data Access Is a Scavenger Hunt

As researchers use big data to improve health, they’re bumping up against the hard limits of a fragmented healthcare system.

Image credit: monsitj - stock.adobe.com. Image has been modified.

In the era of big data, obtaining data sets that paint a comprehensive picture of the American healthcare landscape is a big, if not impossible, challenge.

That’s not to say researchers aren’t trying.

“In a study … recently completed, we pulled together over 150 different data sources, and our data still weren’t completely representative of the U.S.,” said Joseph Dieleman, Ph.D., an assistant professor at the University of Washington’s Institute for Health Metrics and Evaluation.

His research focuses on healthcare data, the economics of healthcare and healthcare policy, all areas that rely on solid data. Yet in the U.S., it’s virtually impossible to get a robust, all-encompassing picture of Americans’ healthcare.

Trudy Krause, Dr.P.H., MBA, associate professor of management, policy and community health at UTHealth School of Public Health in Houston, said the way America’s healthcare system is constructed makes compiling data something like a scavenger hunt.

“Unlike countries with a national healthcare system, the U.S. healthcare system is fragmented by payer type,” she said.

There are the public programs and entities, like Medicare, Medicaid and the Department of Veterans Affairs (VA), and then there is a vast landscape of private insurers, selling plans on the open market or through employer-sponsored plans.

“Thus, there is no central data bank of claims data from the payers,” Krause said.

Scrambling for Health Data

Consequently, health data researchers are left with a series of calculations — literally and figuratively.

“There (are) data on Medicare beneficiaries, some data on Medicaid, private insurance and (much less data) on uninsured spending, although this is a smaller fraction of (the) health sector,” Dieleman told Healthcare Analytics News™. “Getting access to all these data is very difficult; analyzing them jointly in order to study the entire health sector is really challenging.”

Broadly speaking, data from the government programs are easier — if not easy — to obtain.

“Centers for Medicare & Medicaid Services (CMS) makes some data available to researchers through the Research Data Assistance Center,” Krause said, “but it is project limited, and there are fees.”

CMS also has the Qualified Entity Certification Program, which enables qualifying organizations to access data. However, the certification process takes time, and even with certification, researchers must pay for the data they use.

Because Medicaid varies on a state-by-state basis, access to Medicaid data is likewise hit-and-miss, Krause said.

However, even with a complete set of public-sector healthcare data, researchers would miss information on most Americans. According to the Kaiser Family Foundation, 56 percent of U.S. citizens had private insurance in 2016, either through an employer-sponsored plan or a nongroup plan. Add in the 9 percent of people who were uninsured, and government data account for just over a third of all patient information.
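The arithmetic behind "just over a third" can be checked in a few lines. This is a minimal sketch that assumes the three coverage categories are mutually exclusive and add up to 100 percent; the variable names are illustrative, and only the 56 and 9 percent figures come from the article.

```python
# Coverage shares for 2016 as cited in the article (Kaiser Family Foundation).
private_pct = 56    # employer-sponsored or nongroup private insurance
uninsured_pct = 9   # no coverage

# Assumption: everyone else is covered by a public program.
public_pct = 100 - private_pct - uninsured_pct

print(f"Share covered by public programs: {public_pct}%")
print(f"More than one third: {public_pct / 100 > 1 / 3}")
```

Under that assumption, public programs cover 35 percent of the population, which is slightly more than one third.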

Furthermore, because programs like Medicare, Medicaid and VA initiatives are designed for specific populations, comprehensive government data would not give a representative sample of the public, experts said.

Given the limits of public healthcare data, researchers must either use complex weighting and equations to adjust the numbers or turn to the private sector for data. The problem is, data from commercial insurers or pharmacies are even harder to come by.
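The article does not say which weighting methods researchers use. One common approach is post-stratification, in which each group in a sample is reweighted to match its share of the target population. The sketch below shows the idea with two age groups; every number in it is invented for illustration and none comes from the sources quoted here.

```python
# Hypothetical figures, invented purely for illustration.
# Share of each age group in a sample drawn mostly from a public program.
sample_share = {"under_65": 0.20, "65_plus": 0.80}
# Share of each age group in the population the researcher wants to describe.
population_share = {"under_65": 0.85, "65_plus": 0.15}
# Average annual spending per person observed in the sample, by group.
mean_spending = {"under_65": 4000.0, "65_plus": 11000.0}

# Post-stratification weight: population share divided by sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

# Naive estimate: treats the sample as if it mirrored the population.
unweighted = sum(sample_share[g] * mean_spending[g] for g in sample_share)

# Weighted estimate: each group counts in proportion to its population share.
weighted = sum(
    sample_share[g] * weights[g] * mean_spending[g] for g in sample_share
)

print(round(unweighted), round(weighted))
```

With these made-up inputs, the estimated average spending falls from about 9,600 to about 5,050 once the overrepresented older group is weighted down. The adjustment works only for characteristics the researcher can measure in both the sample and the population.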

“Most commercial carriers try very hard to protect their proprietary information that reveals contracting terms for providers, and thus (insurers) limit charge and payment information when providing data,” Krause said.

Worries about privacy add another layer of concern, causing insurers to remove identifying information from the data.

The end result, Krause said, is that most commercial insurers don’t share such data, and the few who do charge a considerable amount.

Another option, Dieleman said, is to simply conduct surveys of patients. Those data, though, “(have) many of (their) own challenges related to smaller sample size and respondents’ own reporting biases.”

Big Data from Vertical Integration

One research agency that does enjoy broad access to commercial claims data is Kaiser Permanente Center for Health Research, in Portland, Oregon. The institution has ready access to claims data on more than 12 million commercially insured patients. It has this access for one important reason: It's affiliated with Kaiser Permanente, the vertically integrated healthcare behemoth.

“Kaiser Permanente as a whole has a diverse population base distributed across eight different regions — including Hawaii, the most diverse state in the country,” said Alan Bauck, MBA, director of research data and analytics at the Kaiser Permanente Center for Health Research. “Taken together, Kaiser Permanente’s 12.2 million members are highly representative of the diversity of the nation.”

The data set isn’t a panacea. Kaiser Permanente’s service areas tend to be more urban than the country as a whole, which means its patient base skews more diverse. And even with 12.2 million patients, the data set isn’t always large enough to render statistically significant insights into specific categories of patients, such as those with certain rare diseases. Bauck also noted that sometimes healthcare studies require nonmedical data that don’t make it into health records.

“For example, to get a complete picture of a person’s total health, researchers may need to consider not only healthcare interactions and claims information — which Kaiser Permanente captures well — but also factors such as socioeconomic variables, personal behaviors and genomics,” Bauck said.

In those instances, the center sometimes uses data from the U.S. Census Bureau or patient-reported socioeconomic data.

Single-Payer System Leads to More Robust Data

Because of the size of Kaiser Permanente’s healthcare network, the company’s research center has some of the same data availability advantages as foreign countries with single-payer or socialized healthcare. In fact, Bauck noted, Kaiser Permanente has the world’s largest nongovernmental electronic health system.

Those rich data sets make it easier for researchers in other countries to track large cohorts of patients over time and to get more comprehensive snapshots of the overall population.

Still, Dieleman said, even single-payer systems don’t offer a perfect solution to healthcare data.

“The difference is that in countries where there is a single-payer system, those data are closer (although still not perfectly representative of health services),” he said. “For example, (the National Health Service) in the (United Kingdom) only includes about 80 percent of total healthcare spending.”

Other countries also tend to protect their data in a manner like that of U.S. commercial insurers, with barriers like lengthy applications and high costs. Still, researchers in those countries can at least know that breaking through the barrier of one provider — the government — will unlock a high percentage of healthcare data.

Efforts to Improve U.S. Data Availability

Back in the U.S., a number of efforts have been made over the years to improve data accessibility.

“There have been some attempts to provide a central repository, such as the All Payer Claims Databases (APCDs) that exist through regulation in some states (but not all), and most APCDs limit data use for researchers and charge for that use,” Krause said. “Some private entities have also attempted to aggregate commercial claims data from select payers, but data use to researchers is limited and costly.”

Earlier this year, CMS Administrator Seema Verma, MPH, announced that the government would begin making certain healthcare data more readily available to researchers. That effort will begin with Medicare Advantage data. Next year, the agency will make Medicaid and Children’s Health Insurance Program data available, a move that could solve the problem of inconsistency in state data release policies.

Data Scientists’ Healthcare Wish Lists

In the meantime, experts who specialize in analyzing healthcare data keep scraping together as much data as they can find, even as they add more items to their wish lists.

In Texas, Krause’s center has pulled together data from Medicare and Medicaid and most of the state’s commercial insurers, including Blue Cross Blue Shield.

“We believe that this allows us to provide an accurate analysis of the insured persons in the state,” Krause said.

The caveat is that because her team doesn’t have data from the VA, from Tricare (a military insurance program) or on the uninsured population, it can’t include those populations in its findings.

Krause said she would like to see Texas create an APCD requirement to help fill the gaps. That, at least, would create a comprehensive state-level data set.

“If we did, UTHealth could lead the way and apply our policies for data access to qualified researchers and also be able to use it to inform policymakers in our great state,” she said.

At the top of Dieleman’s wish list are more robust data.

“My dream data set is nationally representative claims data that (are) linked to information about an individual’s behaviors (smoking, exercise, diet, health seeking behavior, etc.) as well as other information about income, education, race and geography,” he said.

Meanwhile, Bauck, whose perch at Kaiser Permanente gives him access to some of the most robust commercial claims data of any research center, said even with great data, the goal of improving healthcare is possible only if the techniques used to analyze and reap insights from the data continue to improve.

“For us, the challenge is not necessarily acquiring new data but continually enhancing the way we utilize available data to find new answers and solutions that will ultimately improve health outcomes and healthcare delivery at Kaiser Permanente and beyond,” he said.
