Genetic Databases Bump Up Against Privacy Concerns

Patients and database owners must grapple with peeping governments, companies, and hackers.

In many ways, it’s the golden age of genetic research.

As our knowledge of genetics increases and our ability to crunch massive amounts of data accelerates exponentially, scientists have the ability to make highly accurate predictions about health, both at the individual and population levels. That information could have far-reaching implications for the ways we optimize healthcare—implications that we’re only now beginning to realize.

But there’s a caveat.

Among the things that can be elucidated from genomic data is a person’s identity. In theory, this could happen even in cases where a patient’s name is removed from the data. And in the age of Cambridge Analytica, the Golden State Killer, endless hacking of banks, and daily ransomware attacks, the potential for telling genomic data to be linked to individuals has become a major headache for genetic researchers and cybersecurity experts alike.

Just ask Bonnie Berger, PhD, a mathematician at the Massachusetts Institute of Technology. She and her colleagues were frustrated by the effects that privacy issues have had on research. In many cases, investigators have to wait months for approval to access large databases of genomic data. And while privacy has always been a concern for scientists handling human subject data, Berger told Healthcare Analytics News that the issue overlaps with new fears on the part of the public about online privacy.

“Increasing public awareness of data privacy issues may lead to fewer individuals contributing their genomic data to scientific studies and biobanks, a challenge that cryptographic solutions such as our work can help overcome,” she said.

Berger and colleagues at MIT and Stanford are working to improve the cryptographic methods used to protect genomic data. They recently unveiled a new “secret sharing” system that protects genomic data by dividing sensitive data among multiple servers. The hope is that a better security apparatus will make it easier for researchers to obtain and share data on a large scale.
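To make the idea of dividing sensitive data among servers more concrete, here is a minimal sketch of additive secret sharing, one common textbook form of the technique. It is an illustration only and not the MIT and Stanford team's actual system, whose details this article does not describe. The function names, the modulus, and the encoding of a genotype as an allele count of 0, 1, or 2 are all assumptions made for the example.

```python
import secrets

# Illustrative modulus (an assumption); all arithmetic is done modulo this value.
MODULUS = 2**61 - 1


def split_secret(value, num_servers, modulus=MODULUS):
    """Split `value` into `num_servers` random-looking shares.

    The shares sum to `value` modulo `modulus`; any subset smaller than
    the full set is uniformly random and reveals nothing about `value`.
    """
    if num_servers < 2:
        raise ValueError("need at least two servers")
    if not 0 <= value < modulus:
        raise ValueError("value must lie in [0, modulus)")
    # All shares but the last are drawn uniformly at random.
    shares = [secrets.randbelow(modulus) for _ in range(num_servers - 1)]
    # The last share is chosen so that the total comes out to `value`.
    shares.append((value - sum(shares)) % modulus)
    return shares


def reconstruct(shares, modulus=MODULUS):
    """Recover the original value by combining every share."""
    return sum(shares) % modulus


def add_shared(shares_a, shares_b, modulus=MODULUS):
    """Each server adds its two shares locally; no server sees a raw value."""
    return [(a + b) % modulus for a, b in zip(shares_a, shares_b)]


# Hypothetical example: two participants' allele counts at one genetic variant.
alice = split_secret(1, num_servers=3)
bob = split_secret(2, num_servers=3)
total_shares = add_shared(alice, bob)
combined_count = reconstruct(total_shares)  # the sum of the two counts
```

In this sketch, each server holds only one share per participant, so a breach of a single server exposes nothing but random numbers. Aggregate statistics, such as a total allele count across a study population, can still be computed because addition works directly on the shares.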

Top Threats

If data security is the concern, the logical question is: Who are we protecting data from? The answer can be divided into 2 broad categories: business and governmental organizations that might have an interest in our genetic secrets, and hackers or other cybercriminals.

Ellen Wright Clayton, JD, MD, MS, co-director of the Center for Genetic Privacy and Identity in Community Settings at Vanderbilt University, said many people have second thoughts about sharing their genetic data because they worry it could be used against them.

“They’re worried about employment,” she said. “They’re worried about insurability. And they’re worried about the government.”

The Genetic Information Nondiscrimination Act of 2008 (GINA) was meant to allay concerns about genetic data being used in dystopian ways. GINA protects against discrimination in employment and health insurance, though the law is limited in its scope. Patients are also protected against health insurance discrimination based on pre-existing conditions under 2010’s Affordable Care Act.

But Mark A. Rothstein, JD, who directs the Institute for Bioethics, Health Policy, and Law at the University of Louisville, said GINA doesn’t outlaw a number of other types of potential discrimination. Life, disability, and long-term care insurance, as well as real estate transactions, for example, aren’t covered under GINA.

He noted that a privacy risk that is often overlooked is the area of “compelled disclosures,” instances where people are asked to give access to their health records in order to apply for products or services.

“For example, it is reasonable for a disability insurance company to see the medical documentation of disability before they pay a claim,” he said. “The problem is that with the advent of electronic health records, more information can be disclosed easily, and it is not unusual for the disclosure to be of the individual’s entire health record.”

Thus, the disability insurer could also receive unrelated information such as a patient’s mental health history or genetic information. Partly because at least 25 million compelled disclosures take place every year, as per Rothstein’s research, many people are reluctant to undergo any form of genetic testing.

So, what about the hackers?

Bradley Malin, PhD, a professor of biomedical informatics and associate professor of biostatistics who co-directs Vanderbilt’s genetic privacy center with Clayton, said the threat from hackers is real but also somewhat narrow.

“Hackers are always going to be interested in VIPs, so [there’s a risk] if you have individuals in these databases who are sufficiently high-profile that paparazzi would be interested in the data, or it could be used against somebody in some kind of smear campaign,” he said.

However, Malin noted that most hackers don’t have the expertise or resources to interpret the data on their own, meaning it would be difficult to yield meaningful information from raw genetic data in a cost-effective manner. In the future, there might be a simple tool that delivers quick, cheap genetic information, but it doesn’t exist yet.

“I think, again, this becomes a question of the value and the ability to do interpretation quickly,” he said.

But hackers can be a strange breed. Some might simply want to prove they have the skill to sneak into a database, either to embarrass the database’s owner or simply to raise their own profile.

Digging Up Data

If someone wants to access a cache of genetic data, or genomic data for a single person, there are at least 3 key categories of databases. One category is government, academic, and nonprofit agencies that collect, store, and release data for research purposes. Another category is the for-profit companies that collect and analyze data as a consumer service. The last category is law enforcement.

Data collected by the federal government or used in research funded by the federal government are subject to strict privacy guidelines.

Jim Ostell, PhD, director of the National Center for Biotechnology Information at the NIH’s National Library of Medicine, said the information his center stores is generally stripped of private information that could be used to identify donors.

“The NCBI database of Genotypes and Phenotypes (dbGaP), for example, contains measurements, lab test results, and genetic data that were contributed by individuals who voluntarily enrolled in biomedical research studies, but the database does not include any direct identifiers to patient information,” he said. “Terms of use of the database and the code of conduct for researchers who are granted access to the database specifically prohibit any attempts to re-identify individual participants, e.g., by merging dbGaP data with other data sources.”

Although many of the government’s database resources, such as ClinicalTrials.gov, make aggregate-level data freely available to the public, individual-level study data are made available only to senior researchers who agree to follow restrictions on how the data are used.

Malin, who co-chairs the data privacy and security working group for the NIH’s All of Us health data initiative, said the federal government has a good track record of data protection.

“Even though the federal government is a little bit hamstrung in terms of the requirements that they have to adhere to in terms of how to get new technology certified and into place, I do think they’re still doing best practice in terms of the technologies that they can use,” said Malin, who wasn’t speaking on behalf of the All of Us initiative.

Meanwhile, the private sector is quickly compiling its own databases of genetic data in support of the burgeoning direct-to-consumer genetics industry. Malin said oversight of these companies is somewhat lagging.

“It’s a rapidly growing environment with a relatively limited oversight, because it’s treated as if it’s just a general consumer product,” he said. “So, mainly, the Federal Trade Commission has oversight with respect to what’s taking place.”

Malin said the Food and Drug Administration has also taken on a role, though its focus is primarily on ensuring the companies provide accurately interpreted products.

Among the biggest players in the direct-to-consumer space is 23andMe.

Asked about its privacy protections, 23andMe provided a statement saying the company does not sell individual customer information and requires a patient’s voluntary and informed consent before their data are included in data sets made available to researchers.

“23andMe customers are in control of their data—customers can choose to consent, or not to, at any time,” the company said. “Our consent document and privacy statement are published online for everyone to read, and our research is overseen by an independent third party (IRB) to ensure research meets all legal and ethical standards.”

The company said more than 100 peer-reviewed publications have been written using the company’s data.

In general, well-established companies have strong privacy records because they see it as integral to their business models, Malin said.

“Some of the more reputable companies that see a strong relationship with their consumers as being the driving force behind their business are very strongly motivated to have these types of protections in place,” he said.

However, Malin warned that an increasing number of smaller firms may not have the same capabilities to protect consumers.

One such company, GEDmatch, was recently in the news when law enforcement officials in California arrested a man they believe to be the “Golden State Killer” based on a match between DNA found at a crime scene and DNA found using the free genealogical analysis tool.

Malin said law enforcement agencies’ use of data opens up a number of privacy and confidentiality issues, which are still evolving. But law enforcement does a good job of locking down its data, he added.

Patient Advice

All of this raises the question: What should a patient do in such a fraught landscape? After all, science is dependent upon voluntary participation in scientific research, and the more data collected, the easier it will be for researchers to make breakthroughs.

“My advice would be to be wary of institutions or entities you don’t know,” Rothstein said. “If you donate your samples and information to a well-known institution, you can have a reasonable degree of confidence that your information will not be wrongfully disclosed.”

For other entities, Rothstein said, users should not simply click “I agree” without reading the full agreement, lest they find out too late that they unwittingly gave the company permission to sell their data or even to match it with other personal data, such as contacts or GPS information.

Clayton said she wouldn’t be worried about donating to a medical or academic institution, since she has confidence in the guidelines—both legal and ethical—that they follow. However, if the potential donor is unsure, it’s key to ask questions.

“I would want to know that data about me are going to be shared in a de-identified fashion with entities that are going to take seriously the need to protect the privacy and security of the data,” she said. “That’s what I would want to know.”
