Latanya Sweeney Fights for Data Privacy, One State at a Time

Ryan Black

The researcher who helped shape HIPAA is now taking on gaps in patient data privacy nationwide.

Latanya Sweeney, PhD. Credit: US Department of Health and Human Services livestream.

Researcher Latanya Sweeney’s work on re-identification helped shape how HIPAA approached patient data privacy. In a speech at yesterday’s Data Privacy in the Digital Age symposium, she noted that times have changed, and the policy should evolve accordingly.

It started for Sweeney, PhD, in the 1990s. When she was a computer science graduate student at MIT, an ethicist told her computers were evil and someday they would aggregate every bit of every person’s personal data, subsequently compromising the information.

Looking to prove the opposite, Sweeney chose a target: Massachusetts Governor William Weld, who had just suffered a heart attack. She bought a cheap copy of Cambridge’s voter roll, delivered on 2 floppy disks. She quickly linked his discharge data to his name and address, knowing only the condition he suffered, his date of birth, gender, and zip code. In fact, she found that 87% of Americans could be uniquely identified through just 3 of those data points: date of birth, gender, and zip code.
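The linkage described above can be illustrated with a minimal sketch. This is not Sweeney's actual code or data: every record, name, and function below is hypothetical, and the sketch assumes both data sets store date of birth, gender, and zip code in the same format. It joins a "de-identified" discharge file to a voter list on those three fields and keeps only matches that point to exactly one voter.

```python
from collections import defaultdict

# Hypothetical toy data for illustration only; none of these records are real.
VOTERS = [
    {"name": "Alice Example", "dob": "1950-01-02", "sex": "F", "zip": "02138"},
    {"name": "Bob Sample", "dob": "1945-07-31", "sex": "M", "zip": "02138"},
    {"name": "Carl Placeholder", "dob": "1960-03-15", "sex": "M", "zip": "02139"},
    {"name": "Dan Placeholder", "dob": "1960-03-15", "sex": "M", "zip": "02139"},
]

# "De-identified" discharge records: no names, but quasi-identifiers remain.
DISCHARGES = [
    {"diagnosis": "condition A", "dob": "1945-07-31", "sex": "M", "zip": "02138"},
    {"diagnosis": "condition B", "dob": "1960-03-15", "sex": "M", "zip": "02139"},
    {"diagnosis": "condition C", "dob": "1970-12-25", "sex": "F", "zip": "02140"},
]

# The three fields the article says were enough to single most people out.
QUASI_IDENTIFIERS = ("dob", "sex", "zip")


def key(record):
    """Build the (dob, sex, zip) tuple used to compare records."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(discharges, voters):
    """Return (name, diagnosis) pairs where a discharge matches exactly one voter."""
    voters_by_key = defaultdict(list)
    for voter in voters:
        voters_by_key[key(voter)].append(voter)

    linked = []
    for discharge in discharges:
        candidates = voters_by_key.get(key(discharge), [])
        # Zero candidates: person not in the voter list.
        # Several candidates: the match is ambiguous, so it is skipped.
        if len(candidates) == 1:
            linked.append((candidates[0]["name"], discharge["diagnosis"]))
    return linked


def fraction_unique(records):
    """Share of records whose (dob, sex, zip) combination appears only once."""
    counts = defaultdict(int)
    for record in records:
        counts[key(record)] += 1
    if not records:
        return 0.0
    unique = sum(1 for record in records if counts[key(record)] == 1)
    return unique / len(records)


if __name__ == "__main__":
    print(link_records(DISCHARGES, VOTERS))
    print(fraction_unique(VOTERS))
```

In this toy data only one discharge record links to a single named voter; the second matches two voters who share all three fields, and the third matches nobody. The `fraction_unique` helper is the same kind of measurement as the 87% figure, computed here over four made-up rows rather than a real population.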

Since then, published papers have alleged that such re-identification was only easy because Weld was a high-profile figure. But Sweeney and colleagues seem to have proven that theory false.

For $50, her team was able to buy a year’s worth of patient discharge data from Washington state. They re-identified 43% of 81 samples by comparing the information to news blotters. That experiment caused Washington to change its policy on data sales to one of tiered access: public, limited, and confidential data sets. Sweeney was surprised to find that only 1 other state (California) did the same.

“It’s clear we have to go state by state,” she said. So they have, starting at the geographic top of the country and working their way down. They recently repeated the exercise in Maine and Vermont.

Most states sell or give away discharge data to analytics companies. Sweeney’s presentation listed Truven Health Analytics, OptumInsight, and WebMD Health among the top buyers of public health databases, with states like New York, Pennsylvania, and Maryland hawking their data to at least 6 of the top 8 buyers. Analytics companies, she noted, “make their bones building profiles that they can sell.”

Two decades ago, before the health data analytics market was so crowded and hungry, some could afford to maintain privacy: paying for healthcare visits out of pocket, not signing up for pharmacy or convenience store loyalty programs to save a few dollars. Today, she said, everyone is in the same boat. “There is no escaping it. There is no buying out of it.”

Sweeney isn’t under any delusion that data can exist and be made available with a 0% chance of re-identification. To shape the conversation, she thinks there needs to be a departure from the binary of “anonymous” and “not anonymous,” because varying degrees of access and disclosure lead to varying degrees of anonymity. Tiered access is an important step, she believes.

Sweeney said there needs to be a thoughtful exploration around HIPAA’s Safe Harbor concept of anonymization, and what an acceptable risk threshold is. Certainly, though, that acceptable number has to be far lower than the double-digit percentages that she has been consistently able to expose using just Freedom of Information Act (FOIA) requests and newspaper clippings.

When asked what that threshold might be, though, she laughed and deferred. “I’m just a computer scientist.”