Marc Berger Shares the Keys to the Data Mart

Ryan Black

In 1995, Marc Berger, MD, built what he called “the first real-world data mart” at Merck. Twelve years later, he began working for Eli Lilly, and he had to do it again. And in 2012, when he joined Pfizer, he did it once more.

“The data has gotten better, and our ability to interrogate it has gotten better,” he says during the Big Data and Analytics for Pharma Summit in Philadelphia. “But it has always been dirty.”

Dirty as it may be, he’ll be the last to say data aren’t useful. In fact, he says he does not believe in the classic scientific adage, “garbage in, garbage out.” Data, to Berger, must be coaxed.

Berger has moved on from big pharmaceutical companies. He now serves as the chair of the Real World Evidence Advisory Board at Shyft Analytics, Inc. Over his career in the industry, however, he has gathered plenty of useful perspective for those in healthcare looking to tease insights out of large data sets.

Since the 1990s, he says, the uses and expectations of real-world data have changed. Pharmaceutical companies once looked to the data only for economic evaluations of drugs, but their use has since become more nuanced and comparative.

There are also far more real-world data, and they have become more widely accessible. Between electronic health records, wearables and sensors, claims, and more, there are plenty of data to work with. Until these sources are linked and merged, though, he does not consider them “big data.” The pieces are just the building blocks.

The main challenge in the near term, he says, is lowering the barrier to querying real-world data. Even for a relatively simple question, an organization needs to have skilled statisticians, good programmers, and a strong production function. Unfortunately, the speed of business far outstrips the speed at which data science can produce answers. “Even if you flog the system” and get queries down from 4 to 6 months to 2 or 3 months, he says, the quality of the insights comes into question.

Any query infrastructure that is built, Berger says, requires constant updating. Innovations that seem novel “will soon be on the trash heap of history.” For large companies like the Pfizers and Mercks he once worked for, cultural resistance will always remain. Data scientists looking to build useful data marts in such environments need the full backing of upper-level management, entrepreneurial teams, and the ability to create efficiency in their work.

For the last point, rapid cycle analysis is key. Rather than attempting to answer a wide-ranging, complex question in a single query (such as whether a drug works or whether it is safer than another), analysts can pose a series of more specific, simpler questions that allow larger answers to begin to take shape. A series of well-executed queries can begin to inform everything from drug pathway selection to business strategy, he says.

Technology has begun to speed up such analysis. Berger says that a moderately complex question can now be answered in a matter of hours or days, and a study can be done in a matter of months.