Big Data Tool Represents Major Step for RNA Substructure Analysis

Jared Kaltwasser

"An objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots."

Mutations in the substructures of RNA are believed to be related to diseases and other genetic differences—so a holistic understanding of them could help healthcare along the road to true precision medicine. But genomic data is inherently big (and complex) data, and before it can be fully analyzed it needs to be organized.

To that end, researchers have unveiled a new tool that represents a major leap forward in scientists’ ability to annotate the substructures of RNA. The tool, called bpRNA, is big data annotation software that lets researchers crunch hundreds of thousands of RNA mutation data points at a scale not previously possible, making it easier to spot patterns that would otherwise be hard to distinguish.


The tool can parse complex RNA structures to “yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature.”
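To give a rough sense of what annotating structural features involves, here is a minimal Python sketch. It is not bpRNA's algorithm or interface. It assumes the structure is written in dot-bracket notation, a common text format in which matched parentheses mark base pairs and dots mark unpaired bases. The function names `base_pairs` and `hairpin_loops` are hypothetical, and the sketch ignores pseudoknots, bulges, internal loops, and multiloops.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs for matched parentheses in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)          # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with the most recent opener
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def hairpin_loops(structure):
    """Return (start, end) spans of unpaired runs enclosed by a single base pair."""
    loops = []
    for i, j in base_pairs(structure):
        inner = structure[i + 1:j]
        # A hairpin loop is a closing pair with only unpaired bases inside it.
        if inner and set(inner) == {"."}:
            loops.append((i + 1, j - 1))
    return loops


if __name__ == "__main__":
    example = "((...))..((....))"  # hypothetical structure with two stems
    print(base_pairs(example))
    print(hairpin_loops(example))
```

A full annotator would also need to record the sequence and flanking base pairs of each feature, as the quoted description says, and to handle the extra bracket types used to mark pseudoknots.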

David Hendrix, PhD, an assistant professor in the Department of Biochemistry and Biophysics at Oregon State University, told Healthcare Analytics News™ that the research was sparked by a desire to understand more about disease-associated variations that occur in noncoding RNAs.

“Do these mutations affect structure? We can predict secondary structure pretty well, and have tools to do that with fairly high accuracy,” he said. “However, if you're analyzing hundreds or thousands of RNAs to see if there are general trends in where RNA structures get disrupted, there was a limitation in tools that could annotate the secondary structure.”

The link between structure and function of RNA is critical, and Hendrix said the automation of the annotation process means such links and patterns can become much clearer—and the process is a lot quicker. He said some work has already been done in this area, but he believes bpRNA represents a major improvement on the previously available tools.

“To our knowledge, there wasn't a tool that could do this in the same way,” he said. “There is an existing database of annotated structures, but we found errors in that database, suggesting their annotation tool was inaccurate, or that something was mis-annotated.”

In the paper, Hendrix cites an RNA that the existing database lists as having zero bulges, while bpRNA found that it actually has four. Hendrix said this type of error was not an isolated incident. Even small inaccuracies in the data could have major implications for the ability to spot patterns and interpret data.

“I believe our tool provides a more easily-interpretable and more accurate annotation of RNA secondary structural features,” he said.

Hendrix and colleagues have already tested the tool on more than 100,000 structures, and they are making their database available to the public. The availability of these annotations could lead to a wide array of new research into associations between mutations in RNA substructures and diseases.

“Future studies could perform a large-scale analysis of RNA secondary structure in the human genome, and examine the statistics of what types of structural features (loops, multiloops, bulges, internal loops, pseudoknots) in predicted secondary structure are disrupted by disease-associated allelic variants, and if certain structural features are more likely to be disrupted,” he said.

The study, titled “bpRNA: large-scale automated annotation and analysis of RNA secondary structure,” was published in May in Nucleic Acids Research.
