Theoretical and computational modeling of naturally and artificially modified RNA nucleotides
Ribonucleic acid (RNA) is a polymeric nucleic acid that is crucial for cellular function, regulating gene expression and mediating the flow of genetic information between DNA and protein. Recent discoveries of diverse functionality in non-coding RNAs have led to unprecedented demand for RNA 3D structure determination. With current technology, general and accurate prediction of the 3D structure of a large RNA from its sequence remains computationally intractable. One of the principal challenges arises from the conformational flexibility of RNA, especially in loop and junction regions, which results in a rugged energy landscape. Several strategies exist to overcome this challenge, including the incorporation of experimental data that can be collected efficiently and the use of coarse-grained (CG) modeling to improve computational sampling of the structural ensemble. A second challenge is the inclusion of naturally modified derivatives of canonical RNA nucleotides in structure analysis. Most RNA structure prediction strategies rely upon the four canonical nucleotides, identified by their bases adenine (A), uracil (U), guanine (G), and cytosine (C), and ignore the effects of modified nucleotides on the structure and dynamics of the system. In general, RNA molecules contain both rigid and flexible structural elements, which can be probed using selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) experiments. SHAPE reagents selectively modify flexible RNA nucleotides, and the resulting data can be processed into a characteristic reactivity profile that carries structural information about the RNA molecule. Incorporating such efficiently collected experimental information into RNA 3D structure prediction is highly desirable for closing the current knowledge gap between RNA sequence and 3D structure.
In the first project, we introduce a physics-based model, the 3D structure-SHAPE relationship (3DSSR) model, to predict SHAPE reactivity from structure, and we show how this model may be used to sieve SHAPE-compatible structures from a pool of low-energy decoys and thereby refine our predictions. In the second project, we compare the performance of 3DSSR to that of a convolutional neural network (CNN) trained on SHAPE data and RNA structures, showing that 3DSSR outperforms the CNN given the limited data available. In the third project, we further improve the 3DSSR model, gaining deeper insight into the SHAPE reaction and its biases. In the fourth project, we explore the theory underpinning the iterative simulated CG RNA folding model (IsRNA). In establishing the underlying mechanics that drive the success of the model, we were able to clarify and improve the parameterization method while expanding the interpretation of the model, which should broaden application of the method to other biopolymers, such as proteins. We found that the parameterization method follows statistical mechanics principles but also has a Bayesian interpretation. Further, we found that the parameterization process can benefit from application of the principle of maximum entropy, which improves the efficiency of both simulation and parameterization. In the fifth project, we investigate the impact of nucleotide modification on the structure and configurational ensemble of RNA molecules using free energy calculations. By applying modifications to a common RNA hairpin, we estimate their impact on the stability of the structural ensemble, identify the specific interactions that drive changes to the potential of mean force (PMF), and show that the resulting changes in structural stability depend on both the context and the identity of the modification.
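As an illustration of the sieving step described for the first project, the sketch below ranks a pool of hypothetical decoys by how well a predicted reactivity profile agrees with an experimental one. The Pearson correlation score, the function names, and the toy profiles are all assumptions made for illustration; the 3DSSR model itself, which would supply the predicted profiles, is not reproduced here, and its actual scoring scheme may differ.

```python
import math

def pearson(predicted, experimental):
    """Pearson correlation, skipping positions where either value is missing (NaN)."""
    pairs = [(p, e) for p, e in zip(predicted, experimental)
             if not (math.isnan(p) or math.isnan(e))]
    if len(pairs) < 2:
        return 0.0
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_e = sum(e for _, e in pairs) / len(pairs)
    cov = sum((p - mean_p) * (e - mean_e) for p, e in pairs)
    var_p = sum((p - mean_p) ** 2 for p, _ in pairs)
    var_e = sum((e - mean_e) ** 2 for _, e in pairs)
    denom = math.sqrt(var_p * var_e)
    # A flat profile has zero variance; treat it as uninformative.
    return cov / denom if denom > 0 else 0.0

def sieve_decoys(predicted_profiles, experimental_profile, keep=2):
    """Return the `keep` decoys whose predicted profiles best match experiment."""
    scores = {name: pearson(profile, experimental_profile)
              for name, profile in predicted_profiles.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(name, scores[name]) for name in ranked[:keep]]

# Toy data (hypothetical): one reactivity value per nucleotide,
# with NaN marking a position that has no experimental measurement.
experimental = [0.1, 0.9, 1.2, 0.2, float("nan")]
decoys = {
    "decoy_a": [0.2, 1.8, 2.4, 0.4, 0.5],  # same pattern as experiment
    "decoy_b": [1.2, 0.2, 0.1, 0.9, 0.3],  # roughly inverted pattern
    "decoy_c": [0.5, 0.5, 0.5, 0.5, 0.5],  # flat, uninformative
}
top = sieve_decoys(decoys, experimental, keep=2)
for name, score in top:
    print(f"{name}: r = {score:.3f}")
```

In this sketch a decoy is retained when its predicted profile correlates strongly with the measured one; a rank-based or position-weighted score could be substituted without changing the overall sieving logic.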