A data mining study of g-quadruplexes and their effect on DNA replication
Nichols, Gregory Shannon
G-quadruplexes are guanine-rich sequences of DNA that can form non-Watson-Crick four-stranded structures. They have been found to exist in various regions of the genome and are believed to play a biological role. We hypothesize that the presence of these structures poses a barrier to DNA replication by standard DNA polymerases and thus requires the intervention of alternative robust but error-prone polymerases for the completion of DNA replication. To test this hypothesis in silico, we assumed that the presence of error-prone replication could be inferred by studying the degree of variation at these sites. We analyzed the density of single nucleotide polymorphisms in the neighborhood of potential G-quadruplex sequences in the human genome. The analysis shows a significantly higher density of single nucleotide polymorphisms within G-quadruplexes. Further, there is evidence of a directional bias in the extent of error, seen as an asymmetry in the incidence of single nucleotide polymorphisms on either side of quadruplexes. Taken together, the evidence favors the hypothesis that G-quadruplexes have a deleterious effect on the fidelity of DNA replication. A secondary research goal of the thesis is to reduce the number of false positives in the prediction of G-quadruplexes based only on sequence information. Most current algorithms are regular expression searches based on sequences that have shown potential to form G-quadruplexes. Using the results from our investigation on sequence variation, predicted melting temperature and machine learning models, attributes derived solely from the sequences were analyzed to determine if classification can be accurately performed. We conclude that factors external to the sequence may be important in determining if and when G-quadruplexes form.
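As a rough illustration of the two ideas named in the abstract, a regular expression search for potential G-quadruplex sequences and a count of single nucleotide polymorphisms per base inside a hit, here is a minimal Python sketch. The pattern (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) is a widely used convention and is an assumption here; the abstract does not state which pattern the thesis used. The function names `find_g4_motifs` and `snp_density` and the toy sequence and SNP positions are hypothetical, chosen only for illustration.

```python
import re

# Assumed motif: four G-runs (3+ guanines each) separated by loops of 1-7 bases.
# This is a common convention, not necessarily the pattern used in the thesis.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_motifs(sequence):
    """Return half-open (start, end) spans of non-overlapping motif matches.

    Only the given strand is scanned; a C-rich pattern would be needed to
    find motifs on the complementary strand.
    """
    return [m.span() for m in G4_PATTERN.finditer(sequence.upper())]


def snp_density(snp_positions, start, end):
    """Return SNPs per base within the half-open interval [start, end)."""
    if end <= start:
        raise ValueError("interval must have positive length")
    count = sum(1 for pos in snp_positions if start <= pos < end)
    return count / (end - start)


# Toy data, invented for illustration only.
toy_sequence = "TTTTGGGAGGGAGGGAGGGTTTT"
toy_snps = [5, 10, 18, 20]  # 0-based SNP coordinates on toy_sequence

motifs = find_g4_motifs(toy_sequence)
densities = [snp_density(toy_snps, s, e) for s, e in motifs]
```

The same `snp_density` function could be applied to windows of equal length on either side of each hit to compare densities upstream and downstream, which is one way to look for the kind of asymmetry the abstract describes.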
Table of Contents
Introduction -- SNP density analysis -- Melting temperature analysis -- Machine learning analysis -- Conclusion