Academic Paper

Studies on genomic G-quadruplexes
Document Type
Electronic Thesis or Dissertation
Source
Subject
572.8
G-quadruplexes
G-tetrads
gene regulation
Single Nucleotide Polymorphisms
Language
English
Abstract
The double-helical structure of DNA is well known and is the basis of modern genetics and molecular biology. However, DNA is polymorphic and can adopt a variety of different structures. In this dissertation, I focus on a four-stranded structure that can be formed by guanine-rich sequences. These are known as G-quadruplexes or G-tetrads. I begin by investigating the structural properties of the quadruplex and how sensitive it is to mutations and deletions. I then consider the structural properties of the loops that join the four strands, and develop an understanding of how the length of the loops affects the stability of the structures and which folding pattern they adopt. Using these results and some other considerations, I then develop a 'folding' rule, which predicts which sequences are expected to form quadruplexes under physiological conditions. Using this rule, I identify a number of putative quadruplex sequences in the promoter regions of a selection of oncogenes and develop a model for how these structures could be exploited as a drug target for gene regulation. The identified structures are characterised biophysically, and drug binding in vitro is demonstrated. An in vivo system using the fruit fly Drosophila melanogaster is used to test whether drugs can be used to target a quadruplex in the promoter region of a key neuronal gene. I then address the hypothesis that quadruplexes could be a natural mechanism for gene regulation (or other functions). In order to investigate this, I develop a technique to rapidly search the entire human genome for quadruplex-forming sequences using the folding rule derived above. This identifies 350,000 potential sequences in the human genome. This number is compared with the number expected if the DNA sequence were purely random (solved analytically) and with the number expected under a simple Markov model of the human genome, showing that there are fewer such sequences than expected.
A statistical study of the lengths of the three loops formed by potential genomic quadruplexes shows strong correlations between them, which may be explained in terms of the folding pattern of the quadruplexes. This provides the first evidence of the wide-scale presence of actual quadruplexes in the genome, and allows the calculation of a lower estimate for the number present. Co-location of Single Nucleotide Polymorphisms (SNPs) and quadruplex sequences is studied, with particular focus on those SNPs that have been correlated with diseases. A number of interesting clinical observations that could be attributed to quadruplex formation are investigated, including the COL1A1 osteoporosis gene. A number of quadruplex sequences are found to be conserved between human and mouse, in similar positions. This is investigated further, and used to provide further evidence that there is some significance to these sequences. The role of transcription factors in binding these quadruplexes is discussed. In summary, this thesis broadens the quadruplex DNA field to consider the prevalence of quadruplexes throughout the genome, develops a number of potential drug candidates, and demonstrates how important some of these sequences may be for biological function.
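
The abstract does not state the folding rule itself, so the following Python sketch only illustrates what a rule-based sequence search of this kind could look like. It assumes a motif that is commonly used in the quadruplex literature: four runs of at least three guanines separated by three loops of 1 to 7 nucleotides. The pattern, the name `PQS` and the function `find_candidates` are illustrative assumptions, not the thesis's own method, and only the given strand is searched.

```python
import re

# Assumed motif (NOT taken from the abstract): four runs of three or more
# guanines joined by three loops of 1-7 nucleotides each.
PQS = re.compile(
    r"G{3,}([ACGT]{1,7})G{3,}([ACGT]{1,7})G{3,}([ACGT]{1,7})G{3,}"
)


def find_candidates(sequence):
    """Return (start, end, loop_lengths) for each non-overlapping match.

    Positions are 0-based and the end index is exclusive. Only the
    strand passed in is scanned; the complementary strand is ignored.
    """
    hits = []
    for match in PQS.finditer(sequence.upper()):
        loop_lengths = tuple(len(loop) for loop in match.groups())
        hits.append((match.start(), match.end(), loop_lengths))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to show the output format.
    for hit in find_candidates("TTGGGAGGGTAGGGCATGGGTT"):
        print(hit)
```

Recording the three loop lengths for every match is what would make a later comparison between loops possible, such as the loop-length correlations mentioned in the abstract.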
