Iterative reconstruction of three-dimensional model of human genome from chromosomal contact data
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.]

3D genome structures are important because they help us understand spatial gene regulation, transcription efficiency, genome interpretation, functional implications (ENCODE), disease diagnosis, treatment, and drug design. A recent study suggests that the spatial arrangement of chromosomes helps chromosomes interact with themselves. This finding has convinced many researchers of the value of understanding 3D genome structure and has drawn interest to the field of genome modeling. Here we construct 3D conformations of genomes from chromosomal contact data acquired with the Hi-C technique, which is designed to determine both intra- and inter-chromosomal contacts in an unbiased manner at the whole-genome scale. To construct the 3D structure of a chromosome, we consider only intra-chromosomal contacts. We can think of a chromosome as a necklace of beads threaded on a string: we cut each chromosome into one-megabase (1 Mb) chunks and treat the resulting loci as beads. Our approach thus constructs 3D structures of genomes at 1 Mb resolution by determining the 3D coordinates of each 1 Mb region and connecting consecutive regions. In a 3D modeling problem, it is crucial to initialize a starting model before applying any optimization technique. For larger chromosomes, which have more regions at a given resolution (here 1 Mb), we initialize the coordinates with a growth step that places each region probabilistically. Chromatin that is not compressed into the dense chromosome form still resides in a globular nucleus, which suggests a spherical starting model for the smaller chromosomes. After initialization, we apply two widely known optimization techniques: simulated annealing and genetic algorithms.
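The abstract does not say how the beads or the starting models are built. The following is a minimal sketch under stated assumptions, not the author's implementation: read pairs are taken as 0-based positions already filtered to a single chromosome, the "growth step" is read as a random chain growth with a fixed bond length and a simple clash check, and the spherical start scatters beads uniformly inside a ball. All names (`contact_matrix`, `growth_init`, `sphere_init`, `initialize`) and the `SMALL_CHROM_BEADS` cutoff are hypothetical.

```python
import math
import random

BIN_SIZE = 1_000_000       # 1 Mb beads, the resolution stated in the abstract
SMALL_CHROM_BEADS = 100    # hypothetical cutoff between "small" and "large" chromosomes


def contact_matrix(pairs, chrom_length):
    """Bin intra-chromosomal read pairs (pos1, pos2), 0-based, into an n x n count matrix."""
    n = math.ceil(chrom_length / BIN_SIZE)
    m = [[0] * n for _ in range(n)]
    for p1, p2 in pairs:
        i, j = p1 // BIN_SIZE, p2 // BIN_SIZE
        m[i][j] += 1
        if i != j:
            m[j][i] += 1  # keep the matrix symmetric
    return m


def random_unit_vector(rng):
    """Uniformly random direction in 3D (a normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def growth_init(n, bond=1.0, min_sep=0.5, rng=None, max_tries=100):
    """Grow the chain bead by bead, each one bond length from its predecessor
    in a random direction, retrying directions that clash with earlier beads."""
    rng = rng or random.Random()
    coords = [[0.0, 0.0, 0.0]]
    for _ in range(1, n):
        prev = coords[-1]
        for _ in range(max_tries):
            u = random_unit_vector(rng)
            cand = [p + bond * c for p, c in zip(prev, u)]
            if all(math.dist(cand, q) >= min_sep for q in coords):
                break
        coords.append(cand)  # keeps the last candidate if every try clashed
    return coords


def sphere_init(n, radius=5.0, rng=None):
    """Scatter beads uniformly inside a ball, mimicking a globular nucleus."""
    rng = rng or random.Random()
    coords = []
    for _ in range(n):
        u = random_unit_vector(rng)
        r = radius * rng.random() ** (1.0 / 3.0)  # cube root gives uniform density by volume
        coords.append([r * c for c in u])
    return coords


def initialize(n, rng=None):
    """Spherical start for small chromosomes, growth start for large ones."""
    if n <= SMALL_CHROM_BEADS:
        return sphere_init(n, rng=rng)
    return growth_init(n, rng=rng)
```

For example, `initialize(len(contact_matrix(pairs, chrom_length)))` returns one coordinate triple per 1 Mb bead; the bond length, clash distance, and sphere radius are arbitrary model units chosen only for illustration.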
Our novel scoring function allows the optimization procedures to satisfy more intra-chromosomal contacts and non-contacts, as well as some additional constraints. To perturb the positions of the regions, as these optimization algorithms require, we use an adaptation technique, which tries to fix the position of each region with high contact or non-contact satisfaction. This approach is inspired by similar work on proteins and can generate an ensemble of structures very quickly. We compared the generated models with the published results of the MCMC5C method and found that, in all cases, our method produces models superior to the MCMC5C models. We also present visualization techniques that show how many contacts and non-contacts are satisfied or unsatisfied, and we derive simple yet powerful scoring measures for evaluating widely known long-range contacts. We measure the robustness of the method through convergence testing and its recovery capability. Finally, we examine our final model for the compartment features that Lieberman-Aiden et al. suggested are present in chromosomes 14 and 22. We found these features in our models as well, which supports the validity of our method.
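The abstract names a contact/non-contact scoring function, simulated annealing, and an adaptation technique for perturbing regions, but gives no formulas. The sketch below assumes a single hypothetical distance cutoff `d_contact` that separates contacts from non-contacts, and reads the adaptation technique as "move poorly satisfied beads more often, leave well-satisfied beads mostly in place"; that reading is an assumption, not necessarily the author's rule. It covers simulated annealing only (not the genetic algorithm) and omits the additional constraints. The names `satisfaction` and `anneal` are hypothetical.

```python
import math
import random


def satisfaction(coords, contacts, noncontacts, d_contact=1.5):
    """Count restraints met: a contact (i, j) is satisfied when beads i and j lie
    within d_contact; a non-contact is satisfied when they are farther apart.
    Returns the total and a per-bead tally of satisfied restraints."""
    per_bead = [0] * len(coords)
    total = 0
    for pairs, want_close in ((contacts, True), (noncontacts, False)):
        for i, j in pairs:
            close = math.dist(coords[i], coords[j]) <= d_contact
            if close == want_close:
                total += 1
                per_bead[i] += 1
                per_bead[j] += 1
    return total, per_bead


def anneal(coords, contacts, noncontacts, steps=5000, t0=1.0, cooling=0.999,
           step_size=0.3, d_contact=1.5, rng=None):
    """Simulated annealing that maximizes the number of satisfied restraints,
    with bead selection biased toward beads that have unsatisfied restraints."""
    rng = rng or random.Random()
    coords = [list(p) for p in coords]
    n = len(coords)
    degree = [0] * n  # number of restraints touching each bead
    for i, j in list(contacts) + list(noncontacts):
        degree[i] += 1
        degree[j] += 1
    score, per_bead = satisfaction(coords, contacts, noncontacts, d_contact)
    best_score, best = score, [list(p) for p in coords]
    t = t0
    for _ in range(steps):
        # Weight = 1 + unsatisfied restraints, so well-satisfied beads rarely move.
        weights = [1 + degree[k] - per_bead[k] for k in range(n)]
        k = rng.choices(range(n), weights=weights)[0]
        old = coords[k]
        coords[k] = [c + rng.uniform(-step_size, step_size) for c in old]
        new_score, new_per_bead = satisfaction(coords, contacts, noncontacts, d_contact)
        delta = new_score - score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            score, per_bead = new_score, new_per_bead
            if score > best_score:
                best_score, best = score, [list(p) for p in coords]
        else:
            coords[k] = old  # reject: restore the previous position
        t *= cooling  # geometric cooling schedule
    return best, best_score
```

Running `anneal` several times from different starting models (for example, with different seeds) is one simple way to obtain an ensemble of structures; each run recomputes the full score per move, which keeps the sketch short at the cost of speed.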
Access is limited to the campuses of the University of Missouri.