Statistical Methods, Computational Tools and Visualization of Hi-C Data
Abstract: Harnessing high-throughput chromosome conformation capture (3C)-based technologies, we have recently generated a compendium of datasets characterizing chromatin organization across human cell lines and primary tissues. These data deepen our understanding of long-range chromatin interactions (i.e., peaks), their functional implications for transcriptional regulation, and the genetic mechanisms underlying complex human diseases and traits. However, multiple layers of uncertainty and a complex dependency structure complicate the analysis and interpretation of these data. We have proposed statistical methods based on hidden Markov random fields (HMRFs) that properly account for the dependency among neighboring pairs of loci in Hi-C data and further leverage it by borrowing information from those neighbors, yielding more powerful and more reproducible peak detection. Through extensive simulations and real data analysis, we demonstrate the improved power of our methods relative to existing peak callers. We have applied our methods to the compendium of Hi-C data from 21 human cell lines and tissues, and have further developed an online visualization tool to help identify potential target gene(s) for the vast majority of non-coding variants identified by recent waves of genome-wide association studies.
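To illustrate what "borrowing information from neighboring pairs of loci" can mean, here is a minimal toy sketch, not the speakers' method. It assumes a small grid of binned contact counts with no distance or bias normalization, Poisson emissions with fixed background and peak rates, an Ising-style smoothing prior with strength `beta` over the four adjacent locus pairs, and iterated conditional modes (ICM) for inference. The function names `poisson_loglik` and `hmrf_icm` and all parameter values are hypothetical.

```python
import math
from typing import List


def poisson_loglik(y: int, lam: float) -> float:
    # log P(y | lam) for a Poisson count; log(y!) is omitted because it is
    # the same for both labels and cancels when the labels are compared.
    return y * math.log(lam) - lam


def hmrf_icm(counts: List[List[int]], lam_bg: float, lam_peak: float,
             beta: float, max_iter: int = 20) -> List[List[int]]:
    """Toy HMRF: label each locus pair 0 (background) or 1 (peak) via ICM.

    Assumption: each grid cell is one pair of loci (bin i, bin j), and its
    neighbors are the four adjacent pairs (i+-1, j) and (i, j+-1).
    """
    n_rows, n_cols = len(counts), len(counts[0])
    rates = (lam_bg, lam_peak)
    # Initialize with the independent decision (no information from neighbors).
    labels = [[max((0, 1), key=lambda z: poisson_loglik(counts[i][j], rates[z]))
               for j in range(n_cols)] for i in range(n_rows)]
    for _ in range(max_iter):
        changed = False
        for i in range(n_rows):
            for j in range(n_cols):
                neighbors = [labels[a][b]
                             for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                             if 0 <= a < n_rows and 0 <= b < n_cols]

                def score(z: int) -> float:
                    # Local evidence plus a bonus of beta per agreeing neighbor.
                    agree = sum(1 for nb in neighbors if nb == z)
                    return poisson_loglik(counts[i][j], rates[z]) + beta * agree

                best = max((0, 1), key=score)
                if best != labels[i][j]:
                    labels[i][j] = best
                    changed = True
        if not changed:  # ICM has reached a local optimum
            break
    return labels
```

With `lam_bg=2.0` and `lam_peak=8.0`, a count of 4 is labeled background on its own evidence. In the grid below, the center pair (count 4) is surrounded by pairs with count 9, so with `beta=1.0` it is relabeled as part of the peak; with `beta=0.0` (no borrowing) it stays background, and an isolated count of 4 among background pairs stays background either way.

```python
grid = [[2, 2, 2, 2, 2],
        [2, 9, 9, 9, 2],
        [2, 9, 4, 9, 2],
        [2, 9, 9, 9, 2],
        [2, 2, 2, 2, 2]]
labels = hmrf_icm(grid, lam_bg=2.0, lam_peak=8.0, beta=1.0)
# labels[2][2] == 1, and the 3x3 block in the middle is all 1
```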
(Refreshments will be served.)