Scattered disease-linked variants and convergent functions: discovery from big data integration
Genome-wide association studies (GWAS) have identified thousands of disease-linked single nucleotide polymorphisms (SNPs) in the human genome. Most of them have small effect sizes (OR<1.4) and are scattered independently across multiple chromosomes. It remains unclear how they collectively cause disease, an issue tied to missing heritability. Classic tests of genetic interactions suffer from insufficient power. Here, we present an integrative approach that leverages several omics datasets to obtain information beyond genotypes and thus reduces the number of hypotheses to be tested. We combine traditional semantic similarity of gene functions with very deep network permutations (100K times) to quantify the empirical significance of the downstream functional similarity of any pair of SNPs. This approach enabled us to discover a fundamental biological mechanism for complex diseases: SNPs associated with the same disease are more likely to be associated with the same downstream genes, or with functionally similar genes, than SNPs associated with unrelated diseases (OR>12). We also found that 40-50% of prioritized SNP pairs have significant genetic interactions in three independent GWAS datasets. These results provide a new biological interpretation of genetic interactions and a “roadmap” of disease mechanisms emerging from GWAS SNPs, especially those outside coding regions.
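
As an illustration of how an empirical significance can be obtained from permutations, the following is a minimal sketch and not the pipeline used in this work. It assumes a toy SNP-to-gene network, a precomputed gene-gene similarity table, and a best-match summary of similarity between two gene sets; the names pair_similarity, empirical_p, snp_to_genes and sim are hypothetical, as are all values in the toy data. The null model shuffles the gene endpoints of the network's edges, which keeps the number of links per SNP and per gene roughly fixed.

```python
import random

def pair_similarity(genes_a, genes_b, sim):
    """Best-match similarity between two gene sets (an assumed summary statistic)."""
    best = 0.0
    for g in genes_a:
        for h in genes_b:
            # Identical genes count as maximally similar; unknown pairs default to 0.
            s = 1.0 if g == h else sim.get(frozenset((g, h)), 0.0)
            best = max(best, s)
    return best

def empirical_p(snp_a, snp_b, snp_to_genes, sim, n_perm=1000, seed=0):
    """Empirical p-value of the observed SNP-pair similarity under edge shuffling."""
    rng = random.Random(seed)
    # Flatten the SNP-gene network into an edge list.
    edges = [(s, g) for s in sorted(snp_to_genes) for g in sorted(snp_to_genes[s])]
    snps = [s for s, _ in edges]
    genes = [g for _, g in edges]
    observed = pair_similarity(snp_to_genes[snp_a], snp_to_genes[snp_b], sim)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(genes)  # permute gene endpoints across all edges
        null = {}
        for s, g in zip(snps, genes):
            null.setdefault(s, set()).add(g)
        if pair_similarity(null[snp_a], null[snp_b], sim) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): four SNPs, each linked to one downstream gene.
snp_to_genes = {"rs1": {"G1"}, "rs2": {"G2"}, "rs3": {"G3"}, "rs4": {"G4"}}
sim = {frozenset(("G1", "G2")): 0.9}  # only G1 and G2 are functionally similar

p_similar = empirical_p("rs1", "rs2", snp_to_genes, sim, n_perm=2000)
p_unrelated = empirical_p("rs3", "rs4", snp_to_genes, sim, n_perm=2000)
```

With the add-one correction, the smallest attainable p-value is 1/(n_perm + 1), so the number of permutations sets the resolution of the empirical significance: 100K permutations resolve p-values down to roughly 1e-5.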