In silico identification of putative causal genetic variants

Download paper here


Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Despite the widespread availability of genome-wide data, existing methods to analyze genetic data still primarily focus on marginal association models, which fall short of fully capturing the polygenic nature of complex traits and elucidating biological causal mechanisms. Here we present a computationally efficient causal inference framework for genome-wide detection of putative causal variants underlying genetic associations. Our approach utilizes summary statistics from potentially overlapping studies as input, constructs in silico knockoff copies of summary statistics as negative controls to attenuate confounding effects induced by linkage disequilibrium, and employs efficient ultrahigh-dimensional sparse regression to jointly model all genetic variants across the genome. Our method is computationally efficient, requiring less than 15 minutes on a single CPU to analyze genome-wide summary statistics. In applications to a meta-analysis of ten large-scale genetic studies of Alzheimer{ extquoteright}s disease (AD) we identified 82 loci associated with AD, including 37 additional loci missed by conventional GWAS pipeline via marginal association testing. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments. Additionally, we applied the method to a retrospective analysis of large-scale genome-wide association studies (GWAS) summary statistics from 2013 to 2022. Results reveal the method{ extquoteright}s capacity to robustly discover additional loci for polygenic traits beyond conventional GWAS and pinpoint potential causal variants underpinning each locus (on average, 22.7% more loci and 78.7% fewer proxy variants), contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses. We are making the discoveries and software freely available to the community and anticipate that routine end-to-end in silico identification of putative causal genetic variants will become an important tool that will facilitate downstream functional experiments and future research into disease etiology, as well as the exploration of novel therapeutic avenues.Competing Interest StatementThe authors have declared no competing interest.

Code can be found here