Summary statistics knockoff inference empowers identification of putative causal variants in genome-wide association studies

Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited. In this paper, we propose GhostKnockoff, an efficient knockoff-based method for genome-wide association studies (GWAS) that improves both power and the ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and can therefore be easily applied to enhance existing and future studies. It can also be applied to meta-analysis of multiple GWAS, allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) an analysis of 1,403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry, and (2) a meta-analysis for Alzheimer’s disease (AD) comprising nine overlapping large-scale GWAS, whole-exome sequencing and whole-genome sequencing studies. The UK Biobank analysis shows that the proposed method outperforms conventional GWAS in both statistical power (2.05-fold more discoveries) and localization of putative causal variants at each locus (46% fewer proxy variants attributable to linkage disequilibrium). The AD meta-analysis identified 55 risk loci (including 31 new loci), and ~70% of the proximal genes at these loci showed suggestive signal in downstream single-cell transcriptomic analyses. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
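The abstract does not spell out how knockoff inference can run on Z-scores alone, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' GhostKnockoff implementation. It assumes that the observed Z-scores follow z ~ N(mu, Σ), where Σ is a positive-definite LD (correlation) matrix. It draws a single set of knockoff Z-scores conditionally on z using the standard Gaussian knockoff construction with an equicorrelated diagonal S. It then uses the feature statistic W_j = z_j² − z̃_j² and selects variants with the knockoff+ threshold at target FDR q. The function names, the `shrink` factor, and the toy LD matrix and Z-scores are all hypothetical. The multiple-knockoff and meta-analysis (sample-overlap) extensions mentioned above are not reproduced here.

```python
import numpy as np


def sample_knockoff_zscores(z, ld, rng, shrink=0.99):
    """Hypothetical helper: draw one set of knockoff Z-scores given observed Z-scores.

    Assumes z ~ N(mu, ld) with ld a positive-definite correlation matrix and an
    equicorrelated S = s*I, s = shrink * min(1, 2*lambda_min(ld)); shrink < 1 keeps
    the conditional covariance strictly positive definite.
    Then z_tilde | z ~ N(P z, V) with P = I - S ld^{-1} and V = 2S - S ld^{-1} S.
    """
    p = len(z)
    lam_min = np.linalg.eigvalsh(ld).min()
    S = shrink * min(1.0, 2.0 * lam_min) * np.eye(p)
    ld_inv_S = np.linalg.solve(ld, S)   # ld^{-1} S
    P = np.eye(p) - ld_inv_S.T          # I - S ld^{-1}, since ld and S are symmetric
    V = 2.0 * S - S @ ld_inv_S          # 2S - S ld^{-1} S
    V = (V + V.T) / 2.0                 # symmetrize against round-off error
    return P @ z + rng.multivariate_normal(np.zeros(p), V)


def knockoff_plus_threshold(w, q):
    """Knockoff+ threshold: smallest t > 0 with (1 + #{W <= -t}) / max(1, #{W >= t}) <= q."""
    for t in np.sort(np.abs(w[w != 0])):
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # nothing can be selected at this q


# Toy usage with made-up inputs (not data from the paper).
rng = np.random.default_rng(0)
p = 5
ld = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # toy AR(1) LD matrix
z = np.array([6.0, 0.5, -1.0, 5.0, 0.2])                            # toy GWAS Z-scores
z_tilde = sample_knockoff_zscores(z, ld, rng)
w = z**2 - z_tilde**2                                                # importance: original vs. knockoff
selected = np.flatnonzero(w >= knockoff_plus_threshold(w, q=0.1))
```

With only five toy variants, knockoff+ cannot select anything at q = 0.1. The estimated FDP is at least 1/k when k variants pass the threshold, so at least ten selections are needed. In genome-wide settings with many variants this constraint is not binding. The design choice to condition the knockoff draw on the observed Z-scores is what allows the procedure to use summary statistics and an LD reference rather than individual-level genotypes.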