Background: Large-scale whole-genome sequencing (WGS) data from multi-ethnic samples provide the opportunity to identify novel variants and regions associated with chronic obstructive pulmonary disease (COPD) and lung function, particularly low-frequency or population-specific signals.
Methods: We performed WGS analysis of pulmonary function (FEV1, FVC and FEV1/FVC) and COPD case-control status in 44,403 multi-ethnic participants from the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program. We performed single-variant and gene-based analyses using GENESIS, implemented on the DNAnexus platform. To determine which significant variants and genes were novel, we repeated each test with known variants within 2 Mb included as covariates. We also validated a subset of findings using exome sequence data from the UK Biobank.
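The novelty check (re-testing each hit with known variants within 2 Mb as covariates) can be illustrated with a minimal sketch. It assumes a plain ordinary-least-squares model with a normal-approximation Wald test on simulated data; it is not GENESIS, which fits mixed models that account for relatedness and population structure. All function names (`known_in_window`, `wald_pvalue`, `conditional_test`, `simulate`), the reading of "within 2 Mb" as ±2 Mb, and the simulated effect sizes are hypothetical choices for illustration.

```python
import math

import numpy as np

GENOME_WIDE_ALPHA = 5e-9   # significance threshold quoted in the abstract
WINDOW_BP = 2_000_000      # "within 2 Mb", read here as +/- 2 Mb (assumption)


def known_in_window(chrom, pos, known_sites):
    """Indices of known variants on `chrom` within WINDOW_BP of `pos`.

    known_sites: list of (chrom, pos) tuples for previously reported variants.
    """
    return [i for i, (c, p) in enumerate(known_sites)
            if c == chrom and abs(p - pos) <= WINDOW_BP]


def wald_pvalue(y, X, col):
    """Two-sided Wald p-value for column `col` of X in an OLS fit of y on X.

    Uses a normal approximation to the t distribution (adequate for large n).
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)
    z = beta[col] / math.sqrt(sigma2 * xtx_inv[col, col])
    return math.erfc(abs(z) / math.sqrt(2.0))


def conditional_test(y, g, known_geno, covars):
    """Marginal and conditional p-values for candidate dosage vector g.

    known_geno: (n,) or (n, m) dosages of known variants in the window.
    covars: (n,) or (n, c) adjustment covariates, e.g. age, sex, ancestry PCs.
    """
    ones = np.ones(len(y))
    x_marg = np.column_stack([ones, covars, g])
    x_cond = np.column_stack([ones, covars, known_geno, g])
    # The candidate variant is always the last column.
    return wald_pvalue(y, x_marg, -1), wald_pvalue(y, x_cond, -1)


def simulate(n=5000, seed=0):
    """Toy data: one known causal variant, one LD proxy, one independent signal."""
    rng = np.random.default_rng(seed)
    known = rng.binomial(2, 0.3, n).astype(float)
    # The proxy copies the known genotype for ~90% of people (strong LD).
    proxy = np.where(rng.random(n) < 0.9, known,
                     rng.binomial(2, 0.3, n)).astype(float)
    novel = rng.binomial(2, 0.05, n).astype(float)  # low-frequency variant
    age = rng.normal(50.0, 10.0, n)
    y = 0.3 * known + 0.4 * novel + 0.01 * age + rng.normal(0.0, 1.0, n)
    return y, known, proxy, novel, age


if __name__ == "__main__":
    y, known, proxy, novel, age = simulate()
    for name, g in [("proxy", proxy), ("novel", novel)]:
        p_marg, p_cond = conditional_test(y, g, known, age)
        print(name, p_marg, p_cond, "novel" if p_cond < GENOME_WIDE_ALPHA else "explained")
```

In this toy setup the LD proxy is strongly associated on its own but loses its signal once the known variant is a covariate, whereas the independent low-frequency variant stays below the threshold; that contrast is the logic behind calling a signal "novel".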
Results: In the multi-ethnic single-variant analysis, we identified 21 significant regions comprising 1,045 variants at a genome-wide significance level of 5×10⁻⁹. Of these, 28 variants in 5 regions remained significant after conditioning on known GWAS variants, identifying novel signals within or near MAGI1, RNF7, GRK7 and ADAMTSL3. In the gene-based analysis of variants annotated as high-confidence loss-of-function, we identified 5 significantly associated protein-coding genes: TTC22, HMCN1, GZMM, DMAP1 and ENSG00000285868. Of these genes, HMCN1 and DMAP1 were significantly associated with FEV1/FVC in the conditional analysis, and the HMCN1 association was replicated in UK Biobank participants of European ancestry (p = 9.62×10⁻³⁰).
Conclusions: Our findings complement large-scale genome-wide association studies (GWAS) by focusing on low-frequency variants, and indicate that novel genetic regions can be discovered using WGS data in larger multi-ethnic samples.