Background
Missense variants play key roles in human disease, but their functional variability complicates interpretation. The initial derivation of the missense badness, PolyPhen-2, and constraint (MPC) score to predict missense deleteriousness was widely employed but limited by the underlying human population reference set. Recently developed references of greater size and diversity offer increased power to improve MPC’s precision and utility.
Methods
We leverage 125,478 exomes in gnomAD v2.1 to update the MPC and regional missense constraint (RMC) metrics, introducing a refined expected missense variation model and per-base resolution of RMC breakpoints.
Results
We discover 3,655 canonical transcripts with regional differences in missense constraint. Highly constrained regions align with protein domains. Highly deleterious (MPC ≥ 3) de novo mutations are enriched 10-fold in 37,488 individuals with neurodevelopmental disorders, and pathogenic haploinsufficient ClinVar variants exhibit increased predicted deleteriousness (Wilcoxon p = 0.001). Transcripts with more expected missense variants harbor more regional constraint differences (Pearson correlation p < 10⁻¹⁶), indicating that population reference expansion will continue to enhance constraint modeling accuracy.
Conclusions
These improved metrics provide finer resolution of the landscape of missense constraint across the coding genome and are broadly applicable in clinical interpretation and gene discovery.