Brigham Research Institute Poster Session

Lily Wang




Research Fellow







Lily Wang*, Katherine R. Chao*, Genome Aggregation Database Consortium, Anne H. O'Donnell-Luria, Heidi L. Rehm, Daniel G. MacArthur, Michael E. Talkowski, Grace Tiao, Konrad J. Karczewski, Mark J. Daly, Kaitlin E. Samocha

The landscape of regional missense mutational intolerance quantified from 125,748 exomes

I am a PhD student in the Bioinformatics and Integrative Genomics program at Harvard Medical School, where I am a member of Dr. Michael Talkowski’s lab at the MGH Center for Genomic Medicine. I am interested in understanding the functional consequences of variation across the genome and the mechanisms that generate this variation, with a keen interest in the translational potential of this research. I am excited for this symposium to spotlight the accomplishments of women in medicine and science, bringing more visibility to women in leading roles at our institutions and connecting them with trainees.


Missense variants play key roles in human disease, but their functional variability complicates interpretation. The initial derivation of the missense badness, PolyPhen-2, and constraint (MPC) score to predict missense deleteriousness was widely employed but limited by the underlying human population reference set. Recently developed references of greater size and diversity offer increased power to elevate MPC’s precision and utility.


We leverage 125,748 exomes in gnomAD v2.1 to update the MPC and regional missense constraint (RMC) metrics, introducing a refined expected missense variation model and per-base resolution of RMC breakpoints.


We discover 3,655 canonical transcripts with regional differences in missense constraint. Highly constrained regions align with protein domains. Highly deleterious (MPC ≥ 3) de novo mutations are enriched 10-fold in 37,488 individuals with neurodevelopmental disorders, and pathogenic haploinsufficient ClinVar variants exhibit increased predicted deleteriousness (Wilcoxon p = 0.001). Transcripts with more expected missense variants harbor more regional constraint differences (Pearson correlation p < 10⁻¹⁶), indicating that population reference expansion will continue to enhance constraint modeling accuracy.


These improved metrics provide finer resolution to the landscape of missense constraint across the coding genome and are broadly applicable in clinical interpretation and gene discovery.