Muhammad Shaban, PhD

Pronouns

He/Him/His

Job Title

Postdoc Researcher

Academic Rank

Research Fellow

Department

Pathology

Authors

Muhammad Shaban, Ming Y. Lu, Drew F. K. Williamson, Richard J. Chen, Jana Lipkova, Tiffany Y. Chen, and Faisal Mahmood

Principal Investigator

Faisal Mahmood

Research Category: Cancer

Tags

Deep learning-based multimodal integration of histology and genomics improves cancer origin prediction

Scientific Abstract

Accurate identification of the primary origin of a metastatic tumor is essential for optimizing treatment and requires a pathologist to integrate multiple forms of data during the examination of tissue. However, even with highly sensitive and specific immunohistochemical stains for some cell lineages, pathologists cannot reliably determine the origin of every metastatic tumor, with 1-2% classified as cancers of unknown primary (CUP) even after other clinical data are integrated. Previous work has shown that it is possible to use artificial intelligence algorithms to predict primary origin from histology or from different forms of molecular data, including genomics, transcriptomics, or methylation profiles. We present a multimodal deep learning algorithm that leverages routinely acquired histology slides, associated clinically available genomics data, and patient sex to classify tumors into 18 primary origins. Our approach shows substantial improvement over unimodal deep learning using histology or genomic data alone, achieving accuracies of 88.1% and 92.0% on a held-out test set (n=4,881) and an external test set (n=660), respectively. Furthermore, on CUP cases (n=283), we observed an agreement of 85.5% between the model’s three most likely predicted origins and the differential diagnoses assigned in the associated pathology reports. At test time, our flexible model design allows the origin to be predicted from histology alone or from genomics alone when the other modality is missing. Additionally, our model allows us to perform interpretability studies to observe which parts of the histology and which genes contribute most to the prediction of a particular origin, a potentially useful tool for quality control and knowledge discovery.
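The top-three agreement reported for CUP cases can be illustrated with a minimal sketch. The abstract does not state the exact scoring rule, so the code below assumes that a case counts as agreeing when at least one of the model's three most probable origins appears in the differential diagnosis from the pathology report. The function names, the dictionary-based input format, and the example labels are hypothetical and are not taken from the authors' implementation.

```python
from typing import Dict, List, Set


def top_k_origins(probs: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k origin labels with the highest predicted probability."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


def top_k_agreement(
    predictions: List[Dict[str, float]],
    differentials: List[Set[str]],
    k: int = 3,
) -> float:
    """Fraction of cases whose top-k predictions overlap the differential.

    Assumption (not confirmed by the source): a case agrees if at least
    one of its top-k predicted origins is listed in the differential.
    """
    if len(predictions) != len(differentials):
        raise ValueError("predictions and differentials must align by case")
    if not predictions:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for probs, differential in zip(predictions, differentials)
        if differential.intersection(top_k_origins(probs, k))
    )
    return hits / len(predictions)
```

Under this assumed rule, the function would be called with one probability dictionary and one set of differential diagnoses per CUP case. A stricter rule, such as requiring the single most likely origin to match, would give a lower figure on the same data.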

Lay Abstract

Accurate identification of the primary origin of a metastatic tumor is essential for optimizing treatment and requires a pathologist to integrate multiple forms of data during the examination of tissue. However, even with highly sensitive and specific immunohistochemical stains for some cell lineages, pathologists cannot reliably determine the origin of every metastatic tumor, with 1-2% classified as cancers of unknown primary (CUP) even after other clinical data are integrated. Previous work has shown that it is possible to use artificial intelligence algorithms to predict primary origin from histology or from different forms of molecular data, including genomics, transcriptomics, or methylation profiles. We present a multimodal deep learning algorithm that leverages routinely acquired histology slides, associated clinically available genomics data, and patient sex to classify tumors into 18 primary origins. Our approach shows substantial improvement over unimodal deep learning using histology or genomic data alone, achieving accuracies of 88.1% and 92.0% on a held-out test set (n=4,881) and an external test set (n=660), respectively. Furthermore, on CUP cases (n=283), we observed an agreement of 85.5% between the model’s three most likely predicted origins and the differential diagnoses assigned in the associated pathology reports. At test time, our flexible model design allows the origin to be predicted from histology alone or from genomics alone when the other modality is missing. Additionally, our model allows us to perform interpretability studies to observe which parts of the histology and which genes contribute most to the prediction of a particular origin, a potentially useful tool for quality control and knowledge discovery.

Clinical Implications

More accurate and timely identification of the primary origin of metastatic cancers using multimodal (H&E image + genomics) machine learning