Brigham Research Institute Poster Session

Sudeshna Das, PhD
Assistant Professor
Assistant in Neuroscience

Colin Magdamo, Ayush Noori, Tanish Tyagi, Xiao Liu, Lidia MVR Moura, Sahar Zafar, Nicole M Benson, John Hsu, John R Dickson, Alberto Serrano-Pozo, Bradley T Hyman, Deborah Blacker, Michael B Westover, Shibani S Mukerji, Sudeshna Das

Applying Deep Learning to Electronic Health Records to Screen for Patients with Cognitive Impairment

I am a data scientist with a broad interest in biomedical and brain research. Over the last few years, I have developed a research program that applies artificial intelligence (AI) techniques to electronic health records (EHR) to study Alzheimer’s disease and related dementias. This program has shown promising progress, and it could benefit from the networking opportunities offered by the Women in Medicine and Science Symposium. Women are underrepresented in medicine and even more so in the computational sciences, and I believe my talk could inspire others and encourage collaborations within the Mass General Brigham community.


Distinguishing between normal cognition and cognitive impairment in electronic health records (EHR) is a major challenge because dementia is under-recognized, under-diagnosed, and under-reported in these data. Accurately phenotyping cognitive status therefore requires combining structured EHR data, such as diagnosis codes and medications, with unstructured clinical notes.


To identify patients with cognitive impairment in EHR, we used ClinicalBERT, a pre-trained deep learning language model. We fine-tuned this model to classify clinical notes as containing i) evidence of cognitive impairment, ii) evidence of normal cognition, or iii) no evidence of either. These note-level outputs were then combined with structured EHR features to classify the cognitive status of each patient.
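
The abstract does not state how the note-level outputs and structured features were combined, so the following is only a minimal sketch of one plausible approach, written in Python for illustration. It assumes each note has already been assigned one of the three classes and summarizes a patient by the fraction of their notes in each class, then appends two structured indicators. The class names, the `PatientRecord` type, the `patient_features` helper, and the structured fields `has_dementia_icd` and `on_dementia_med` are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical names for the three note-level categories described above.
NOTE_CLASSES = ("impaired", "normal", "no_evidence")


@dataclass
class PatientRecord:
    patient_id: str
    note_labels: list[str]   # predicted class for each clinical note (assumed input)
    has_dementia_icd: bool   # hypothetical structured feature
    on_dementia_med: bool    # hypothetical structured feature


def patient_features(record: PatientRecord) -> list[float]:
    """Build a fixed-length feature vector for one patient.

    The first three entries are the fraction of the patient's notes
    predicted in each class (all zero if the patient has no notes);
    the last two are 0/1 structured indicators.
    """
    counts = Counter(record.note_labels)
    n_notes = len(record.note_labels)
    fractions = [counts[c] / n_notes if n_notes else 0.0 for c in NOTE_CLASSES]
    structured = [float(record.has_dementia_icd), float(record.on_dementia_med)]
    return fractions + structured


if __name__ == "__main__":
    example = PatientRecord(
        patient_id="P001",
        note_labels=["impaired", "no_evidence", "impaired", "normal"],
        has_dementia_icd=False,
        on_dementia_med=False,
    )
    print(patient_features(example))  # prints: [0.5, 0.25, 0.25, 0.0, 0.0]
```

A vector like this could then be passed to a standard patient-level classifier such as logistic regression. Using fractions rather than raw counts keeps patients with many or few notes on a comparable scale, and the note-derived entries can still signal impairment when both structured flags are zero.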


Our attention-based deep learning model, which can learn from complex language structures, achieved excellent performance (AUC = 0.98, micro-F1 score = 0.93) and substantially improved accuracy relative to a baseline TF-IDF (term frequency-inverse document frequency) NLP model (0.93 vs. 0.84). Furthermore, we show that our model can successfully identify dementia patients who have no dementia-related ICD codes or medications.
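
The matching micro-F1 and accuracy values are consistent with a general property of single-label multiclass classification: when every note receives exactly one predicted label, each wrong prediction counts as one false positive (for the predicted class) and one false negative (for the true class), so micro-averaged precision and recall both equal the fraction of correct predictions, and micro-F1 equals accuracy. The sketch below computes both metrics from scratch on made-up labels (not study data) to illustrate this; the function names are hypothetical.

```python
from __future__ import annotations


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of items whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def micro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Micro-averaged F1: pool true positives, false positives, and false
    negatives over all classes, then compute one precision/recall/F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1   # correct prediction of this class
            elif p == label:
                fp += 1   # predicted this class, true class differs
            elif t == label:
                fn += 1   # true class missed by the prediction
    if tp == 0:
        return 0.0        # avoid division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative labels only; 6 of 8 predictions are correct.
    y_true = ["impaired", "normal", "no_evidence", "impaired",
              "normal", "no_evidence", "impaired", "normal"]
    y_pred = ["impaired", "normal", "impaired", "impaired",
              "no_evidence", "no_evidence", "impaired", "normal"]
    print(accuracy(y_true, y_pred), micro_f1(y_true, y_pred))  # prints: 0.75 0.75
```

Macro-averaged F1, which averages per-class scores, would generally differ from accuracy, so the choice of averaging matters when comparing reported figures.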


Automated processing of EHR with deep learning tools can be used to screen for patients with cognitive impairment who could benefit from an evaluation or a referral to specialist care.