Background
Distinguishing normal cognition from cognitive impairment in electronic health records (EHR) is a major challenge because dementia is under-recognized, under-diagnosed, and under-reported in these data. Accurately phenotyping cognitive status requires combining structured EHR data, such as diagnosis codes and medications, with unstructured clinical notes.
Method
To identify patients with cognitive impairment in EHR, we used ClinicalBERT, a pre-trained deep learning language model. We fine-tuned this model to classify each note as containing i) evidence of cognitive impairment, ii) evidence of normal cognition, or iii) no evidence of either. These note-level outputs were then combined with structured features to classify the cognitive status of each patient.
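The step of combining note-level outputs with structured features can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the label names, the `PatientRecord` type, the two structured flags, and the choice to summarize notes as per-label fractions are hypothetical and are not taken from the study, which does not specify its aggregation scheme in this abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical names for the three note-level classes described above.
NOTE_LABELS = ("impairment", "normal", "neither")


@dataclass
class PatientRecord:
    note_labels: List[str]         # one predicted label per clinical note
    has_dementia_icd: bool         # hypothetical structured feature
    has_dementia_medication: bool  # hypothetical structured feature


def patient_features(record: PatientRecord) -> List[float]:
    """Return [fraction of notes per label..., ICD flag, medication flag]."""
    counts = Counter(record.note_labels)
    total = len(record.note_labels)
    # Assumed aggregation: share of the patient's notes assigned to each label.
    fractions = [counts[label] / total if total else 0.0 for label in NOTE_LABELS]
    flags = [float(record.has_dementia_icd), float(record.has_dementia_medication)]
    return fractions + flags


example = PatientRecord(
    note_labels=["impairment", "neither", "impairment", "normal"],
    has_dementia_icd=False,
    has_dementia_medication=True,
)
features = patient_features(example)
print(features)  # [0.5, 0.25, 0.25, 0.0, 1.0]
```

A vector of this kind could then be passed to any patient-level classifier; the sketch stops short of that step because the abstract does not name one.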
Results
Our attention-based deep learning model, which can learn from complex language structures, performed very well (AUC = 0.98, micro-F1 score = 0.93) and was substantially more accurate (accuracy = 0.93) than a baseline TF-IDF (term frequency-inverse document frequency) NLP model (accuracy = 0.84). Further, we show that our model can successfully identify patients with dementia who have no dementia-related ICD codes or medications.
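As a reading aid for the reported metrics: when every note receives exactly one of the three labels, micro-averaged F1 is mathematically equal to accuracy, which is consistent with both values being reported as 0.93. The sketch below checks this identity on a small made-up set of labels. The data and the function names `micro_f1` and `accuracy` are illustrative assumptions, not the study's evaluation code or results.

```python
import math


def micro_f1(y_true, y_pred, labels):
    """Micro-averaged F1: pool TP, FP, and FN over all labels, then compute F1."""
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for illustration only.
labels = ("impairment", "normal", "neither")
y_true = ["impairment", "normal", "neither", "impairment", "normal", "neither"]
y_pred = ["impairment", "normal", "impairment", "impairment", "neither", "neither"]

f1 = micro_f1(y_true, y_pred, labels)
acc = accuracy(y_true, y_pred)
print(math.isclose(f1, acc))  # True: each error counts once as FP and once as FN
```

The identity holds because every misclassified note contributes one false positive (for the predicted label) and one false negative (for the true label), so pooled precision and pooled recall both reduce to the share of correct predictions.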
Conclusions
Automatic processing of EHR with deep learning tools can be used to screen for patients with cognitive impairment who could benefit from an evaluation or a referral to specialist care.