Background: Identifying biomarkers of emphysema will facilitate early diagnosis and the development of much-needed disease-modifying therapies. In the current study, we assessed the ability of blood transcriptomics to identify CT-quantified emphysema in smokers.
Methods: We randomly split 2,655 COPDGene participants with available whole blood RNA sequencing (RNA-Seq), chest CT, and clinical data into training and testing samples at an 80:20 ratio. In the training sample, we tested for gene expression associations with whole-lung emphysema (Hounsfield units at the 15th percentile of the CT density histogram at total lung capacity, corrected for inspiratory depth) using the voom/limma method. We constructed elastic net models to predict emphysema using the statistically significant transcripts from the expression analysis, separately or in combination with candidate clinical predictors. We then classified subjects into quartiles of emphysema severity and, in the testing sample, assessed the accuracy of the models in predicting those at highest and lowest risk of emphysema using the area under the receiver operating characteristic curve (AUROC). All predictors were scaled, and their importance scores were defined as the absolute values of their coefficients in the regression models.
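The quartile-based evaluation step can be illustrated with a minimal sketch. It reflects one possible reading of the procedure, not the study's actual code: subjects in the top quartile of observed severity are treated as positives, those in the bottom quartile as negatives, and the model's predicted values are scored with a rank-based AUROC. The function names, the toy data, and the convention that a higher value means more severe emphysema are all assumptions made for illustration; for a density measure where lower values indicate more emphysema, the sign would be flipped.

```python
from statistics import quantiles


def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    has a higher score than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def extreme_quartile_auroc(observed, predicted):
    """Hypothetical helper: label the top quartile of observed severity as 1
    and the bottom quartile as 0, drop the middle half, then score the
    predictions for the remaining subjects."""
    q1, _, q3 = quantiles(observed, n=4)  # quartile cut points
    labels, scores = [], []
    for obs, pred in zip(observed, predicted):
        if obs >= q3:
            labels.append(1)
            scores.append(pred)
        elif obs <= q1:
            labels.append(0)
            scores.append(pred)
    return auroc(labels, scores)


# Toy data only (assumption: higher value = more severe emphysema).
observed = [1, 2, 3, 4, 5, 6, 7, 8]
predicted = [1.5, 1.0, 3.2, 4.1, 4.8, 6.5, 6.0, 9.0]
print(extreme_quartile_auroc(observed, predicted))
```

The pairwise comparison is quadratic in sample size, which is acceptable for a sketch; a sort-based rank computation would be preferable for larger test sets.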
Results: Subjects in the training and testing samples had similar characteristics. The combined clinical and RNA-Seq model achieved high performance (AUROC=0.93), superior to the clinical-only (AUROC=0.88) and RNA-Seq-only (AUROC=0.79) models (P<0.05). FEV1/FVC, BMI, female sex, and four genes (AHRR, EIF1AY, PUDP, and CACNA2D3) were among the predictors with the highest importance scores in the combined model.
Conclusion: Blood transcriptomics combined with clinical factors can improve the prediction of emphysema.