Though deep learning for pathology images is a rapidly growing field, current approaches depend on pre-extracted embeddings to represent raw image inputs, which can reach gigapixels in size. These embeddings are typically produced by models that are either frozen after pre-training on large general-purpose image datasets such as ImageNet or trained with self-supervised learning, neither of which exploits the available slide-level labels. We present a framework for training machine learning models end to end, under full supervision, to predict patient outcomes directly from digital images of clinical tissue specimens, and we apply it to the complex task of pan-cancer diagnosis, achieving state-of-the-art performance. We further demonstrate that the model learns powerful representations of histopathologic morphology that transfer efficiently to other clinically useful tasks.
Digital histopathology images are enormous, which makes it challenging to apply high-performance supervised machine learning and to learn powerful data representations of the kind that have been shown to generalize to a wide array of image recognition tasks. As a result, the field currently relies largely on representations learned from natural images, such as photographs of dogs and cats, or learned without any supervision, which limits both the performance of the resulting models and their transferability to other tasks. We present a framework for training machine learning models, under full supervision, to predict patient outcomes directly from digital images of clinical tissue specimens, and we apply it to the complex task of pan-cancer diagnosis.
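The contrast between pipelines built on frozen, pre-extracted patch embeddings and end-to-end supervised training on slide-level labels can be sketched minimally in PyTorch. This is an illustrative assumption, not the paper's actual architecture: the module names (PatchEncoder, SlideClassifier), the attention-pooling scheme, and all dimensions are hypothetical, chosen only to show gradients from a slide-level label flowing back into the patch encoder.

```python
import torch
import torch.nn as nn

class PatchEncoder(nn.Module):
    """Toy CNN embedding each tissue patch; trained jointly rather than frozen."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, x):
        return self.net(x)

class SlideClassifier(nn.Module):
    """Attention-pools patch embeddings into one slide vector, then classifies."""
    def __init__(self, dim=64, n_classes=2):
        super().__init__()
        self.encoder = PatchEncoder(dim)
        self.attn = nn.Linear(dim, 1)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, patches):                 # patches: (n_patches, 3, H, W)
        h = self.encoder(patches)               # (n_patches, dim)
        w = torch.softmax(self.attn(h), dim=0)  # attention weights over patches
        slide = (w * h).sum(dim=0)              # (dim,) slide-level embedding
        return self.head(slide)                 # (n_classes,) logits

model = SlideClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

patches = torch.randn(8, 3, 32, 32)  # toy stand-in for patches from one slide
label = torch.tensor(1)              # slide-level label (e.g., cancer type)

logits = model(patches)
loss = nn.functional.cross_entropy(logits.unsqueeze(0), label.unsqueeze(0))
loss.backward()  # gradients reach the patch encoder, unlike a frozen pipeline
opt.step()
```

In a frozen-embedding pipeline, only the pooling and classification layers would receive gradients; here the supervised slide-level loss also updates the patch encoder, which is the distinction the framework exploits.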