Using Pathomic Imaging Data to Predict Histological Classifications of Pediatric Medulloblastoma
Medulloblastoma (MB) is one of the most common malignant brain tumors in children. Current guidelines classify MB through an integrated diagnosis of both its histological and genetically defined subtypes, and in the age of precision medicine, treatments are increasingly tailored to the tumor classification. Radiological and histological images, which are routinely collected during clinical care for pediatric MB, contain large amounts of extractable quantitative data (radiomics and pathomics, respectively). These radiomic and pathomic data, especially when integrated into a multi-modal dataset, may hold predictive value for classifying MB. This creates an opportunity to use artificial intelligence (AI) and machine learning (ML) to harness these data for predictive modeling that can aid clinical decision-making for pediatric patients with MB. Previous studies have successfully combined multi-omic data with machine learning algorithms in neuro-oncology, but such research remains limited, particularly in the pediatric population, and this project aims to help address that gap in the literature.
In this project, we trained a machine learning model on pathomic data and evaluated its ability to predict the histological subtypes of pediatric MB. We used whole slide images of tumor tissue (n = 203) from the Children’s Brain Tumor Network (CBTN). After splitting each image into tiles, we selected the tiles with the least noise and background and used QuPath’s StarDist extension to segment the nuclei in those tiles. Next, we performed feature extraction to describe the segmented nuclei in terms of features such as shape and spatial distribution, and from these we selected the 60 most informative features based on their statistical significance. Using common Python libraries, we trained a support vector machine (SVM) to classify the subtypes (large cell/anaplastic [LCA], classic, and desmoplastic) and tuned its hyperparameters with GridSearchCV, averaging performance across five-fold cross-validation. When Edited Nearest Neighbours (ENN) undersampling was applied to the entire dataset (removing samples from the non-minority classes, classic and LCA), the model achieved an average AUC of 0.84, compared with 0.50 (chance level) without resampling. However, because the resampling was performed on all of the data before cross-validation, it also altered the samples used for testing, so this result may not generalize to unseen data. To avoid this, we plan to replicate the method by manually removing more uninformative tiles before model training to reduce noise and improve accuracy. Further work will attempt to predict genomic classifications to explore the biological characteristics underlying these images. Following previous studies, we also plan to integrate radiomic data (MRI) and evaluate whether a combined radio-pathomic model improves performance.
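The classification step can be sketched in Python with scikit-learn. This is a minimal sketch under several assumptions, not the authors' code. The QuPath/StarDist segmentation and feature extraction are not reproduced. The hypothetical helper `make_synthetic_cohort` generates a made-up stand-in of 203 slide-level feature vectors (one per slide, with invented class counts), because the abstract does not say how tile-level features are aggregated. `SelectKBest` with the ANOVA F-test stands in for selecting features by statistical significance. The SVM hyperparameter grid and the 3-fold inner search are illustrative choices. The hypothetical `edited_nearest_neighbours` is a simplified hand-rolled ENN (3 neighbours, majority rule), not the imbalanced-learn implementation. Unlike the whole-dataset resampling described above, this version applies ENN only inside each training fold, so every held-out fold is scored on untouched samples.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_cohort(seed=0):
    """Hypothetical stand-in for slide-level nuclear features (NOT CBTN data)."""
    rng = np.random.default_rng(seed)
    # Made-up class counts: 0 = classic, 1 = LCA, 2 = desmoplastic (minority).
    sizes = {0: 95, 1: 75, 2: 33}
    y = np.repeat(list(sizes), list(sizes.values()))
    X = rng.normal(size=(y.size, 120))
    for label in sizes:
        # Each class is shifted in its own block of 10 features; the rest is noise.
        X[y == label, 10 * label:10 * label + 10] += 1.5
    return X, y


def edited_nearest_neighbours(X, y, n_neighbors=3):
    """Simplified ENN: drop a non-minority sample unless most of its neighbours share its label."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    # Called without a query, kneighbors() leaves each point out of its own neighbour list.
    neighbour_idx = nn.kneighbors(return_distance=False)
    agree = (y[neighbour_idx] == y[:, None]).sum(axis=1)
    keep = (agree > n_neighbors / 2) | (y == minority)
    return X[keep], y[keep]


def cross_validated_auc(X, y, n_features=60, use_enn=True, seed=0):
    """Mean one-vs-rest AUC over 5 outer folds; every fitted step sees training data only."""
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    aucs = []
    for train, test in outer.split(X, y):
        scaler = StandardScaler().fit(X[train])
        X_tr, X_te = scaler.transform(X[train]), scaler.transform(X[test])
        # Keep the n_features features with the highest ANOVA F-scores on the training fold.
        selector = SelectKBest(f_classif, k=n_features).fit(X_tr, y[train])
        X_tr, X_te = selector.transform(X_tr), selector.transform(X_te)
        y_tr = y[train]
        if use_enn:
            # Undersample the training fold only; the test fold is left untouched.
            X_tr, y_tr = edited_nearest_neighbours(X_tr, y_tr)
        search = GridSearchCV(
            SVC(probability=True, random_state=seed),
            param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},  # illustrative grid
            scoring="roc_auc_ovr",
            cv=3,
        )
        search.fit(X_tr, y_tr)
        proba = search.predict_proba(X_te)
        aucs.append(roc_auc_score(y[test], proba, multi_class="ovr"))
    return float(np.mean(aucs))


if __name__ == "__main__":
    X, y = make_synthetic_cohort()
    print("AUC with ENN (training folds only):", round(cross_validated_auc(X, y), 3))
    print("AUC without resampling:", round(cross_validated_auc(X, y, use_enn=False), 3))
```

Setting `use_enn=False` gives the no-resampling baseline for comparison. Because scikit-learn's `Pipeline` does not accept steps that remove samples, the resampling runs in an explicit outer loop here; imbalanced-learn's `Pipeline` combined with its `EditedNearestNeighbours` sampler is an alternative that keeps the same ordering inside `GridSearchCV`.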