Fall Research Expo 2022

Tumor Cell Prediction at the Single-Cell Level in Pediatric High-Grade Gliomas

Pediatric high-grade gliomas, which begin in the glial cells of the central nervous system, account for 8-12% of all childhood brain tumors and can be extremely difficult to treat, given their propensity to metastasize quickly, invade brain tissue, place pressure on nearby tissue, and cause raised intracranial pressure and hydrocephalus. Two major types of high-grade gliomas are glioblastomas (GB; typically originating in astrocytes) and diffuse midline gliomas (DMG; typically originating in the middle of the brain and affecting the pons, thalamus, cerebellum, and spinal cord). One major challenge to accurately analyzing the complex molecular interactions underlying these tumors is the vast heterogeneity of their cells. Single-cell RNA sequencing (scRNA-seq) helps overcome this challenge by allowing massively parallel profiling of thousands of cells at the single-cell level. These profiles can then be combined with machine learning algorithms for a wide range of analytical purposes, including tumor classification. However, such computational pipelines are not typically designed for or implemented with scRNA-seq data from complex brain tumors like gliomas. This project aimed to develop, implement, and evaluate several classification algorithms (Linear Regression/LR, Linear Discriminant Analysis/LDA, Support Vector Machine/SVM, Random Forest/RF, and k-Nearest Neighbor/kNN) for classifying glioma cells within a computational pipeline based on scRNA-seq data.

The algorithms were tested under several training paradigms, which included lab-generated glioma datasets and open-source non-glioma datasets (with 10-fold cross-validation), in addition to open-source glioma datasets (with 5-fold cross-validation). All pre-processing, classification, and analyses were conducted in R's Seurat toolbox. Quality control was performed by filtering out debris or unlabeled cells in addition to highly variable or absent genes, followed by normalization, log-transformation, and dimensionality reduction using Principal Component Analysis. The algorithms were developed with the aid of several R-based toolboxes, including Garnett (LR), singleCellNet (RF), scID (LDA), scPred (SVM), and scmap-cell (kNN), which were adapted to the needs of the glioma datasets. The classifiers were trained with a glioblastoma dataset using 10-fold cross-validation and with 2 lung cancer datasets and 1 colorectal cancer dataset using 5-fold cross-validation. The predictions generated by each algorithm were then projected with Uniform Manifold Approximation and Projection (UMAP), which was used to cluster cells by differential features and distinguish tumor from non-tumor cells. Confusion matrices were generated for each algorithm, and the algorithms were evaluated on sensitivity, specificity, balanced accuracy, Receiver Operating Characteristic (ROC) curve, and tumor classification accuracy.
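The confusion-matrix metrics named above can be illustrated with a minimal sketch. It is written in Python rather than the R toolboxes used in the project, and the function name `binary_metrics`, the label strings, and the toy labels are all assumptions invented for illustration; none of them come from the project's pipeline or data.

```python
def binary_metrics(true_labels, predicted_labels, positive="tumor"):
    """Count confusion-matrix cells for a two-class problem and derive metrics.

    Hypothetical helper: `positive` names the class treated as "tumor".
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive and pred == positive:
            tp += 1  # tumor cell correctly called tumor
        elif truth != positive and pred == positive:
            fp += 1  # non-tumor cell wrongly called tumor
        elif truth != positive and pred != positive:
            tn += 1  # non-tumor cell correctly called non-tumor
        else:
            fn += 1  # tumor cell missed

    # Sensitivity: fraction of true tumor cells that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of true non-tumor cells that were rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Balanced accuracy: mean of sensitivity and specificity.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy labels, invented for illustration only (not project data).
truth = ["tumor"] * 4 + ["normal"] * 6
pred = ["tumor", "tumor", "tumor", "normal",
        "normal", "normal", "normal", "normal", "tumor", "normal"]
metrics = binary_metrics(truth, pred)
print(metrics)
```

Balanced accuracy is useful when tumor and non-tumor cells are present in unequal numbers, since it weights the two classes equally regardless of how many cells each contains.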

The most common gene signatures distinguishing tumor from non-tumor cells were AC1, MES2, OPC4, and NPC3. The kNN and SVM algorithms typically had the greatest tumor detection accuracy for DMG and GB cells, respectively, although the LR and RF algorithms showed the greatest detection accuracy when trained with non-glioma datasets. The classification algorithms showed the highest sensitivity, specificity, and balanced accuracy when trained with lab-generated glioma data. However, the highest area under the ROC curve (AUC) was achieved when the algorithms were trained with open-source glioma datasets. Overall, these results largely suggest that a combination of multiple algorithms is optimal for classifying high-grade glioma cells.

 

PRESENTED BY
PURM - Penn Undergraduate Research Mentoring Program
Advised By
Kristina Cole
Associate Professor of Pediatrics and Attending Physician
Kai Tan
Director of Pediatric Oncology Program, CHOP
