Fall Research Expo 2023

Developing a Comprehensive R/Bioconductor Package for DNA Methylation-based Human Cancer Classification and Phenotyping 

The cancer DNA methylome encodes rich and detailed molecular footprints of a tumor’s cell of origin and oncogenic mechanism. Machine learning models have proven successful in classifying tumors of the central nervous system. We developed an R/Bioconductor package, CytoMethIC (Cytosine Methylation Intelligence), to provide an open-source solution for comprehensive human cancer phenotyping, encompassing automated determination of cancer type, subtype, and clinical oncology attributes such as tumor stage, cell of origin, aneuploidy, sample purity, race, and sex. Our package encapsulates six machine learning frameworks: random forest, support vector machine, multilayer perceptron, extreme gradient boosting, k-nearest neighbor, and naïve Bayes. Each framework uses optimal selection techniques and was tested on predicting 33 human cancer types (91 subtypes) in The Cancer Genome Atlas cohort and 66 brain cancer types (82 subtypes) in the Children’s Brain Tumor Network cohort. These datasets were profiled using different generations of Infinium BeadChip platforms. We evaluated the models for accuracy, confidence, interpretability, algorithm runtime, and model storage. Our user-friendly, standards-compliant informatics tools facilitate the use of machine learning models for DNA methylation-based cancer classification in clinical diagnosis.

PRESENTED BY
PURM - Penn Undergraduate Research Mentoring Program
Advised By
Wanding Zhou
Assistant Professor of Pathology and Laboratory Medicine
