Developing a Comprehensive R/Bioconductor Package for DNA Methylation-based Human Cancer Classification and Phenotyping
The cancer DNA methylome encodes rich and detailed molecular footprints of a tumor’s cell of origin and oncogenic mechanism. Machine learning models trained on DNA methylation data have proven successful in classifying tumors of the central nervous system. We developed an R/Bioconductor package, CytoMethIC (Cytosine Methylation Intelligence), to provide an open-source solution for comprehensive human cancer phenotyping, encompassing automated determination of cancer type, subtype, and clinical oncology attributes such as tumor stage, cell of origin, aneuploidy, sample purity, race, and sex. The package encapsulates six machine learning frameworks: random forest, support vector machine, multilayer perceptron, extreme gradient boosting, k-nearest neighbor, and naïve Bayes. Each framework uses optimal selection techniques and is tested on the prediction of 33 human cancer types (91 subtypes) in The Cancer Genome Atlas cohort and 66 brain cancer types (82 subtypes) in the Children’s Brain Tumor Network cohort. The two cohorts were profiled on different generations of the Infinium BeadChip platform. We evaluated the models for accuracy, confidence, interpretability, algorithm runtime, and model storage. Our user-friendly, standard-compliant informatics facilitates the use of machine learning models for DNA methylation-based cancer classification in clinical diagnosis.