Fall Research Expo 2022

DELongSeq: An Efficient Isoform Expression Detection Method for long-read RNA Sequencing Data

This project explores DELongSeq, a method that uses statistical techniques to estimate isoform expression from long-read RNA sequencing (RNA-Seq) reads and to detect differential expression (DE). RNA sequencing shows which genes and isoforms are active and how much each splicing isoform is transcribed. Sequencing can be either short-read, which breaks the molecule into fragments, sequences them, and reassembles the pieces, or long-read, which sequences the full-length transcript without reassembly. Current short-read and long-read methods use read counts (the number of reads obtained in the alignment phase) to estimate expression levels. Current long-read methods can identify novel genes and produce expression estimates, but those estimates remain uncertain because precision varies across samples. DELongSeq accounts for both the uncertainty in isoform expression estimation and the variation in the precision of those estimates across biological replicates, allowing for covariance. To quantify the uncertainty of isoform expression estimates, DELongSeq uses the information matrix of the Expectation-Maximization (EM) algorithm, which performs maximum likelihood estimation in the presence of latent variables (variables that are inferred rather than directly observed). It then uses a random-effects regression model to account for the variable uncertainty. By running DELongSeq on several sets of real and simulated data, we show that its approach is computationally reliable and can improve the power of DE analysis for isoforms and genes. It also allows for adjustment of covariates and accounts for variation across samples. In our results, power stayed relatively stable for both DE and non-DE samples as sample size decreased. Finally, the method allows for comparisons of one case versus one control. In summary, DELongSeq can lead to efficient detection of differential isoform and gene expression from long-read RNA-Seq data.
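
To make the EM and information-matrix idea above concrete, here is a minimal sketch in Python. It is not DELongSeq's implementation. It assumes a simplified model in which each read comes from one of K isoforms, a hypothetical read-by-isoform compatibility matrix gives the probability of each read under each isoform, and the unknown isoform proportions are estimated by EM. The function names and the toy data are illustrative assumptions, and the random-effects regression step is not shown.

```python
import numpy as np


def em_isoform_proportions(compat, n_iter=500, tol=1e-10):
    """Estimate isoform proportions by EM under a simple mixture model.

    compat[i, k] is the (assumed known) probability of read i given
    isoform k; the isoform of origin of each read is the latent variable.
    """
    compat = np.asarray(compat, dtype=float)
    n_iso = compat.shape[1]
    theta = np.full(n_iso, 1.0 / n_iso)  # start from equal proportions
    for _ in range(n_iter):
        # E-step: posterior probability that read i came from isoform k.
        weighted = compat * theta
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average responsibilities.
        new_theta = resp.mean(axis=0)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


def observed_information(compat, theta):
    """Observed information for the first K-1 proportions.

    The last proportion is fixed by the sum-to-one constraint, so the
    log-likelihood is differentiated with respect to the free ones only.
    """
    compat = np.asarray(compat, dtype=float)
    p = compat @ theta                       # likelihood of each read
    d = compat[:, :-1] - compat[:, [-1]]     # derivative of p w.r.t. free thetas
    scaled = d / p[:, None]
    return scaled.T @ scaled                 # negative Hessian of the log-likelihood


# Toy data (hypothetical): 30 reads unique to isoform 1, 10 reads unique
# to isoform 2, and 20 reads compatible with both.
compat = np.array([[1, 0]] * 30 + [[0, 1]] * 10 + [[1, 1]] * 20)
theta_hat = em_isoform_proportions(compat)
info = observed_information(compat, theta_hat)
se = np.sqrt(np.diag(np.linalg.inv(info)))   # approximate standard errors
print(theta_hat, se)
```

In this toy setting the ambiguous reads carry no information about the proportions, so the standard error is driven by the 40 uniquely assigned reads. A larger standard error signals a less precise estimate, which is the kind of per-sample uncertainty that a downstream model could take into account.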

PRESENTED BY
PURM - Penn Undergraduate Research Mentoring Program
College of Arts & Sciences 2025
Advised By
Kai Wang
Professor of Pathology & Laboratory Medicine, Perelman School of Medicine
