Fall Research Expo 2023

Task Aware Representations of Sentences Improves Seizure Freedom Classification in Data-Limited Settings

 

Automated classification can help healthcare professionals identify epilepsy cases efficiently, potentially reducing misdiagnoses and improving medical decision-making. In earlier work, Kevin Xie developed a natural language processing model to extract seizure frequency from notes written by three types of clinicians: epileptologists, neurologists, and general clinicians. The model was trained exclusively on epileptologist notes and validated on all three datasets. Mean accuracy was high on epileptologist notes, the kind of data the model was trained on, but dropped on the two datasets it had not been trained on. This drop likely stems from differences in note-taking formats and conventions across clinical practices.

To address this generalization problem, we explored Task-Aware Representations of Sentences (TARS). TARS diverges from conventional classifiers by reframing the question from "What is the optimal class for this passage?" to "Does this passage-class pairing make sense?" This shift lets TARS extract latent semantic information from the class labels themselves, which improves classification when data are limited. We first trained a base model on epileptologist data only, aiming to improve accuracy on unseen data, specifically neurologist and generalist notes. In the zero-shot case (K = 0), we achieved accuracy on epileptologist notes similar to that of the earlier model, and improved accuracy on neurologist and generalist notes. As K, the number of target class samples, increases, accuracy on neurologist notes improves, but adding more generalist samples does not increase accuracy on generalist notes.
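The reframing described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the label names, the "[SEP]" pairing format, and the word-overlap scorer `toy_score` are all assumptions made for illustration, and a real TARS model would replace the scorer with a trained transformer.

```python
from typing import Callable, List, Tuple

# Hypothetical label names; the label set actually used in the project is not stated.
LABELS = [
    "patient is seizure free",
    "patient has seizures",
    "seizure freedom is unclear",
]


def make_pairs(passage: str, true_label: str,
               labels: List[str] = LABELS) -> List[Tuple[str, int]]:
    """Turn one multi-class example into one binary example per candidate label."""
    return [(f"{label} [SEP] {passage}", int(label == true_label))
            for label in labels]


def predict(passage: str, score_pair: Callable[[str], float],
            labels: List[str] = LABELS) -> str:
    """Return the label whose pairing with the passage scores highest."""
    return max(labels, key=lambda label: score_pair(f"{label} [SEP] {passage}"))


def toy_score(pair_text: str) -> float:
    """Stand-in scorer: fraction of label words that also occur in the passage."""
    label, passage = pair_text.split(" [SEP] ", 1)
    label_words = set(label.lower().split())
    passage_words = set(passage.lower().split())
    return len(label_words & passage_words) / len(label_words)
```

Because the label text is part of the model input, a new label can be scored without changing the model's output layer, which is what makes the zero-shot (K = 0) setting possible in this framing.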

Our results also differ significantly from the null distribution, which we generated by randomly shuffling the data labels and testing the model. The model was best at identifying whether or not the patient has seizures and struggled when seizure freedom was uncertain, although this may be the result of inherent class imbalance in the training dataset. We improved classification accuracy by roughly 15% and 10% for neurologist and generalist notes, respectively, in the zero-shot and one-shot scenarios. Generalization is crucial for epilepsy classification: by generalizing, our classifier maintains high performance even when faced with previously unseen data.
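A label-shuffling null distribution of the kind mentioned above can be sketched as follows. This is a generic permutation test rather than the authors' exact procedure: it assumes the evaluation labels are shuffled against a fixed set of model predictions, and the function names, the number of permutations, and the add-one p-value correction are illustrative choices.

```python
import random
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def null_accuracies(y_true: Sequence[int], y_pred: Sequence[int],
                    n_permutations: int = 1000, seed: int = 0) -> List[float]:
    """Accuracy of the fixed predictions against randomly shuffled labels."""
    rng = random.Random(seed)
    shuffled = list(y_true)  # copy so the caller's labels are left untouched
    scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        scores.append(accuracy(shuffled, y_pred))
    return scores


def permutation_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Share of null scores at least as high as the observed one (add-one corrected)."""
    hits = sum(score >= observed for score in null_scores)
    return (1 + hits) / (1 + len(null_scores))
```

Shuffling preserves the class proportions, so the null distribution reflects the accuracy achievable by chance under the same class imbalance as the real data.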

PRESENTED BY
PURM - Penn Undergraduate Research Mentoring Program
Engineering & Applied Sciences 2026
Advised By
Kevin Xie
William Ojemann
Colin A. Ellis
Assistant Professor, Neurology
Brian Litt
Director, Center for Neuroengineering and Therapeutics
