Fall Research Expo 2023

Improving the Generalizability of Natural Language Processing Algorithms in Medicine

 

The Electronic Health Record (EHR) holds important clinical information, much of it in the raw text of clinic notes written by healthcare providers. Natural Language Processing (NLP) can be used to extract information from this unstructured data. However, clinical notes vary widely in writing style, specialty-specific medical jargon, and form. We explored the efficacy of novel similarity-based techniques in extracting insights from unstructured clinical notes. The study compared these techniques to standard classification methods, focusing on how well they generalize to notes from non-epileptologists. Using a dataset of clinical notes from epileptologists, neurologists, and generalists, we employed four models: the standard Lbl2Vec and three transformer-based techniques. The models were trained on epileptologist notes and tested on three different sets of notes. The process involved converting text and associated keywords into numerical embeddings and classifying documents based on their cosine similarity with label embeddings. Each model's job was to classify each patient from their clinical notes as either "Seizure Free", "Has seizures", or "Could not classify". Results indicated that accuracy was highest on epileptologist notes, but the similarity-based techniques underperformed standard classification, with accuracies not surpassing 50%. A notable observation was that the models did not predict "Could not classify" for any patient, suggesting the need for more specific label keywords. Overall, standard NLP classification techniques remain a good way to glean insights from clinical notes.
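The similarity-based classification step described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the keyword lists are hypothetical (the abstract does not give them), and a toy bag-of-words count vector stands in for the Lbl2Vec and transformer embeddings the study used. Only the overall shape (embed the note, embed each label's keywords, pick the label with the highest cosine similarity) follows the abstract.

```python
import math
from collections import Counter

# Hypothetical label keywords; the real keyword lists are not given in the abstract.
LABEL_KEYWORDS = {
    "Seizure Free": ["seizure", "free", "no", "events", "controlled"],
    "Has seizures": ["breakthrough", "seizures", "convulsion", "episodes", "weekly"],
    "Could not classify": ["unclear", "unknown", "undocumented", "uncertain"],
}


def embed(tokens):
    """Toy embedding (assumption): a sparse bag-of-words count vector.
    The study used Lbl2Vec and transformer embeddings instead."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(note):
    """Return the label whose keyword embedding is most similar to the note.
    If every score is zero, max() falls back to the first label listed."""
    doc = embed(note.lower().split())
    scores = {
        label: cosine(doc, embed(keywords))
        for label, keywords in LABEL_KEYWORDS.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for note in [
        "patient reports breakthrough seizures weekly",
        "patient has been seizure free since last visit",
    ]:
        print(note, "->", classify(note))
```

Because the predicted label is simply the nearest label embedding, a class whose keywords rarely overlap with the note vocabulary will almost never win, which is consistent with the abstract's observation about the "Could not classify" class.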

PRESENTED BY
PURM - Penn Undergraduate Research Mentoring Program
Wharton, Engineering & Applied Sciences 2026
Advised By
Colin Ellis
Assistant Professor of Neurology at the Hospital of the University of Pennsylvania
