Mining Epilepsy Diagnosis Information from EMU Notes with NLP
Patients typically visit the Epilepsy Monitoring Unit (EMU) to determine whether the events they experience are epileptic seizures and, if so, which part of the brain the seizures originate from, as part of a presurgical evaluation. When a patient is admitted to the EMU, physicians attempt to induce seizures while recording EEG, so they can observe any abnormal brain activity that occurs during the patient's events.
This project uses Natural Language Processing (NLP) to train a machine learning model that analyzes EMU patient notes and answers questions about each patient's diagnosis, including characteristics of their epilepsy. Our objective is to answer the following six questions for each patient note:
- Did the patient have an epileptic seizure during this visit? (Yes/No/Unknown)
- Did the patient have non-epileptic events during this visit? (Yes/No/Unknown)
- Did the author conclude that the patient may have epilepsy? (Certain/Possible/Unlikely/Indeterminate)
- What is the classification of the patient’s epilepsy? (Focal/Generalized/Mixed/Unknown)
- What is the lateralization of the patient’s epileptic seizures? (Left/Right/Both/Unknown)
- What is the localization of the patient’s epileptic seizures? (Frontal/Temporal/Both/Other/Unknown)
To train the model, we annotated 124 patient notes (later adding 545 more labeled notes) with adjudicated human answers to the six questions for each note, providing ground-truth labels for the model to learn from. We fed this training data into Hugging Face's `AutoModelForSequenceClassification` loaded with ClinicalBERT, a transformer model for text classification that is pre-trained on medical notes. Behind the scenes it takes a question-answering approach, pairing each question with the note text, which allows a single model to perform all six independent classification tasks (see the sketch after the list below). To balance the training dataset and optimize our model's accuracy, we made two modifications to our training data:
- Two rounds of condensing less common labels (answers), merging smaller categories so the model is more likely to classify notes with rarer labels correctly
- Oversampling, i.e., duplicating notes with less common labels; future improvements may also explore other data augmentation methods
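The sketch below shows how these pieces could fit together: a round of label condensing, oversampling by duplication, and fine-tuning a single ClinicalBERT classifier on question-note pairs. It is a minimal illustration rather than our exact pipeline: the checkpoint name `emilyalsentzer/Bio_ClinicalBERT`, the file name, the column layout (one row per note-question pair, with `note_text`, `question`, and `answer` columns), the condensing mapping, and the training hyperparameters are all assumptions.

```python
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed ClinicalBERT checkpoint; the post does not name an exact model ID.
CHECKPOINT = "emilyalsentzer/Bio_ClinicalBERT"

def condense(df, mapping):
    """One 'round' of condensing: merge rare answer categories into broader ones."""
    return df.assign(answer=df["answer"].replace(mapping))

def oversample(df):
    """Duplicate rows with rare answers until every answer matches the majority count."""
    target = df["answer"].value_counts().max()
    return (
        df.groupby("answer", group_keys=False)
        .apply(lambda g: g.sample(target, replace=True, random_state=0))
        .reset_index(drop=True)
    )

# Assumed layout: one row per note-question pair.
notes = pd.read_csv("emu_training_notes.csv")  # hypothetical file
# notes = condense(notes, {"Mixed": "Unknown"})  # hypothetical condensing mapping
# notes = oversample(notes)                      # optional balancing step

labels = sorted(notes["answer"].unique())
label2id = {label: i for i, label in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=len(labels),
    id2label={i: lab for lab, i in label2id.items()},
    label2id=label2id,
)

def encode(batch):
    # Question-answering style input: the question and the note form a sentence pair.
    enc = tokenizer(batch["question"], batch["note_text"],
                    truncation=True, max_length=512)
    enc["labels"] = [label2id[a] for a in batch["answer"]]
    return enc

dataset = Dataset.from_pandas(notes).map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="emu-clf", num_train_epochs=3),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("emu-clf")  # write final weights for reuse at inference time
```

Feeding the question and the note as a sentence pair is what lets one model serve all six questions; the unified label space simply contains every answer option that survives condensing.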
After running the model with various combinations of these modifications, we found that the best overall performance came from the model trained on the dataset with the 545 additional notes after two rounds of condensing but no oversampling; its accuracy was slightly higher than the oversampled version's. Especially for the last three questions, which had more possible answers, expanding the training dataset by 545 notes significantly improved our model's accuracy.
Applying this best model to our entire dataset of over 2,600 patient notes, we calculated the distribution of predicted answers to each of our six questions across all notes. We also extracted the times of each patient's seizures in the EMU and plotted them alongside the patient's medication dosages over the course of the visit, to look for correlations between successful seizure induction and factors like medication concentration at the time of the first seizure, medication tapering speed, or even the administration of alcohol as an induction procedure. Future research into such correlations may find this data extremely helpful for optimizing seizure induction and treatment procedures for EMU patients.
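A rough sketch of that inference pass, assuming the fine-tuned model from the training sketch above was saved under `emu-clf` and the full note set lives in a CSV (file and column names are assumptions):

```python
import pandas as pd
from transformers import pipeline

# Assumed inputs: one row per note, with a note_text column.
notes = pd.read_csv("all_emu_notes.csv")  # hypothetical file
classifier = pipeline("text-classification", model="emu-clf")

# One of the six questions; the other five run the same way.
question = "What is the lateralization of the patient's epileptic seizures?"
inputs = [{"text": question, "text_pair": text} for text in notes["note_text"]]
predictions = classifier(inputs, truncation=True, max_length=512)

# Tally the distribution of predicted answers across all notes.
distribution = pd.Series([p["label"] for p in predictions]).value_counts(normalize=True)
print(distribution.round(3))
```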
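And a sketch of the seizure-versus-dosage plot for one patient; the numbers below are made-up placeholders purely to illustrate the layout, not extracted data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder inputs (hypothetical values): a tapering dosage schedule and the
# seizure timestamps extracted from one patient's EMU notes.
doses = pd.DataFrame({
    "hours_since_admission": [0, 12, 24, 36, 48, 60],
    "dose_mg": [400, 400, 300, 200, 100, 0],
})
seizure_times = [41.5, 55.0]  # hours since admission

fig, ax = plt.subplots()
ax.step(doses["hours_since_admission"], doses["dose_mg"],
        where="post", label="medication dose (mg)")
for i, t in enumerate(seizure_times):
    ax.axvline(t, linestyle="--", color="red",
               label="seizure" if i == 0 else None)
ax.set_xlabel("Hours since EMU admission")
ax.set_ylabel("Medication dose (mg)")
ax.legend()
plt.show()
```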