Fall Research Expo 2020

Machine learning of EEG to help diagnose epilepsy: predicting functional connectivity from structural connectivity

I had the opportunity through PURM to participate in computational neurology research as part of the Center for Neuroengineering and Therapeutics. I worked with Dr. Kathryn Davis and Andy Revell on a project predicting brain functional connectivity from structural connectivity using brain network analysis and machine learning. I focused on creating pipelines to automatically generate brain network features, plot their distributions, and organize the data into a feature matrix for a random forest pipeline.

Throughout the summer, I learned about EEG, diffusion imaging, brain network analysis, data visualization, and machine learning, all topics that were new to me but that I was interested in. I found the experience to be extremely rewarding. It not only helped me gain valuable research experience and improve my computational skills, but also gave me more confidence in my abilities and inspired me to continue exploring computational neuroscience. As someone interested in an interdisciplinary approach to using technology for societal impact, I really enjoyed the translational aspects of the lab. I hope to take what I have learned from this lab into future endeavors to better understand and combat society’s urgent problems.

PRESENTED BY
PURM - Penn Undergraduate Research Mentoring Program
College of Arts & Sciences 2023

Comments

Why are random forests in machine learning a good approach, given there's no current way in deep learning to differentiate networks? Is this project pioneering a new direction/possible solution for your field? 

Yes, previous studies from other labs have shown it is possible to predict functional activity from brain structure; however, they predict fMRI rather than intracranial EEG (iEEG) recordings, use simple linear models, and only use specific network features. From previous work in our lab, we know that there is a non-linear relationship between these features and functional connectivity, so simple linear models may not be appropriate for predicting iEEG, a prediction that has never been rigorously attempted before. Even though deep learning can do feature design for us, there is no existing platform to do exactly what we are doing (predicting one network from another network). Since the two networks are inherently different measurements, we need a framework that can predict an incomplete network (due to the sampling limitations of SEEG) from a complete network generated from whole-brain neuroimaging. As a result, we are instead using traditional machine learning approaches, like random forests, and therefore have to focus on feature engineering: selecting the appropriate features and understanding their properties and distributions.

I want to start off by saying your poster is very aesthetically pleasing! Could you elaborate a bit more on what the preliminary results mean (e.g., the differences between normal and skewed distributions and what the differences between patients mean)? Thanks!

Hello! I thought your poster was very nicely designed. I appreciated how you zoomed in on the relevant sections during your presentation. Your study is very interesting.

This is very interesting, Lena! It surely seems very promising that a random forest model may be able to outperform our current methods for epilepsy diagnosis and evaluation. How well does this model actually function on the datasets that you had? Did you find that the correlations you saw held up in patient data? I saw that you mentioned that you may be having overfitting problems--could you elaborate further on this?

Hi Lena! This sounds like such an interesting study! Exciting to see how it could be implemented to affect patient outcomes. 

I thought your project was really interesting! You said in the future you'll be determining which network features are most important, but if you had to guess, what would they be?

Thank you so much for your kind words!

In answer to the question of what the preliminary results mean, they come from the pipeline I created to automatically generate distribution plots for a variety of brain network measures, averaged across all the patients. Understanding these underlying distributions makes it easier to generate a feature matrix for the random forest pipeline and to select the features we want to use. Previous work in our lab has shown that certain atlases are better for correlating brain structure and function, so with that in mind we looked at a variety of different atlases. Checking whether a feature had a normal or a skewed distribution under each atlas allowed us to see how consistently that network feature was distributed across atlases for all the patients. For example, the clustering coefficient distributions are skewed in a similar way for every atlas, so it might be a better network feature to test across all atlases. The point of highlighting the differences between patients was to show that we need a model that can adjust to patient differences, because their brain structures, and therefore their brain network features, are not the same.
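As a rough illustration of the normal-versus-skewed check described above (a sketch only, not the lab's actual pipeline), the shape of a feature distribution can be summarized with a skewness coefficient. The atlas names, the synthetic values, and the 0.5 cutoff below are all assumptions made up for the example.

```python
import random
import statistics


def sample_skewness(values):
    """Return the Fisher-Pearson skewness coefficient (g1) of a sequence."""
    n = len(values)
    mean = statistics.fmean(values)
    # Biased (population) second and third central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant data has no skew
    return m3 / m2 ** 1.5


def describe_shape(values, threshold=0.5):
    """Label a distribution as roughly symmetric or skewed.

    The 0.5 cutoff is an arbitrary choice for illustration.
    """
    g1 = sample_skewness(values)
    if abs(g1) < threshold:
        return "roughly symmetric"
    return "right-skewed" if g1 > 0 else "left-skewed"


# Synthetic stand-ins for one network feature under two hypothetical atlases.
rng = random.Random(0)
features_by_atlas = {
    "atlas_A": [rng.gauss(0.5, 0.1) for _ in range(2000)],   # bell-shaped
    "atlas_B": [rng.expovariate(4.0) for _ in range(2000)],  # long right tail
}

for atlas, values in features_by_atlas.items():
    print(atlas, round(sample_skewness(values), 2), describe_shape(values))
```

Running the same check per atlas and per patient gives a quick numeric companion to the distribution plots.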

In answer to the overfitting question, we are currently testing the model on a subset of brain network features (degree, clustering coefficient, betweenness centrality, and shortest path length) and will then add in all of the other brain network features we calculated. Once we add in all the other features, there is a potential for overfitting, which we hope to address through regularization, using scikit-learn's regularization parameters and GridSearchCV to tune these hyperparameters.
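A minimal sketch of what that tuning step could look like is below. It is an assumption-laden example rather than our actual code: the data are synthetic, the model is assumed to be a RandomForestRegressor, and "regularization parameters" is taken to mean the hyperparameters that limit tree complexity (max_depth, min_samples_leaf, max_features). The grid values are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic feature matrix: one column per network feature (values are made up).
rng = np.random.default_rng(0)
feature_names = ["degree", "clustering", "betweenness", "shortest_path_length"]
n_samples = 200
X = rng.normal(size=(n_samples, len(feature_names)))
# Synthetic non-linear target standing in for functional connectivity.
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n_samples)

# Hyperparameters that constrain tree complexity and so limit overfitting.
param_grid = {
    "max_depth": [3, None],
    "min_samples_leaf": [1, 10],
    "max_features": [0.5, 1.0],
}

search = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=30, random_state=0),
    param_grid=param_grid,
    cv=3,          # 3-fold cross-validation on the training data
    scoring="r2",
)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("cross-validated R^2:", round(search.best_score_, 3))
```

Comparing the cross-validated score with the score on the training data is one simple way to see whether the larger feature set is being overfit.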

If I had to guess which network features are most important, I would point to previous research from other labs suggesting that network communication measures (such as shortest path length, characteristic path length, search information, and path transitivity) may be the best indicators of the correlation between brain structure and function.
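To make two of those measures concrete, here is a small sketch of shortest path length and characteristic path length on a toy network. It assumes a binarized, undirected, connected network stored as an adjacency dictionary; the node labels are hypothetical, and weighted structural networks would need a weighted shortest-path algorithm instead of the breadth-first search used here.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return hop-count shortest path lengths from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def characteristic_path_length(adjacency):
    """Average shortest path length over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for source in adjacency:
        for target, dist in bfs_distances(adjacency, source).items():
            if target != source:
                total += dist
                pairs += 1
    return total / pairs if pairs else float("inf")


# Toy 4-node chain: A - B - C - D (hypothetical region labels).
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

print(bfs_distances(toy_network, "A"))
print(characteristic_path_length(toy_network))
```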

Please let me know if you have any more questions!