Fall Research Expo 2020

LCS-DIVE: Detecting and Characterizing Signal in Complex Datasets


While machine learning (ML) has proven to be a powerful tool for solving highly complex problems (e.g., those involving heterogeneity, epistasis, or noise), interpreting ML models remains a largely unsolved problem. Model interpretability is crucial in fields such as biomedicine and epidemiology, where inherent noise makes perfect predictive accuracy impossible, as well as in fields where understanding why certain predictions were made is key to ensuring unbiased outcomes.

In this poster, we present LCS-DIVE (LCS Discovery and Visualization Environment, where LCS stands for learning classifier system), a general, automated method that lets any data researcher discover and visualize the signal in a wide variety of complex classification problems. Specifically, LCS-DIVE extracts and displays human-interpretable information about instance heterogeneity, epistatic patterns of association, and feature importance for further exploration and knowledge discovery. In addition, LCS-DIVE introduces a novel method for LCS rule population analysis that can be adapted to improve the interpretation of other LCS algorithms.

We tested the efficacy of LCS-DIVE on a set of simulated datasets covering various configurations of complexity, including multiplexer (MUX) problems of up to 70 bits. Finally, we ran LCS-DIVE on a real-world pancreatic cancer dataset.
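
For readers unfamiliar with the MUX benchmark, the sketch below shows how a multiplexer dataset can be generated. It is an illustration only, not the simulation code used for this poster: the function names are hypothetical, and placing the address bits before the data bits is an assumed convention. In a multiplexer problem, the first k bits form an address that selects one of the remaining 2^k data bits, and the selected bit is the class label. With k = 6 this gives 6 + 64 = 70 bits. No single bit predicts the label on its own, and different instances depend on different data bits, which is why the problem is commonly used to exercise both epistasis and heterogeneity.

```python
import random


def multiplexer_label(bits, n_address):
    """Return the class label: the data bit selected by the address bits.

    Assumes the first n_address entries of bits are the address (most
    significant bit first) and the remaining 2**n_address are data bits.
    """
    address = int("".join(str(b) for b in bits[:n_address]), 2)
    return bits[n_address + address]


def make_multiplexer_dataset(n_address, n_instances, seed=0):
    """Generate random (bits, label) pairs for an n-bit multiplexer."""
    rng = random.Random(seed)
    n_bits = n_address + 2 ** n_address
    data = []
    for _ in range(n_instances):
        bits = [rng.randint(0, 1) for _ in range(n_bits)]
        data.append((bits, multiplexer_label(bits, n_address)))
    return data


if __name__ == "__main__":
    # 6-bit multiplexer: 2 address bits select among 4 data bits.
    # Address "10" is 2, so the label is the data bit at offset 2.
    example = [1, 0, 0, 0, 1, 0]
    print(multiplexer_label(example, n_address=2))

    # 70-bit multiplexer: 6 address bits plus 64 data bits.
    dataset = make_multiplexer_dataset(n_address=6, n_instances=5)
    print(len(dataset[0][0]))
```

Only 1 of the 64 data bits matters for any given instance of the 70-bit problem, so the rules that explain the data are necessarily specific to subgroups of instances.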