Fall Research Expo 2023

A Meta-Learned Neural Network Model of the Hippocampus with Brain-like Sparsity and Learning Rate Patterns

The human brain is the greenest computer on Earth! It is incredibly efficient at transmitting and storing information, running on only 20 watts of power while performing an estimated exaflop (10^18 operations) per second. In contrast, deep neural networks consume copious amounts of energy while executing at far slower rates. Parallel processing and sparse representations – in which only a small number of neurons respond to any given stimulus – support efficient computation in the brain, especially in the multiple learning systems that manage the specifics of, and regularities across, experiences. In the hippocampus, the monosynaptic pathway (MSP) oversees slower statistical learning, while the trisynaptic pathway (TSP) rapidly learns individual episodes. Accordingly, the two pathways differ in learning speed and sparsity. However, it is unclear how the brain arrives at these properties in the first place! To build a normative account of these complementary learning systems, we propose a neural network model of the two pathways that meta-learns ("learns to learn") the hyperparameters LR and m, which respectively modulate the per-layer learning rate and the strength of inhibition via a k-winners-take-all (kWTA) mechanism, sketched below.
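To make the inhibition mechanism concrete, here is a minimal PyTorch sketch, assuming the threshold activation I is derived from the layer's mean activity and relaxing the hard winner mask to a soft gate so that meta-gradients can reach m; the class name and these details are illustrative choices, not the authors' exact code.

```python
import torch
import torch.nn as nn

class KWTA(nn.Module):
    """k-winners-take-all inhibition with a meta-learned sparsity knob m.

    m multiplies a threshold activation I; units above m * I remain
    active, so larger m means fewer winners and a sparser layer code.
    How I is computed is an assumption here (mean activation), and the
    hard winner mask is relaxed to a steep sigmoid so that the outer
    meta-learning loop can send gradients into m.
    """
    def __init__(self, m_init=1.0, tau=0.1):
        super().__init__()
        self.m = nn.Parameter(torch.tensor(m_init))  # meta-learned, one per layer
        self.tau = tau                               # gate sharpness (fixed)

    def forward(self, h):
        I = h.mean(dim=1, keepdim=True)              # per-example threshold activation
        gate = torch.sigmoid((h - self.m * I) / self.tau)
        return h * gate                              # losers are (softly) silenced
```

With m near zero almost every unit wins; as m grows, only the most active units survive, which is how meta-learning can tune each layer's sparsity.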

We build on the La-MAML codebase, a continual online meta-learning model with a small replay buffer of past examples, trained on 20 MNIST rotation tasks – each a 10-way classification among the digits 0-9. To induce sparsity, we implement a version of k-winners-take-all, meta-learning a continuous parameter m that multiplies a threshold activation I and thereby determines the top k most active neurons in each layer. We also meta-learn the per-layer learning rate LR, allowing learning speeds to differ across layers. Finally, we vary the model's architecture (two or four basic hidden layers, plus two four-layer two-pathway variants: skip two-path and split two-path) and test it on the MNIST rotations dataset, plotting learning rate and sparsity over the 20 tasks. In the skip two-path model, the first pathway runs through a large intermediate third layer, h3_large, while the second pathway skips this layer and jumps directly from the second layer to the fourth. In the split two-path model, by contrast, both pathways pass through a third hidden layer, but the first uses a large layer while the second uses a regular-sized one. A sketch of the skip two-path variant follows.
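The sketch below shows the skip two-path architecture and a La-MAML-style inner step with per-layer learning rates, under the same assumptions as above; the layer sizes, the additive merge of the two pathways, and the mean-based soft kWTA gate are all illustrative choices, not values from the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kwta(h, m, tau=0.1):
    """Soft k-winners-take-all gate (see the sketch above): a steep sigmoid
    around the meta-learned threshold m * mean(h), so gradients reach m."""
    return h * torch.sigmoid((h - m * h.mean(dim=1, keepdim=True)) / tau)

class SkipTwoPathNet(nn.Module):
    """Skip two-path variant: pathway 1 routes h2 through the large third
    layer h3_large (TSP-like); pathway 2 skips it and connects h2 directly
    to h4 (MSP-like). Layer sizes and the additive merge are assumptions."""
    def __init__(self, in_dim=784, hidden=100, large=400, n_classes=10):
        super().__init__()
        self.h1 = nn.Linear(in_dim, hidden)
        self.h2 = nn.Linear(hidden, hidden)
        self.h3_large = nn.Linear(hidden, large)   # pathway 1 only
        self.h4_a = nn.Linear(large, hidden)       # pathway 1: h3_large -> h4
        self.h4_b = nn.Linear(hidden, hidden)      # pathway 2: h2 -> h4 (skip)
        self.out = nn.Linear(hidden, n_classes)
        # one meta-learned sparsity parameter m per hidden layer
        self.m = nn.ParameterDict(
            {k: nn.Parameter(torch.tensor(1.0)) for k in ("h1", "h2", "h3", "h4")})

    def forward(self, x):
        a1 = kwta(F.relu(self.h1(x)), self.m["h1"])
        a2 = kwta(F.relu(self.h2(a1)), self.m["h2"])
        a3 = kwta(F.relu(self.h3_large(a2)), self.m["h3"])              # pathway 1
        a4 = kwta(F.relu(self.h4_a(a3) + self.h4_b(a2)), self.m["h4"])  # merge paths
        return self.out(a4)

def inner_update(params, loss, lrs):
    """One inner-loop SGD step with a distinct meta-learned learning rate
    per parameter tensor. create_graph=True keeps the step differentiable,
    so the outer loop can push meta-gradients into the lrs (and each m),
    in the spirit of La-MAML."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return [p - lr * g for p, g, lr in zip(params, grads, lrs)]
```

The split two-path model would replace the skip connection h4_b with a regular-sized third hidden layer, so that both pathways pass through a third layer of different widths.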

Results first indicate that, fortunately and perhaps unsurprisingly, meta-learning these hyperparameters benefits network performance, since the optimized parameters are the ones used at evaluation. In the skip two-path model, the large intermediate layer becomes sparser over tasks and finishes as the sparsest layer; in addition, the learning rate between h3_large and h4 is the highest among all hidden-layer connections. The split two-path model shows similar patterns: its large intermediate layer exhibits higher sparsity than its regular-sized counterpart, becomes increasingly sparse over tasks, and acquires a higher learning rate through meta-learning. In both models, the two distinct hippocampal-cortical pathways mimic the brain's MSP and TSP: their meta-learned learning rates and sparsity levels corroborate observed biological properties! Through the future steps of testing the model on the more naturalistic CIFAR dataset and investigating sparsity and learning rate in other popular architectures (CNNs, transformers), we will gain an even deeper normative account of how the brain's multiple learning systems arise. Ultimately, this work provides a valuable framework for meta-learning neural architectures in the brain, including but not limited to visual and auditory processing systems, spatial navigation systems, and hippocampal-cortical learning during sleep.

PRESENTED BY
Anna Schapiro, Penn Computational Cognitive Neuroscience Lab
College of Arts & Sciences 2025
Advised By
Anna Schapiro
Assistant Professor of Psychology