On the Practical Utility of Continual Learning

Just to add a bit to the discussion, IMO the most prominent benefit of continual learning nowadays is efficiency. Memory is hardly ever a constraint, but efficiency often is. As an example, suppose you have almost infinite memory, and every day you collect a lot of data and you want your ML model always updated with the new data. Retraining the model from scratch on all current and past data quickly becomes infeasible, especially if you want to deploy the updated model as soon as possible.

Not much work has been done on combining fixed controller neural networks with external memory. A little has been done on Neural Turing Machines.
I have tried giving a neural network an external associative memory, but that is computationally expensive. I might try some cheaper options, like providing recurrent memory with a Lagged-Fibonacci-generator-style system where the + operator is replaced by a neural network!
https://en.wikipedia.org/wiki/Lagged_Fibonacci_generator
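To make the idea concrete, here is a rough sketch: the generator's recurrence S[n] = S[n-j] + S[n-k] over a ring buffer, with the + replaced by a small network. The lags, word width, and the little tanh network are placeholder choices, not anything tuned.

```python
import numpy as np

# Lagged-Fibonacci-style recurrent memory: S[n] = f(S[n-J], S[n-K]),
# where the usual "+" of the generator is replaced by a small network f.
# Lags, widths, and the two-layer tanh net are arbitrary illustrative choices.

rng = np.random.default_rng(0)

DIM = 16          # width of each memory word
J, K = 7, 10      # lags (J < K), as in S[n] = S[n-J] op S[n-K]

# A tiny two-layer network standing in for the "+" operator.
W1 = rng.normal(0, 0.3, (2 * DIM, DIM))
W2 = rng.normal(0, 0.3, (DIM, DIM))

def combine(a, b):
    """Neural replacement for '+': mixes two lagged memory words."""
    h = np.tanh(np.concatenate([a, b]) @ W1)
    return np.tanh(h @ W2)

# Ring buffer of the last K memory words, randomly initialised.
buffer = [rng.normal(0, 1, DIM) for _ in range(K)]

def step(external_input=None):
    """Advance the memory one step; optionally mix in an external input."""
    new = combine(buffer[-J], buffer[-K])
    if external_input is not None:
        new = new + external_input   # simple additive injection
    buffer.append(new)
    buffer.pop(0)                    # keep only the last K words
    return new

state = None
for t in range(5):
    state = step()
print("memory word after 5 steps:", state[:4])
```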

Hi @SeanC4S, if you are interested you can look at this paper, which combines an RNN with a growing external memory on an NLP task for CL.

Thanks for the link, I’ll read the paper.
Back-propagation and evolution have difficulty learning how to use external memory banks. Also, biological brains can access huge amounts of memory concurrently (gigabytes per ms, I would guess), while digital computers are more limited.
With digital computers it may be necessary to have a summary memory that gives some clues about what is in deeper memory, allowing the neural network to decide if it wants to dig deeper into some part of memory.
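A rough sketch of that kind of two-level lookup (the block size, the threshold, and the dot-product scoring below are just illustrative choices, not a fixed design):

```python
import numpy as np

# Deep memory split into blocks; each block exposes a cheap summary vector.
# The network only pays for a full lookup inside a block when the summary
# match exceeds a threshold, otherwise it settles for the summary clue.

rng = np.random.default_rng(1)

DIM, N_SLOTS, BLOCK = 32, 1024, 64
deep_memory = rng.normal(0, 1, (N_SLOTS, DIM))

# One summary vector per block (here simply the block mean).
summaries = deep_memory.reshape(-1, BLOCK, DIM).mean(axis=1)

THRESHOLD = 2.0   # how strong a summary match must be before digging deeper

def read(query):
    """Cheap pass over the summaries; dig into one block only if it looks promising."""
    scores = summaries @ query
    best = int(np.argmax(scores))
    if scores[best] < THRESHOLD:
        return summaries[best]            # settle for the summary clue
    block = deep_memory[best * BLOCK:(best + 1) * BLOCK]
    s = block @ query
    weights = np.exp(s - s.max())
    weights /= weights.sum()
    return weights @ block                # soft read within the chosen block

result = read(rng.normal(0, 1, DIM))
print(result.shape)                       # (32,)
```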

Okay, I read it. I also found this paper on “Learning Efficient Algorithms with Hierarchical Attentive Memory.” https://arxiv.org/abs/1602.03218

My problem is that I have an extremely fast type of neural network that I want to add an external memory bank to. I do have a reasonably fast associative memory algorithm I could use, but it is about 10 to 20 times slower than the neural network, which is too much of a mismatch.
I guess I had better use a small amount of memory that can fit in the L1 or L2 cache of a CPU or GPU (for speed), and have some algorithm the neural network can use to efficiently compress information into that memory.
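One possible way to compress into a cache-sized buffer would be a Hebbian / fast-weights style matrix memory; every write is folded into the same small matrix, so the storage stays fixed (and lossy) no matter how many items go in. The sizes and decay factor here are only illustrative.

```python
import numpy as np

# A 64 x 64 float32 matrix is ~16 KB, small enough to stay resident in
# L1/L2 cache. Writes are folded into the matrix as outer products, with a
# decay so old traces fade out; reads are a single matrix-vector product.

DIM = 64
memory = np.zeros((DIM, DIM), dtype=np.float32)   # ~16 KB, cache resident
DECAY = 0.95                                      # old traces fade out

def write(key, value):
    """Fold a (key, value) pair into the fixed-size matrix."""
    global memory
    memory = DECAY * memory + np.outer(value, key).astype(np.float32)

def read(key):
    """Approximate recall of the value most recently bound to this key."""
    return memory @ key

rng = np.random.default_rng(2)
k = rng.normal(0, 1, DIM).astype(np.float32)
v = rng.normal(0, 1, DIM).astype(np.float32)
write(k, v)
recalled = read(k)
# Cosine similarity with the stored value should be close to 1.
print(float(np.dot(recalled, v) / (np.linalg.norm(recalled) * np.linalg.norm(v))))
```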

If you feed the output of an invertible random projection back to its input, you get a kind of complicated resonance/oscillator system that does not lose information. You could use some kind of threshold system to allow the neural network to inject information into that resonance/oscillator system when it wanted to.
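As a rough sketch of that loop: a random orthogonal matrix gives the invertible, norm-preserving projection, and a simple gate lets new information in only when it crosses a threshold. The gate rule and the 0.5 mixing factor are arbitrary choices for illustration.

```python
import numpy as np

# Resonance/oscillator memory: repeatedly applying an orthogonal (hence
# invertible and norm-preserving) random projection to a state vector cycles
# the information around without losing it. A threshold gate decides when an
# external input is injected into the loop.

rng = np.random.default_rng(3)
DIM = 64

# Random orthogonal matrix: invertible, and ||Q @ x|| == ||x|| for every x.
Q, _ = np.linalg.qr(rng.normal(0, 1, (DIM, DIM)))

state = rng.normal(0, 1, DIM)
GATE = 1.0   # threshold above which an input is injected into the loop

def tick(candidate=None):
    """One loop of the oscillator; inject the candidate only if it is 'loud' enough."""
    global state
    if candidate is not None and np.linalg.norm(candidate) > GATE:
        state = state + 0.5 * candidate     # gated injection into the resonance
    state = Q @ state                       # information-preserving rotation
    return state

norm_before = np.linalg.norm(state)
for _ in range(1000):
    tick()
print(norm_before, np.linalg.norm(state))   # essentially unchanged: nothing is lost
```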