we should spend more time understanding why our models forget things, instead of patching the current issues without delving into the source of the problem.
It looks like we are on the same page.
To be fair, while my interest lies in empowering models with the ability to learn “in the wild” in order to build general-purpose AI, there are other equally valid reasons to endow models with better continual learning skills.
If a hedge fund wants to predict the stock market in real time, they might need to implement a good continual learning technique, but they definitely don’t need general-purpose AI. The hedge fund’s use case is where I expect approaches such as meta-learning to shine: a well-defined pre-training phase, designed to perform well when future tasks resemble the training tasks. (I would argue that this setup is too rigid for in-the-wild ML, but I hope to be proven wrong.)
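To make that setup concrete, here is a toy Reptile-style sketch of what I mean by a well-defined pre-training phase (my own illustration with made-up regression tasks, not any particular hedge fund pipeline): the model is meta-trained on a distribution of tasks so that a few gradient steps adapt it to a new task, as long as that task resembles the ones seen in pre-training.

```python
# Minimal Reptile-style meta-learning sketch (toy tasks, assumed setup).
import copy
import torch
import torch.nn as nn

def sample_task():
    # Hypothetical task sampler: each task is a small linear regression problem.
    # In the hedge-fund example this would instead be a window of market data.
    w = torch.randn(8, 1)
    x = torch.randn(32, 8)
    return x, x @ w

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

for meta_step in range(1000):                 # the well-defined pre-training phase
    x, y = sample_task()
    fast_model = copy.deepcopy(model)         # adapt a copy to the sampled task
    opt = torch.optim.SGD(fast_model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        loss = nn.functional.mse_loss(fast_model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        # Reptile outer update: move the slow weights toward the adapted weights.
        for p, q in zip(model.parameters(), fast_model.parameters()):
            p += meta_lr * (q - p)
```

At deployment, a handful of inner steps on fresh data should be enough, precisely because the future tasks are assumed to look like the training tasks.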
I guess at some point we’ll have to design different benchmarks for each kind of continual learning. Like @vincenzo.lomonaco said, we are just getting started!
I don’t know how you implemented the concept of environment in your framework, but IMO, for learning continually, a notion of reality is essential.
Within my framework, interacting with the environment is tantamount to subjectively interpreting the input. For example, if the input is an image and the NN’s output is the number of pixels in the image, that doesn’t count as interpretation. It still is a statement in the computational realm. Similarly, autoencoders don’t interpret anything. If, however, the input is an image and the output is cat/dog regardless of how the input distribution might change in the future, this counts as interpretation. Regression example: the age of a tree (based on its cross section) is an interpretation.
The model that maps internal representations to interpretations is making the connection between the computational world and the agent’s subjective reality (though you could say that it is the loss function that makes the connection). In future versions of MPCL, such a model could map representations to motor goals.
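Here is a toy sketch of the distinction I have in mind (my own illustration, not MPCL’s actual code): counting pixels never leaves the computational realm, whereas a separate “interpretation” head maps internal representations onto a concept in the agent’s subjective reality, e.g. cat vs. dog.

```python
# Toy illustration: computation vs. interpretation (assumed architecture).
import torch
import torch.nn as nn

def pixel_count(image: torch.Tensor) -> int:
    # A statement about the input itself, purely computational -- no interpretation.
    return image.numel()

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU())
    def forward(self, image):
        return self.net(image)                # internal representation

class InterpretationHead(nn.Module):
    """Maps internal representations to external concepts: 0 = cat, 1 = dog."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 2)
    def forward(self, representation):
        return self.fc(representation)

image = torch.rand(1, 3, 64, 64)
logits = InterpretationHead()(Encoder()(image))   # cat/dog: an interpretation
```

In a later version, the same kind of head could map representations to motor goals instead of class labels.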
In MPCL version 1, distinguishing between dogs and cats is the only task allowed to interfere with the internal representation of cats/dogs. Interference does happen if the model sees a new cat species for the first time. Yet the external concept of “cat” is immune to this kind of interference.
If the system connects cats to another concept, e.g. cat paws, MPCL won’t break the connected concept insofar as the connection between the two concepts (cat <-> cat paws) is a valid connection in the agent’s subjective reality. [Cat <-> cat paws] is valid in my reality, and honing the concept of cats will only improve the concept of cat paws in my reality. By contrast, if the system mistakenly connects cat representations to arctic foxes because the model has only seen white cats so far, then MPCL will catastrophically forget what arctic foxes are as soon as it learns about other species of cats.
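Concretely, the principle could look something like the following toy sketch (my own reading of it, not MPCL’s published mechanism): each concept owns a block of parameters, and a task is only allowed to write into the concepts it is responsible for, so refining “cat” cannot silently overwrite an unrelated concept such as “arctic fox”.

```python
# Toy sketch of the building principle: per-concept parameters with task ownership.
import torch
import torch.nn as nn

concepts = nn.ParameterDict({
    "cat":        nn.Parameter(torch.randn(128)),
    "cat_paws":   nn.Parameter(torch.randn(128)),
    "arctic_fox": nn.Parameter(torch.randn(128)),
})
owned_by_task = {"cat_vs_dog": {"cat"}}        # hypothetical ownership table

def update_concepts(task: str, loss: torch.Tensor, lr: float = 0.01):
    loss.backward()
    with torch.no_grad():
        for name, param in concepts.items():
            if name in owned_by_task[task] and param.grad is not None:
                param -= lr * param.grad       # only the owning task refines this concept
            param.grad = None                  # other concepts are left untouched

loss = (concepts["cat"] ** 2).mean()           # stand-in loss for the cat/dog task
update_concepts("cat_vs_dog", loss)
```

Whether the connection between two concepts is “valid” is exactly what the agent’s subjective reality has to decide; the sketch only shows the write-protection part.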
At the end of the day, it’s just a simple building principle. It is not very fruitful on Permuted MNIST, but I have to start somewhere.
Another point that I think you may find interesting is the one raised by this recent paper, which shows that many tasks do not suffer from catastrophic forgetting; it seems to be a problem mainly related to classification and similar tasks.
Intriguing. I left a comment on the YouTube video a couple of weeks ago (ContinualAI RG: "Does Continual Learning = Catastrophic Forgetting?" - YouTube).