When interference is okay and when it's not

Hi everybody,

Surely I’m not the only one who feels that we are missing something about catastrophic forgetting.

Most papers focus on mitigating catastrophic forgetting. This is very useful, so I’m not questioning what motivates these papers. But is mitigation going to scale to a long-term / lifelong learning setting? What do you think?
If we follow the mitigation path, are we ever going to reach a level at which catastrophic forgetting is no longer the main issue?

In my humble opinion, lifelong learning is in desperate need of a recipe, or at least some general construction principles.
I’m really curious about what you think these could be.

I am sharing some of my thoughts at github.com/rom1mouret/mpcl.

It’s not fully fleshed out. I don’t think I will have time to continue exploring those ideas alone this year, but I’m open to feedback and collaboration.

Follow-up project: github.com/rom1mouret/forgetful-networks (towards models that can gracefully forget)

It’s another attempt at dealing with catastrophic forgetting off the beaten path.

I start with a broad-brush take on the difference between healthy forgetting and catastrophic forgetting, and go on to suggest that forgetting would be healthier if it mechanically eroded the system’s trust in the components that have forgotten things.
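
To make that a bit more concrete, here is the kind of mechanism I have in mind (an illustrative sketch only, not the code in the repo; the probe set, the decay rate and all the names are made up): each component keeps a trust score, and whenever its error on a small probe set grows, i.e. whenever it appears to have forgotten something, its trust, and therefore its weight in the combined prediction, erodes.

```python
import numpy as np

class TrustTrackedComponent:
    """Wraps a predictor with a trust score that erodes when the component's
    error on a small probe set grows, i.e. when it appears to have forgotten."""

    def __init__(self, predictor, probe_inputs, probe_targets, decay=0.5):
        self.predictor = predictor          # any callable: inputs -> predictions
        self.probe_inputs = probe_inputs    # held-out samples the component once handled well
        self.probe_targets = probe_targets
        self.decay = decay
        self.trust = 1.0
        self.baseline_error = self._probe_error()

    def _probe_error(self):
        preds = self.predictor(self.probe_inputs)
        return float(np.mean((preds - self.probe_targets) ** 2))

    def update_trust(self):
        # trust erodes in proportion to how much the probe error has grown
        # since the component was last known to be reliable
        degradation = max(0.0, self._probe_error() - self.baseline_error)
        self.trust *= float(np.exp(-self.decay * degradation))
        return self.trust

def trust_weighted_prediction(components, x):
    # the system listens less to components whose trust has eroded
    weights = np.array([c.trust for c in components])
    preds = np.stack([np.atleast_1d(c.predictor(x)) for c in components])
    return (weights[:, None] * preds).sum(axis=0) / weights.sum()
```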

Hi @Morty!

Your intuition is indeed correct: the focus on mitigating forgetting is just a minor aspect of CL. We are just getting started. :slight_smile:

That’s a very interesting take! We wondered about the same thing in recent work, where we propose a new framework to unify different views on lifelong learning. There is a whole field of work that focuses instead on ‘concept drift’, which we typically don’t consider in continual learning. The objective in concept-drift research is to forget in a smart fashion whenever the distributions of concepts drift, whereas continual learning tries to avoid forgetting in general, without considering these distribution shifts.

You can find our learner-evaluator framework here: https://arxiv.org/pdf/2009.00919.pdf
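
To caricature the two mindsets in code (a toy sketch, not something from the paper; the scikit-learn-style `.fit()` interface and the EWC-like penalty are just placeholders):

```python
from collections import deque
import numpy as np

class DriftAdaptiveLearner:
    """Concept-drift style: forget on purpose by only ever fitting the most
    recent window of data, so the model tracks whatever the current concept is."""
    def __init__(self, model, window=1000):
        self.model = model                   # assumed to expose a scikit-learn-like .fit()
        self.buffer = deque(maxlen=window)   # older samples silently fall out

    def observe(self, x, y):
        self.buffer.append((x, y))
        xs, ys = zip(*self.buffer)
        self.model.fit(np.array(xs), np.array(ys))

class ContinualLearner:
    """Continual-learning style: try NOT to forget, e.g. with a quadratic
    penalty anchoring the parameters that mattered for previous tasks."""
    def __init__(self, reg_strength=1.0):
        self.reg_strength = reg_strength
        self.anchors = []   # list of (old_params, importance) pairs, one per past task

    def penalty(self, params):
        # large when parameters important to old tasks move away from their anchors
        return self.reg_strength * sum(
            float(np.sum(imp * (params - old) ** 2)) for old, imp in self.anchors
        )
```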

@vincenzo.lomonaco @mattdl thank you for your encouragement :wink:

I had never thought of concept drift and interference as two related ideas, but now I see how they can be considered as two sides of the same coin.

By the way, I’m in the process of reframing MPCL from an embodied/situated cognition perspective.

“MPCL posits that latent representations acquire meaning by acting on the outside world.
For continual learning to be manageable in complex environments and avoid catastrophic forgetting, meaning must remain stable over time.”

I’ve also added a FAQ.

Hi @Morty,
I totally agree with your points here. I also think that continual learning is in desperate need of a general, consolidated approach to dealing with forgetting, concept drift, etc.
I still need to read the thoughts you shared on GitHub more carefully, but I really like that you mention embodiment and acting on the outside world. I haven’t formalized my thoughts the way you have, but I think one of the most important and not-yet-broadly-explored concepts in continual learning is interaction with the environment. In fact, it is still barely explored even in the classical machine learning community!
I don’t know how you implemented the concept of environment in your framework, but IMO a notion of reality is essential for learning continually. In our world, every object must obey the laws of physics. This is a really strong assumption, since we implicitly know that every new object we perceive must behave in a certain manner. We have far more prior knowledge about unseen objects than we are aware of. I doubt we would be able to learn new concepts so easily if we couldn’t generalize those basic properties of our world.

Another point you may find interesting is the one raised by this recent paper, which shows that many tasks do not suffer from catastrophic forgetting; it seems to be a problem mainly related to classification and similar tasks.
There are also many works pointing out that forgetting (especially in NNs) is caused by the learning algorithm (stochastic gradient descent optimization), and that other optimization algorithms could strongly mitigate it.
So, as you pointed out, it’s not only a problem of forgetting. I think we should spend more time understanding why our models forget things, instead of trying to fix the current issues with patches without delving into the source of the problem.

Hi @ggraffieti

we should spend more time understanding why our models forget things, instead of trying to fix the current issues with patches without delving into the source of the problem.

It looks like we are on the same page :slight_smile:

To be fair, while my interest lies in empowering models with the ability to learn “in the wild” in order to build general-purpose AI, there are other equally valid reasons to endow models with better continual learning skills.

If a hedge fund wants to predict the stock market in real time, they might need a good continual learning technique, but they definitely don’t need general-purpose AI. The hedge fund’s use case is where I expect approaches such as meta-learning to shine, with a well-defined pre-training phase, designed to perform well when future tasks resemble the training tasks. (I would argue that this setup is too rigid for in-the-wild ML, but I hope to be proven wrong.)

I guess at some point we’ll have to design different benchmarks for each kind of continual learning. Like @vincenzo.lomonaco said, we are just getting started!

I don’t know how you implemented the concept of environment in your framework, but IMO a notion of reality is essential for learning continually.

Within my framework, interacting with the environment is tantamount to subjectively interpreting the input. For example, if the input is an image and the NN’s output is the number of pixels in the image, that doesn’t count as interpretation; it is still a statement in the computational realm. Similarly, autoencoders don’t interpret anything. If, however, the input is an image and the output is cat/dog regardless of how the input distribution might change in the future, that counts as interpretation. Regression example: the age of a tree (based on its cross section) is an interpretation.
The model that maps internal representations to interpretations is making the connection between the computational world and the agent’s subjective reality (though you could say that it is the loss function that makes the connection). In future versions of MPCL, such a model could map representations to motor goals.
In MPCL version 1, distinguishing between dogs and cats is the only task allowed to interfere with the internal representation of cats/dogs. Interference does happen if the model sees a new cat species for the first time, yet the external concept of “cat” is immune to this kind of interference.
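
To make the “only one task is allowed to interfere” part concrete, here is a toy sketch (illustrative only, not MPCL’s actual code; the layer sizes and the auxiliary task are made up). The interpretation head is trained end-to-end, while any other head consumes a detached copy of the representation, so it cannot reshape what “cat” means internally:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
interpretation_head = nn.Linear(128, 2)   # external concepts: cat vs. dog
auxiliary_head = nn.Linear(128, 10)       # some other task reusing the representation

def training_step(images, cat_dog_labels, aux_labels):
    z = encoder(images)

    # the interpretation task is the only one allowed to reshape the
    # internal representation of cats/dogs
    interpretation_loss = F.cross_entropy(interpretation_head(z), cat_dog_labels)

    # other tasks consume the representation without interfering with it:
    # no gradient flows back into the encoder through z.detach()
    auxiliary_loss = F.cross_entropy(auxiliary_head(z.detach()), aux_labels)

    return interpretation_loss + auxiliary_loss
```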

If the system connects cats to another concept, e.g. cat paws, MPCL won’t break the connected concept as long as the connection (cat <-> cat paws) is valid in the agent’s subjective reality. [Cat <-> cat paws] is valid in my reality, and honing the concept of cats will only improve the concept of cat paws in my reality. By contrast, if the system mistakenly connects cat representations to arctic foxes because the model has only seen white cats so far, then MPCL will catastrophically forget what arctic foxes are as soon as it learns about other cat species.

At the end of the day, it’s just a simple building principle. It is not very fruitful on Permuted MNIST, but I have to start somewhere :slight_smile:
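
(For anyone who hasn’t used it: Permuted MNIST builds each task by applying a fixed random pixel permutation to the same images, labels unchanged, roughly like the snippet below; the helper name is just for illustration.)

```python
import numpy as np

def make_permuted_mnist_tasks(images, num_tasks, seed=0):
    """Each task applies its own fixed random pixel permutation to the
    flattened images; the labels are left unchanged."""
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1)
    return [flat[:, rng.permutation(flat.shape[1])] for _ in range(num_tasks)]
```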

Another point you may find interesting is the one raised by this recent paper, which shows that many tasks do not suffer from catastrophic forgetting; it seems to be a problem mainly related to classification and similar tasks.

Intriguing. I left a comment on the YouTube video a couple of weeks ago (ContinualAI RG: "Does Continual Learning = Catastrophic Forgetting?").

Yes. Numenta’s work is a good example. They occasionally review continual learning papers on their YouTube channel.

There’s room for less destructive optimization algorithms. For instance, nudging one weight at a time doesn’t perform too badly, as far as I can remember, and it is not as slow as people might think. But if the objective function is to do well on the task at hand, and only on the task at hand, we can’t blame the optimization algorithm for reaching the objective. Without replay or ensembles, there must be something in the objective designed to mitigate forgetting. I don’t think the optimization algorithm can do all the heavy lifting.
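
To be clear about what I mean by nudging one weight at a time, here is the flavour of algorithm I have in mind (a rough sketch with made-up names, not a recommendation to use it as-is):

```python
import numpy as np

def nudge_one_weight_at_a_time(weights, loss_fn, step=1e-2, sweeps=10):
    """Greedy coordinate-wise search: perturb a single weight, keep the
    perturbation only if it lowers the loss, then move on to the next weight."""
    w = weights.copy()
    best = loss_fn(w)
    for _ in range(sweeps):
        for i in range(w.size):
            for delta in (step, -step):
                candidate = w.copy()
                candidate.flat[i] += delta
                loss = loss_fn(candidate)
                if loss < best:
                    w, best = candidate, loss
                    break   # keep this nudge and move on to the next weight
    return w
```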