Demonstrating Catastrophic Forgetting in Traditional ML models

I am clear about the definition of Continual / Lifelong Learning (one of the approaches for alleviating Catastrophic Forgetting) and its three scenarios. In every paper I looked at for details of how models are evaluated on a sequence of tasks (i.e., at test time, which tasks’ examples are provided and how exactly the testing is done), the authors state that neural nets and traditional ML models such as random forests or decision trees are prone to forgetting: when they are trained on a sequence of tasks (the data for a task is only available while training on that task), they perform well only on the most recent task and forget the previous ones.

But how exactly can traditional ML models be trained without having the whole dataset at once? In the case of neural networks, we might say that the existing weights serve as the initialization and the final layer is modified while learning new tasks (as the number of classes grows). But how do we test for catastrophic forgetting in traditional methods such as decision trees or SVMs without training on the whole data?
Won’t the traditional ML models completely forget the previous tasks they were trained on?
For example, suppose there are two tasks, with classes {1,2} and {3,4}, and in each task we train a binary classifier. With traditional methods, won’t the classifier completely forget the {1,2} classes when we sequentially train on the first and then the second task, since we can’t just do something like swapping out the last layer as in the neural-network case? Is the way I’m thinking right? If not, how do we justify catastrophic forgetting in traditional ML models?
I am looking for implementation-level details of how catastrophic forgetting is demonstrated in both neural nets (I’m still not sure how they are tested to show forgetting) and traditional ML models. Any resources on these would be very helpful.
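For concreteness, here is a minimal sketch of the kind of sequential setup I have in mind (my own toy example, not taken from any paper). It uses sklearn’s SGDClassifier, which supports incremental training via partial_fit, so it can be trained task by task and then re-tested on the first task:

```python
# Toy sketch (my own illustration): train an incremental linear classifier on
# two tasks in sequence and check how task-1 accuracy changes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Task 1 has classes {0, 1}, task 2 has classes {2, 3} (synthetic data).
X1, y1 = make_classification(n_samples=1000, n_features=20, random_state=0)
X2, y2 = make_classification(n_samples=1000, n_features=20, random_state=1)
y2 = y2 + 2  # relabel task 2 as classes {2, 3}

clf = SGDClassifier(random_state=0)
all_classes = np.array([0, 1, 2, 3])  # partial_fit needs the full class list up front

# Train on task 1 only.
for _ in range(10):
    clf.partial_fit(X1, y1, classes=all_classes)
print("task 1 accuracy after task 1:", clf.score(X1, y1))

# Train on task 2 only; task 1 data is assumed to be unavailable now.
for _ in range(10):
    clf.partial_fit(X2, y2)
print("task 1 accuracy after task 2:", clf.score(X1, y1))  # expected to drop sharply
```

After the second task the model mostly predicts classes 2 and 3, so task-1 accuracy typically collapses; but I don’t see how to set up the same experiment for models like decision trees or plain SVMs that don’t support incremental training at all.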

I don’t know about SVMs, but you can have random forests remember old tasks while training new tasks.

Either:

  • update the number of samples per class at the leaf level

  • train new trees on the new task and remove some old trees. You end up with a forest that sort of works for multiple tasks. Of course, that doesn’t work well (see the rough sketch below).

There might be other ways that I am not aware of.
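
A rough sketch of the second bullet, assuming a simple fixed-size pool of trees (the class name and parameters below are made up for illustration, not from any library):

```python
# Rough sketch (hypothetical helper, not a library API): keep a fixed-size pool
# of decision trees; for each new task, fit a few trees on bootstrap samples of
# that task's data, evict the oldest trees, and predict by majority vote.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

class GrowAndPruneForest:
    def __init__(self, trees_per_task=10, max_trees=20, random_state=0):
        self.trees = []                     # pool of fitted trees from all tasks
        self.trees_per_task = trees_per_task
        self.max_trees = max_trees
        self.rng = np.random.default_rng(random_state)

    def fit_task(self, X, y):
        # New trees see only the current task's data (bootstrap samples).
        for _ in range(self.trees_per_task):
            idx = self.rng.integers(0, len(X), size=len(X))
            self.trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        # Evict the oldest trees to stay within the budget.
        self.trees = self.trees[-self.max_trees:]

    def predict(self, X):
        # Majority vote over the labels predicted by every tree in the pool.
        votes = np.array([t.predict(X) for t in self.trees])   # (n_trees, n_samples)
        return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```

With a small tree budget, trees from old tasks are eventually evicted, which is why this only “sort of” works.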

Hello,
Thanks for the reply. I’m still unclear about how CL models are evaluated at test time while being trained on a sequence of tasks. Say we are training on task t. In many implementations, such as this and this, I have seen that the validation set contains examples only from task t, not samples from all tasks up to t. But as the model learns task t, it would naturally give high accuracy on that task while forgetting the previous ones. Shouldn’t the validation instead involve examples from all tasks up to task t rather than only from task t? Some references to implementations of this kind of evaluation would be helpful.
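Concretely, the evaluation I would expect looks roughly like this (my own sketch; model, test_sets, and train_on_task are just placeholder names):

```python
# Sketch of the evaluation I have in mind (placeholder names): after finishing
# task t, evaluate on the held-out sets of ALL tasks seen so far, not just t.
def evaluate_seen_tasks(model, test_sets, current_task):
    """test_sets[k] = (X_k, y_k): held-out data for task k."""
    accuracies = {}
    for k in range(current_task + 1):            # tasks 0 .. t
        X_k, y_k = test_sets[k]
        accuracies[k] = (model.predict(X_k) == y_k).mean()
    return accuracies

# for t, (X_train, y_train) in enumerate(task_stream):
#     model = train_on_task(model, X_train, y_train)   # placeholder trainer
#     print(evaluate_seen_tasks(model, test_sets, t))  # averaging gives one CL metric
```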

There is a variety of evaluation settings. See the table in this README

What you are describing is the most realistic scenario, and this is how I evaluate my CL models.
That being said, task switching and multi-task learning can be viewed as two separate problems, and providing task labels at test time is a valid way of evaluating the latter.

I haven’t read the code in detail but it seems that what they call “class scenario” does what you say (line 83). Doesn’t it?

Hi, thanks for the reply.
But shouldn’t the validation contain examples from all tasks up to task t, irrespective of which CL scenario we are in? Isn’t there any standard way of evaluating, regardless of the scenario?

Could you please give some references where I can learn the clear distinctions between these scenarios and their evaluation protocols?

Our code for random-forest and deep-learning continual learning is here: http://proglearn.neurodata.io/
Our evaluation protocols are described here: [2004.12908] Omnidirectional Transfer for Quasilinear Lifelong Learning

Hope that helps!

You are correct.

In the general case*, you should be measuring how much learning new tasks interferes with what was learned from the previous tasks, even when task labels are known at test time.

In each of the three scenarios (“class”, “task”, “domain”), line 207 of evaluate.py does iterate over every task seen so far.

So, in the “task” scenario, the model is indeed evaluated on tasks that may have been forgotten.

Unlike the “class” scenario, the “task” scenario takes the argmax over the classes of the selected task only and ignores the other tasks’ classes (line 52), which is another way of saying that the task label is known.
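
In other words, something along these lines (my own paraphrase of the idea, not the actual code in evaluate.py; the function and argument names are made up):

```python
# Paraphrase of the idea (not the actual evaluate.py code): in the "task"
# scenario the argmax is restricted to the classes of the known task.
import numpy as np

def predict(logits, scenario, active_classes=None):
    """logits: (batch, n_total_classes); active_classes: column indices of the known task."""
    if scenario == "task":
        # The task label is known, so only that task's classes compete.
        masked = np.full_like(logits, -np.inf)
        masked[:, active_classes] = logits[:, active_classes]
        return masked.argmax(axis=1)
    # "class" / "domain" scenarios: all output units compete.
    return logits.argmax(axis=1)
```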

I haven’t executed evaluate.py, and maybe you have, so take what I am saying about evaluate.py with a grain of salt.

*In some special cases, you don’t have to test the model on previous tasks because you know that you did not interfere with the previous tasks’ models. For example, each time you add a task, you make a copy of the previous task’s model and train that copy with a regularization term that penalizes large displacements of the weights, leaving the earlier models untouched.
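
As a sketch of that kind of penalty (a generic L2 pull toward the previous task’s weights; my own illustration rather than a specific method):

```python
# Generic weight-displacement penalty (illustration only): the copied model is
# trained on the new task while being pulled toward its previous-task weights.
import numpy as np

def regularized_loss(task_loss, new_weights, old_weights, lam=1.0):
    """old_weights: frozen snapshot of the parameters after the previous task."""
    displacement = sum(np.sum((w_new - w_old) ** 2)
                       for w_new, w_old in zip(new_weights, old_weights))
    return task_loss + lam * displacement
```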
