Continual learning for classification, where one class at a time becomes available

Hello, I have a possibly naive question about class-incremental learning, where each new task corresponds to a single new class. If we assume that we know the number of classes beforehand (e.g. 3) and want to train a neural network classifier, would the following approach be correct?

  1. Define a model with 3 output nodes.
  2. Train model on only 1st class data.
  3. Train on 2nd class data using some continual learning algorithm like EWC.
  4. Finally train on 3rd class data using EWC.
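For concreteness, the four steps above could be sketched roughly as follows (PyTorch; the EWC term is left as a placeholder, and all layer sizes and data are illustrative stand-ins):

```python
# Hedged sketch of the sequential per-class procedure (steps 1-4 above).
# The EWC penalty is a placeholder; sizes and data are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 3
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, num_classes))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def ewc_penalty(model):
    # placeholder: a real EWC term is sum_i F_i * (theta_i - theta*_i)^2
    return torch.tensor(0.0)

for class_id in range(num_classes):      # one "experience" per class
    x = torch.randn(32, 8)               # stand-in for that class's data
    y = torch.full((32,), class_id)      # every label in the batch is the same
    for _ in range(20):
        opt.zero_grad()
        loss = loss_fn(model(x), y) + ewc_penalty(model)
        loss.backward()
        opt.step()
    # In each experience the model can minimise the loss by simply pushing
    # logit[class_id] up for *all* inputs -- the trivial solution the
    # question worries about.
```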

I feel this probably isn’t the way to go, as training on a single class would teach the model to simply output a high value at the corresponding output node for every input, without necessarily capturing any useful features of the data.

What might a suitable approach for such a problem be? Should one wait until data from two classes becomes available and then use these two as the first task? But then what happens when only one class is left for the final task?

Many thanks!


First of all, I don’t view your question as naive; indeed, it can be tricky to deal with such strict scenarios, regardless of how realistic they are (you would need to evaluate whether the setup is worth focusing on).

As you already mentioned, it’s very likely that your model would learn a trivial solution (i.e. useless features) if you trained it to classify data from a single class. Obviously, if the dataset in each experience is small, waiting for data from more classes to become available and then training the model on the combined datasets would be a straightforward solution. But assuming that you want to stick to the strict setup (training the model from scratch and not storing data from past experiences), my understanding is that you would need a way to encourage your model to learn better features, especially in the first experience. Below I’ve listed two possible approaches that may or may not be effective, depending on the details of your evaluation setup:

1- Using a sample reconstruction loss in addition to the classification loss: you could attach a decoder to the penultimate layer of your network and reconstruct the input samples in an auto-encoding fashion. The decoder, trained alongside the classification head, encourages the model to learn more useful features even when all data come from a single class.
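A minimal sketch of this first approach: a shared encoder feeds both a classification head and a decoder, so the reconstruction loss gives the encoder a learning signal even when every label in the batch is identical. All layer sizes, names, and the `recon_weight` trade-off hyperparameter are my own illustrative assumptions:

```python
# Sketch: shared encoder + classification head + reconstruction decoder.
# The MSE reconstruction term supplies gradient signal even when the
# classification labels carry no information (single-class batch).
import torch
import torch.nn as nn

input_dim, hidden_dim, num_classes = 8, 16, 3
encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
classifier = nn.Linear(hidden_dim, num_classes)   # classification head
decoder = nn.Linear(hidden_dim, input_dim)        # auto-encoding head

params = (list(encoder.parameters()) + list(classifier.parameters())
          + list(decoder.parameters()))
opt = torch.optim.SGD(params, lr=0.1)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()

x = torch.randn(32, input_dim)            # single-class batch (stand-in data)
y = torch.zeros(32, dtype=torch.long)     # all samples share label 0
recon_weight = 1.0                        # assumed trade-off hyperparameter

z = encoder(x)
loss = ce(classifier(z), y) + recon_weight * mse(decoder(z), x)
loss.backward()
opt.step()
```

At inference time the decoder can simply be dropped and only `encoder` + `classifier` kept.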

2- Using an auxiliary class when random “external” data is available: assuming you have access to another stream of (unlabeled) data, ideally from a similar distribution, you could add an extra output neuron for an “unknown” category and use it to enforce “binary” classification in the first experience. Alternatively, you could treat it as a per-class logistic regression problem instead of adding a new output neuron.
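The second approach might look like the following: the first class’s data is mixed with unlabeled “external” samples assigned to an extra “unknown” neuron, so the first experience becomes a genuine two-way discrimination problem. The availability of such an external stream, and all names and sizes, are assumptions:

```python
# Sketch: add one "unknown" output neuron and fill it with unlabeled
# external samples, turning the single-class first experience into a
# binary classification problem.
import torch
import torch.nn as nn

num_known, input_dim = 3, 8
model = nn.Linear(input_dim, num_known + 1)   # +1 neuron for "unknown"
unknown_id = num_known

x_class0 = torch.randn(32, input_dim)         # first class's data (stand-in)
x_external = torch.randn(32, input_dim)       # unlabeled external stream
x = torch.cat([x_class0, x_external])
y = torch.cat([torch.zeros(32, dtype=torch.long),          # class 0
               torch.full((32,), unknown_id)])             # "unknown"

loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()   # now the gradient must actually discriminate the two groups
```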

Once again, these are only potential approaches that might help avoid the trivial-solution issue in strict class-incremental streams. However, AFAIK, EWC works well in the domain-incremental scenario (as demonstrated in the original paper) and not necessarily in the class-incremental one, so you may need a different or adapted method to prevent forgetting.
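For reference, the EWC penalty mentioned above anchors each parameter to its value after the previous task, weighted by a diagonal Fisher-information estimate. This is a generic sketch of that quadratic term, not the original paper’s exact implementation; all names and numbers are illustrative:

```python
# Generic sketch of the EWC quadratic penalty:
#   L_EWC = (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2
# where theta* are the parameters after the previous task and F is a
# diagonal Fisher estimate of each parameter's importance.
import torch

def ewc_penalty(params, old_params, fisher, lam=100.0):
    loss = torch.tensor(0.0)
    for p, p_old, f in zip(params, old_params, fisher):
        loss = loss + (lam / 2) * (f * (p - p_old) ** 2).sum()
    return loss

theta = [torch.tensor([1.0, 2.0], requires_grad=True)]   # current params
theta_star = [torch.tensor([0.5, 2.0])]                  # params after task 1
fisher = [torch.tensor([1.0, 0.0])]   # second weight deemed unimportant
penalty = ewc_penalty(theta, theta_star, fisher)
# only the first weight contributes: (100/2) * 1.0 * (1.0 - 0.5)^2
```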

Final note: I recall a work on CL that trains the model one class at a time, but unfortunately I can’t remember the title.

Hi @Hamed,

Thank you so much for your reply. My dataset for each task is not very big, so the most straightforward thing to do would probably be to wait for a second class to become available and train on pairs of classes. Thank you for pointing out that EWC might not be the best fit for class-incremental scenarios - I just had another look at some papers and see that it does not perform as well on benchmarks like split MNIST, for instance.
Also, using an auto-encoder to enforce learning of more meaningful features is a very interesting idea.

Many thanks,