Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

Hey everyone!

Looking for feedback for our new paper (that is a little different from standard papers proposing a new model, hence the needed feedback):

Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

Here is a tiny summary:

We have made a lot of progress on catastrophic forgetting within the standard evaluation protocol, i.e., sequentially learning a stream of tasks and testing our models’ capacity to remember them all.
We think it’s time for a new approach to CL, one that is more aligned with real-life applications of CL.

Here are the main modifications we propose:

  • bring CL closer to online learning, i.e., at test time the model keeps learning and is evaluated on its online predictions
  • it’s fine to forget, as long as you can quickly remember (just like we humans do)
  • we allow pretraining (because you wouldn’t deploy an untrained CL system, right?), but at test time the model has to quickly learn new out-of-distribution tasks (because the world is full of surprises)
  • the task distribution is actually a hidden Markov chain. This implies:
    • new and old tasks can re-occur (just like in real life). Better to remember them quickly if you want a good total performance!
    • tasks have different lengths
    • and the task boundaries are unknown (task-agnostic setting)
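The stream described by these bullets can be sketched as a toy simulator (the task names and probabilities here are illustrative, not from the paper): tasks are hidden states of a Markov chain, so old tasks re-occur, each visit lasts a variable number of steps, and the boundaries are never revealed to the learner.

```python
import random

random.seed(0)

# Hypothetical task names and transition probability, for illustration only.
TASKS = ["A", "B", "C"]
STAY_PROB = 0.8  # high self-transition => tasks tend to last several steps


def task_stream(n_steps):
    """Yield the hidden task at each step of a Markov-chain task stream."""
    task = random.choice(TASKS)
    for _ in range(n_steps):
        # In the task-agnostic setting, the learner does NOT observe this label.
        yield task
        # With probability 1 - STAY_PROB, switch to a different task
        # (so tasks re-occur later and have variable lengths).
        if random.random() > STAY_PROB:
            task = random.choice([t for t in TASKS if t != task])


stream = list(task_stream(20))
```

At test time, a model would be evaluated on its prediction at every step of such a stream, without ever being told where one task ends and the next begins.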

We provide a unifying framework to situate the space of machine learning settings {supervised learning, meta-learning, continual learning, meta-continual learning, continual-meta learning}, in case it was starting to get confusing :stuck_out_tongue:

The setting is not better than the others… We just think it should be studied as well :slight_smile:

I’ll post some of the answers from the thread on Slack here! :slight_smile:

Ok, so I looked at section 1.2 of your paper. In my vocabulary, this is not a task-agnostic setting; I think we just have different definitions. For me, if you are doing incremental class learning, you are in the task-aware setting by construction.

I’ll try to reword my definition: task-agnostic, for me, is the scenario where the labels might not always map to the same output. It is more general. (I know it’s weird in classification… however, all of meta-learning is that way.)

So I don’t think I could apply the methods you are proposing in this setting.

I understand what you mean now, thanks.

You can easily apply them once you infer the task you are solving, using separate “heads”! :slight_smile:

The fact is that, once you infer the task, the problem becomes easier, since you can separate the output space. This is why we focus on what we call the “Single-Incremental-Task” setting, where you can’t separate it by definition, and which has been shown to be harder than the Multi-Task setting.
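As a toy sketch of the multi-head idea (purely illustrative — the task-inference heuristic, prototypes, and heads below are made up, not anyone’s actual method): first infer which task the input belongs to, then predict with a task-specific head, so each head only has to discriminate within its own, smaller label set.

```python
def infer_task(x, task_prototypes):
    """Pick the task whose prototype is closest to the input (toy heuristic)."""
    return min(task_prototypes, key=lambda t: abs(x - task_prototypes[t]))


# Hypothetical per-task "heads": each maps an input to a label drawn from a
# task-specific label set, shrinking the output space per prediction.
heads = {
    "animals": lambda x: "cat" if x < 5 else "dog",
    "vehicles": lambda x: "car" if x < 15 else "truck",
}
prototypes = {"animals": 3.0, "vehicles": 14.0}


def predict(x):
    task = infer_task(x, prototypes)  # 1) infer the context/task
    return heads[task](x)             # 2) predict within that task only


label = predict(2.0)  # routes to the "animals" head
```

The point of the single-head (Single-Incremental-Task) setting is precisely that this routing step is unavailable: one head must cover all classes at once.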

Then, of course, you can make it more complicated by adding Multiple-Incremental-Tasks (different mappings f(x) for the same x, learned continually), but we believe that’s not at the center of the CL problem.

In the end, what we want to solve in CL is pretty simple. Learn:

f: X -> Y

Where X and Y can change over time. When you add the notion of task, you’re just adding complexity but the core of the issue is the same:

f: X, T -> Y

Where X, T and Y can change over time.
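A minimal illustration of the two formulations above (toy functions, just to make the notation concrete): adding the task variable T lets the same x map to different outputs, but the object being learned is still a single mapping.

```python
def f_single(x):
    """f: X -> Y (no notion of task)."""
    return 2 * x


def f_task(x, t):
    """f: X, T -> Y -- the same x can map to a different y depending on t."""
    return 2 * x if t == 0 else 3 * x
```

With t held fixed, f_task reduces to a single-task mapping, which is the sense in which the task variable adds complexity without changing the core of the problem.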

This is just my opinion though! :smiley:

I agree that my setting is “easier” because you can reduce the output space if you correctly infer the task from the context.

I don’t see why it’s not central to CL, though. In the end, I think that a lot of applications outside image classification can benefit from modeling the context, e.g. time series forecasting, autonomous driving, robotics, recommender systems, partially observable games (I’d be happy to dig in if it’s not super clear why).

Even in image classification, I could see how you would leverage context. E.g. someone using an Animal Classifier app in the jungle: if the app is confident that it is in the jungle, it could switch to its “jungle” output head for better accuracy.

So, IMO, we should study both settings, without one being more central than the other. It just depends on the set of applications you want to make progress on :slight_smile:

Hope we can continue the discussion. I think we are making progress!

Absolutely! Every setting has the right to exist and to be studied, especially when targeting real-world applications! Moreover, a good strategy should be able to work in any setting :slight_smile:

What I meant is that the core objective of CL today is to learn robust, high-level features from a stream of data drawn from ever-changing data distributions, and this can be studied independently of the presence of multiple tasks.

So, in general, I prefer to focus on settings with just one task, to remove the “noise” introduced by different settings that can artificially / indirectly help solve the scenario without providing robust and scalable solutions for learning continually. But this is just my approach! :smiley: