Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

Hey everyone!

Looking for feedback for our new paper (that is a little different from standard papers proposing a new model, hence the needed feedback):

Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

Here is a tiny summary:

We have made a lot of progress on catastrophic forgetting within the standard evaluation protocol, i.e., sequentially learning a stream of tasks and testing our models’ capacity to remember them all.
We think it’s time for a new approach to CL, one that is more aligned with real-life applications of CL.

Here are the main modifications we propose:

  • bring CL closer to online learning, i.e., at test time the model keeps learning and is evaluated on its online predictions
  • it’s fine to forget, as long as you can quickly remember (just like we humans do)
  • we allow pretraining (because you wouldn’t deploy an untrained CL system, right?), but at test time the model has to quickly learn new out-of-distribution tasks (because the world is full of surprises)
  • the task distribution is actually a hidden Markov chain. This implies:
    • new and old tasks can re-occur (just like in real life). Better to remember them quickly if you want a good total performance!
    • tasks have different lengths
    • and the task boundaries are unknown (task-agnostic setting)
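The stream described by these bullets can be sketched as a toy simulator (the task names and probabilities here are illustrative, not from the paper): tasks are hidden states of a Markov chain, so old tasks re-occur, each visit lasts a variable number of steps, and the boundaries are never revealed to the learner.

```python
import random

random.seed(0)

# Hypothetical task names and transition probability, for illustration only.
TASKS = ["A", "B", "C"]
STAY_PROB = 0.8  # high self-transition => tasks tend to last several steps


def task_stream(n_steps):
    """Yield the hidden task at each step of a Markov-chain task stream."""
    task = random.choice(TASKS)
    for _ in range(n_steps):
        # In the task-agnostic setting, the learner does NOT observe this label.
        yield task
        # With probability 1 - STAY_PROB, switch to a different task
        # (so tasks re-occur later and have variable lengths).
        if random.random() > STAY_PROB:
            task = random.choice([t for t in TASKS if t != task])


stream = list(task_stream(20))
```

At test time, a model would be evaluated on its prediction at every step of such a stream, without ever being told where one task ends and the next begins.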

We provide a unifying framework to situate the space of machine learning settings {supervised learning, meta-learning, continual learning, meta-continual learning, continual-meta learning}, in case it was starting to get confusing :stuck_out_tongue:

The setting is not better than the others… We just think it should be studied as well :slight_smile:

I’ll post some of the answers from the thread on Slack here! :slight_smile:

Ok, so I looked at section 1.2 of your paper. In my vocabulary, this is not a task-agnostic setting; I think we just have different definitions. For me, if you are doing incremental class learning, you are in the task-aware setting by construction.

I’ll try to reword my definition: task-agnostic, for me, is the scenario where the labels might not always map to the same output. It is more general. (I know it’s weird in classification… however, all of meta-learning is that way.)

So I don’t think I could apply the methods you are proposing in this setting.

I understand what you mean now, thanks.

You can easily apply them once you infer the task you are solving, using separate “heads”! :slight_smile:

The fact is that, once you infer the task, the problem becomes easier, since you can separate the output space. This is why we focus on what we call the “Single-Incremental-Task” setting, where you can’t separate it by definition, and which has been shown to be harder than the Multi-Task setting.
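As a toy sketch of the multi-head idea (purely illustrative — the task-inference heuristic, prototypes, and heads below are made up, not anyone’s actual method): first infer which task the input belongs to, then predict with a task-specific head, so each head only has to discriminate within its own, smaller label set.

```python
def infer_task(x, task_prototypes):
    """Pick the task whose prototype is closest to the input (toy heuristic)."""
    return min(task_prototypes, key=lambda t: abs(x - task_prototypes[t]))


# Hypothetical per-task "heads": each maps an input to a label drawn from a
# task-specific label set, shrinking the output space per prediction.
heads = {
    "animals": lambda x: "cat" if x < 5 else "dog",
    "vehicles": lambda x: "car" if x < 15 else "truck",
}
prototypes = {"animals": 3.0, "vehicles": 14.0}


def predict(x):
    task = infer_task(x, prototypes)  # 1) infer the context/task
    return heads[task](x)             # 2) predict within that task only


label = predict(2.0)  # routes to the "animals" head
```

The point of the single-head (Single-Incremental-Task) setting is precisely that this routing step is unavailable: one head must cover all classes at once.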

Then, of course, you can make it more complicated by adding Multiple-Incremental-Tasks (different mappings f(x) for the same x, learned continually), but we believe that’s not at the center of the CL problem.

In the end, what we want to solve in CL is pretty simple. Learn:

f: X -> Y

Where X and Y can change over time. When you add the notion of task, you’re just adding complexity but the core of the issue is the same:

f: X, T -> Y

Where X, T and Y can change over time.
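A minimal illustration of the two formulations above (toy functions, just to make the notation concrete): adding the task variable T lets the same x map to different outputs, but the object being learned is still a single mapping.

```python
def f_single(x):
    """f: X -> Y (no notion of task)."""
    return 2 * x


def f_task(x, t):
    """f: X, T -> Y -- the same x can map to a different y depending on t."""
    return 2 * x if t == 0 else 3 * x
```

With t held fixed, f_task reduces to a single-task mapping, which is the sense in which the task variable adds complexity without changing the core of the problem.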

This is just my opinion though! :smiley:

I agree that my setting is “easier” because you can reduce the output space if you correctly infer the task from the context.

I don’t see why it’s not central to CL, though. In the end, I think that a lot of applications outside image classification can benefit from modeling the context, e.g. time series forecasting, autonomous driving, robotics, recommender systems, partially observable games (I’d be happy to dig in if it’s not super clear why).

Even in image classification, I could see how you would leverage context. E.g. someone using an Animal Classifier app in the jungle: if the app is confident that it is in the jungle, it could switch to its “jungle” output head for better accuracy.

So, IMO, we should study both settings, without one being more central than the other. It just depends on the set of applications you want to make progress on :slight_smile:

Hope we can continue the discussion. I think we are making progress!

Absolutely! Every setting has the right to exist and to be studied, especially when targeting real-world applications! Moreover, a good strategy should be able to work in any setting :slight_smile:

What I meant is that the core objective of CL today is to learn robust, high-level features from a stream of data drawn from ever-changing data distributions, and this can be studied independently of the presence of multiple tasks.

So, in general, I prefer to focus on settings with just one task, to remove the “noise” introduced by different settings that can artificially / indirectly help solve the scenario without providing robust and scalable solutions for learning continually. But this is just my approach! :smiley: