Per-task hyperparameter tuning

Hi all, I was wondering about tuning hyperparameters per task in continual learning. Are there papers out there that specifically target this?

Not many; you can take a look at the A-GEM paper. In general, model selection is always a stretch in CL, though quite an accepted one for now. Most model selection is performed on the same stream on which you evaluate, which would be considered very wrong in i.i.d. machine learning.
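
To make the concern concrete, here is a minimal sketch of one way to keep model selection off the evaluation stream: hold out the first few tasks for hyperparameter tuning and report results only on the remaining ones. As far as I remember, this is roughly the spirit of the protocol in the A-GEM paper, but treat that as an assumption and check the paper. The names `split_stream`, `select_hyperparams`, and `toy_run_stream` are hypothetical, and the scoring function is a toy stand-in for actually training a model over the tasks.

```python
from typing import Callable, List, Sequence, Tuple


def split_stream(tasks: Sequence[str], n_cv: int) -> Tuple[List[str], List[str]]:
    """Split a task stream into disjoint tuning and evaluation streams."""
    if not 0 < n_cv < len(tasks):
        raise ValueError("n_cv must leave at least one task in each stream")
    return list(tasks[:n_cv]), list(tasks[n_cv:])


def select_hyperparams(
    cv_tasks: Sequence[str],
    candidates: Sequence[float],
    run_stream: Callable[[Sequence[str], float], float],
) -> float:
    """Return the candidate with the highest score on the tuning stream only."""
    return max(candidates, key=lambda hp: run_stream(cv_tasks, hp))


def toy_run_stream(tasks: Sequence[str], lr: float) -> float:
    # Hypothetical score that peaks at lr = 0.1. A real version would train
    # continually over `tasks` and return e.g. the final average accuracy.
    return -sum(abs(lr - 0.1) for _ in tasks)


tasks = [f"task_{i}" for i in range(10)]
cv_tasks, eval_tasks = split_stream(tasks, n_cv=3)

# Hyperparameters are chosen without ever touching the evaluation tasks.
best_lr = select_hyperparams(cv_tasks, [0.01, 0.1, 1.0], toy_run_stream)

# The reported number comes from the held-out stream with fixed hyperparameters.
final_score = toy_run_stream(eval_tasks, best_lr)
```

Note that this fixes one set of hyperparameters for the whole evaluation stream. Truly per-task tuning would need validation data for each new task, which is exactly where the usual CL setting gets shaky.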