Hi, I am using Avalanche to generate some benchmarks for evaluation, and I would like to ask: what is the correct way to evaluate continual learning strategies? In the Avalanche examples, strategy.eval() is called only after training on each experience completes, but by then it seems too late to track the rate of forgetting (for a SplitMNIST benchmark of 5 experiences we can only plot 5 evaluation data points).
Within strategy.train(), we seem to be able to pass eval_every=1 to evaluate on every epoch, but we don't seem to be passing in benchmark.test_stream anywhere. Is the periodic evaluation then run on a train/test split of the current training experience?
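To make the question concrete, here is a minimal sketch of the setup I have in mind (the import paths and the eval_streams usage are my best reading of the Avalanche API for version 0.2+; SimpleMLP and ewc_lambda=0.4 are placeholders, not my actual configuration):

```python
# Minimal sketch (Avalanche 0.2+ import paths; in older versions EWC lives
# in avalanche.training.strategies). SimpleMLP stands in for my LeNet5 just
# to keep the sketch self-contained.
import torch
from avalanche.benchmarks.classic import SplitMNIST
from avalanche.models import SimpleMLP
from avalanche.training.supervised import EWC

benchmark = SplitMNIST(n_experiences=5)
model = SimpleMLP(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = torch.nn.CrossEntropyLoss()

strategy = EWC(
    model, optimizer, criterion,
    ewc_lambda=0.4,    # placeholder value
    train_mb_size=32,
    train_epochs=4,
    eval_every=1,      # periodic evaluation after every training epoch
)

for experience in benchmark.train_stream:
    # My question: does periodic evaluation need the test stream passed
    # explicitly like this, or does it otherwise fall back to (a split of)
    # the current training experience?
    strategy.train(experience, eval_streams=[benchmark.test_stream])
    strategy.eval(benchmark.test_stream)
```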
For context, I am training a LeNet5 model with the EWC strategy. What is perplexing is that the forgetting still looks catastrophic, even though EWC comes from a seminal paper. My results can be seen in this CSV output:
```
eval_exp,training_exp,eval_accuracy,eval_loss,forgetting
0,0,0.9872,0.0360,0
1,0,0.0000,7.9937,0
2,0,0.0000,7.7202,0
3,0,0.0000,7.9421,0
4,0,0.0000,7.9862,0
0,1,0.0171,4.8538,0.9701
1,1,0.9864,0.0438,0
2,1,0.0000,9.3661,0
3,1,0.0000,8.1056,0
4,1,0.0000,8.0996,0
0,2,0.0000,7.0644,0.9872
1,2,0.0000,8.1437,0.9864
2,2,0.9975,0.0077,0
3,2,0.0000,7.0645,0
4,2,0.0000,6.7429,0
0,3,0.0000,7.6378,0.9872
1,3,0.0000,9.0269,0.9864
2,3,0.0000,5.3852,0.9975
3,3,0.9880,0.0295,0
4,3,0.0000,7.1526,0
0,4,0.0000,6.9717,0.9872
1,4,0.0005,6.8190,0.9859
2,4,0.0000,6.4735,0.9975
3,4,0.0000,5.3892,0.9880
4,4,0.9990,0.0043,0
```
As such, I want to trace the rate of forgetting at each minibatch iteration, to see whether the problem is that I am training for too many iterations and overfitting to the current experience.
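Concretely, the kind of run I am after looks something like the sketch below, reusing the objects from the earlier snippet. I am assuming here that recent Avalanche versions accept a peval_mode argument that switches eval_every from counting epochs to counting iterations; if that assumption is wrong, I would appreciate a pointer to the intended mechanism:

```python
# Sketch of per-iteration periodic evaluation, reusing model, optimizer,
# criterion, and benchmark from the sketch above. peval_mode is an
# assumption on my part: if it is not available in my version, I would
# presumably need a custom plugin instead.
strategy = EWC(
    model, optimizer, criterion,
    ewc_lambda=0.4,            # placeholder value
    train_epochs=4,
    eval_every=1,              # evaluate every single...
    peval_mode="iteration",    # ...training iteration, not every epoch
)

for experience in benchmark.train_stream:
    strategy.train(experience, eval_streams=[benchmark.test_stream])
```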