How you update the weights in a neural network can have a profound effect on its behavior.
Multiplicative weight updates induce sparsity and modularity.
The updates compound exponentially: a weight is multiplied by, say, (1 + alpha) to increase it or by (1 - alpha) to decrease it. A weight that has been increased many times therefore ends up with a vastly higher magnitude than one that has been decreased, and that is what leads to sparsity.
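A minimal sketch of why the gap grows exponentially, using the (1 + alpha) / (1 - alpha) rule above (the step count and alpha value are illustrative):

```python
alpha = 0.1
n_steps = 50

# Two weights, both starting at 1.0.
w_up = 1.0    # promoted every step: multiplied by (1 + alpha)
w_down = 1.0  # demoted every step: multiplied by (1 - alpha)

for _ in range(n_steps):
    w_up *= 1 + alpha
    w_down *= 1 - alpha

# Because the update is multiplicative, the gap compounds:
# w_up / w_down = ((1 + alpha) / (1 - alpha)) ** n_steps
ratio = w_up / w_down
print(f"w_up = {w_up:.2f}, w_down = {w_down:.5f}, ratio = {ratio:.0f}")
```

After only 50 steps the promoted weight is tens of thousands of times larger than the demoted one, so most of the network's mass concentrates in the few weights that are consistently reinforced.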
Can similar ideas be used for continual learning? Not with exponentiation, but with something sublinear like the logarithm or the square root, so that it would take a massive number of increments to reach a large weight magnitude. That in turn would make it very difficult to reverse the process and shrink the weight's magnitude back down.
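One way to make this concrete (a hypothetical scheme, not an established algorithm): each weight keeps an integer count of net reinforcements, and its effective magnitude is the logarithm of that count. Growing the magnitude then takes exponentially many increments, and undoing it takes nearly as many decrements:

```python
import math

def magnitude(count: int) -> float:
    """Sublinear magnitude from a reinforcement count (could also be sqrt)."""
    return math.log1p(count)

# Growing a weight to magnitude 5 takes roughly exp(5) - 1 ~ 147 increments.
count = 0
while magnitude(count) < 5.0:
    count += 1
print(f"increments to reach magnitude 5: {count}")

# Shrinking it back below magnitude 1 requires taking almost all of them back,
# which is what makes the learned weight hard to erase.
undo = 0
while magnitude(count - undo) >= 1.0:
    undo += 1
print(f"decrements to fall below magnitude 1: {undo}")
```

The asymmetry with the multiplicative case is the point: there, a few demotions collapse a weight; here, a large weight encodes a long history of reinforcement that a short burst of conflicting updates cannot wipe out.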