Dear all,

I recently joined this community after searching on github while I was trying to reproduce the results of “Continual Learning Through Synaptic Intelligence”. I tried implementing the algorithm as best as I could understand after going through paper many times. I also looked at it’s official implementation on github which is in tensorflow 1.0, but could not understand much as I don’t have much familiarity with that.

Though I got some results but not good enough as paper. I wanted to ask if anyone can help me to find out where I am going wrong. Before going into coding details I want to discuss sudo code so that I undersatnd what is going wrong with my implementation.

Here is kind of sudo code that I have implemented. Please help me.

```
lambda = 1
xi = 1e-3
total_tasks = 5
model = NN(total_tasks)
## multiheaded linear model ([784(input)-->256-->256-->2(output)(*5, 5 separate heads)])
## output layer is 2 neuron head (separate heads for each task, total 5 tasks)
## output is vector of size 2 (for 2 classes)
prev_theta = model.theta(copy=True) # updated at end of task
## model.theta() returns list of shared parameters (i.e. layer1 and layer2 excluding output layer)
## copy=True, gives copy of parameters
## so it don't effect original params connected to computaitonal graph
omega_total = zero_like(prev_theta) ## Capital Omega in paper (per-parameter regularization strength)
omega = zero_like(prev_theta) ## small omega in paper (per-parameter contribution to loss)
for task_num in range(total_tasks):
optmizer = ADAM() # created before every task (or reset it)
prev_theta_step = model.theta(copy=True) # updated at end of step
## trainig for task start
for epoch in range(10):
for steps in range(steps_per_epoch):
X, Y = train_dataset[task_num].sample()
## X is flattened image of size 784
## Y is binary vector of size 2 ([0,1] or [1,0])
Y_pred = model(X, task_num) # model is multihead, task_num selects the head
loss = CROSS_ENTROPY(Y_pred, Y)
if(task_num>0): ## reg_loss starts from second task
theta = model.theta()
## here copy is not true so it returns params connected to computaitonal graph
reg_loss = torch.sum(omega_total*torch.square(theta - prev_theta))
loss = loss + lambda*reg_loss
optmizer.zero_grad()
loss.backward()
theta = model.theta(copy=True)
grads = model.theta_grads() ## grads of shared paramters only
omega = omega - grads*(theta - prev_theta_step)
prev_theta_step = theta
optimizer.step()
## training for task complete, update importance parameters
theta = model.theta(copy=True)
omega_total += relu( omega/( (theta - prev_theta)**2 + xi) )
prev_theta = theta
omega = torch.zeros(theta_shape)
## evaluation code
...
...
...
## evaluation done
```

I am also attaching result I got.

EDIT1: In results ‘one’ (blue) represents without regression loss (lambda=0), ‘two’ (green) represents with regression loss (lambda=1).

Thank you for reading so far. Kindly help me out.