Questions about implementing the "Synaptic Intelligence" paper

Dear all,
I recently joined this community after finding it on GitHub while I was trying to reproduce the results of “Continual Learning Through Synaptic Intelligence”. I implemented the algorithm as best I could after going through the paper many times. I also looked at its official implementation on GitHub, which is written in TensorFlow 1.0, but I could not follow much of it because I am not very familiar with that framework.
I did get some results, but they are not as good as those reported in the paper. Could anyone help me find where I am going wrong? Before going into coding details, I would like to discuss the pseudocode, so that I understand what is wrong with my implementation.

Here is the pseudocode of what I have implemented. Please help me.

lambda = 1
xi = 1e-3

total_tasks = 5

model = NN(total_tasks)
## multi-head fully connected model: 784 (input) --> 256 --> 256 --> 2 (output), with 5 separate output heads
## each output head has 2 neurons (one head per task, 5 tasks in total)
## output is a vector of size 2 (one entry per class)

prev_theta = model.theta(copy=True) # updated at end of task
## model.theta() returns the list of shared parameters (layer 1 and layer 2, excluding the output heads)
## copy=True returns a detached copy of the parameters,
## so changing it does not affect the original parameters connected to the computational graph

omega_total = zeros_like(prev_theta) ## capital Omega in the paper (per-parameter regularization strength)
omega = zeros_like(prev_theta) ## small omega in the paper (per-parameter contribution to the loss)

for task_num in range(total_tasks):
	optimizer = ADAM() # created (or reset) before every task
	prev_theta_step = model.theta(copy=True) # updated at end of step
	## training for the task starts
	for epoch in range(10):
		for steps in range(steps_per_epoch):
			X, Y = train_dataset[task_num].sample()
			## X is flattened image of size 784
			## Y is binary vector of size 2 ([0,1] or [1,0])

			Y_pred = model(X, task_num) # model is multihead, task_num selects the head
			loss = CROSS_ENTROPY(Y_pred, Y)

			if(task_num>0): ## reg_loss starts from second task
				theta = model.theta()
				## copy is False here, so it returns the params connected to the computational graph
				reg_loss = torch.sum(omega_total*torch.square(theta - prev_theta))

				loss = loss + lambda*reg_loss


			theta = model.theta(copy=True)
			grads = model.theta_grads() ## grads of the shared parameters only
			omega = omega - grads*(theta - prev_theta_step)
			prev_theta_step = theta


	## training for task complete, update importance parameters
	theta = model.theta(copy=True)
	omega_total += relu( omega/( (theta - prev_theta)**2 + xi) )
	prev_theta = theta
	omega = zeros_like(theta) ## reset small omega for the next task

	## evaluation code
	## evaluation done
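The pseudocode does not show where `loss.backward()`, `optimizer.zero_grad()` and `optimizer.step()` happen, and the result of `omega - grads*(theta - prev_theta_step)` depends on that ordering: the gradient must be the one that produced that parameter change. For example, if `zero_grad()` runs before `model.theta_grads()` is read, `omega` stays at zero and the penalty never does anything. Below is a minimal PyTorch sketch of one way to order these steps, under stated assumptions: `MultiHeadMLP`, `shared_params`, `train_task` and `consolidate` are hypothetical names standing in for `NN`, `model.theta()` and the end-of-task update; random tensors replace Split MNIST; and labels are class indices rather than one-hot vectors because that is what `F.cross_entropy` is given here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class MultiHeadMLP(nn.Module):
    """Hypothetical stand-in for NN(total_tasks): shared 784-256-256 trunk, one 2-way head per task."""

    def __init__(self, num_tasks, in_dim=784, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden, 2) for _ in range(num_tasks)])

    def forward(self, x, task):
        return self.heads[task](self.trunk(x))


def shared_params(model):
    # Plays the role of model.theta(): shared trunk parameters only, no heads.
    return list(model.trunk.parameters())


def train_task(model, batches, task, omega_total, theta_star, lam=1.0, lr=1e-3):
    params = shared_params(model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # fresh optimizer per task
    omega = [torch.zeros_like(p) for p in params]       # small omega for this task
    for x, y in batches:
        loss = F.cross_entropy(model(x, task), y)
        if task > 0:  # quadratic SI penalty anchored at the previous task's end point
            reg = sum((O * (p - t) ** 2).sum() for O, p, t in zip(omega_total, params, theta_star))
            loss = loss + lam * reg
        opt.zero_grad()
        loss.backward()
        # Snapshot the gradient and the parameters BEFORE the step ...
        grads = [p.grad.detach().clone() for p in params]
        before = [p.detach().clone() for p in params]
        opt.step()
        # ... so that -g * delta_theta pairs each gradient with the update it caused.
        with torch.no_grad():
            for w, g, p, b in zip(omega, grads, params, before):
                w -= g * (p - b)
    return omega


def consolidate(omega_total, omega, theta_star, params, xi=1e-3):
    # Big Omega += relu(omega / ((theta_end - theta_start)^2 + xi)), as in the pseudocode.
    with torch.no_grad():
        for O, w, p, t in zip(omega_total, omega, params, theta_star):
            O += F.relu(w / ((p - t) ** 2 + xi))


# Usage on random data (a placeholder for Split MNIST): 2 tasks x 20 batches.
model = MultiHeadMLP(num_tasks=2)
params = shared_params(model)
omega_total = [torch.zeros_like(p) for p in params]
theta_star = [p.detach().clone() for p in params]
for task in range(2):
    batches = [(torch.randn(32, 784), torch.randint(0, 2, (32,))) for _ in range(20)]
    omega = train_task(model, batches, task, omega_total, theta_star)
    consolidate(omega_total, omega, theta_star, params)
    theta_star = [p.detach().clone() for p in params]  # new anchor for the next task
```

In this sketch `g` is read from `p.grad` after backpropagating the full loss, so the penalty's own gradient is part of the path integral; computing `g` from the task loss alone (for example with `torch.autograd.grad`) is an alternative worth comparing against the official code.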

I am also attaching the results I got.

EDIT 1: In the results, ‘one’ (blue) is without the regularization loss (lambda=0), and ‘two’ (green) is with the regularization loss (lambda=1).
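For reference, here are the paper's update rules written with the pseudocode's names (the paper calls the strength c rather than lambda). Here g_k(t) is the gradient for shared parameter k at step t, Δθ_k(t) is that step's parameter change, θ̃_k is `prev_theta` and ξ is `xi`. With λ = 0 the penalty term vanishes, so curve ‘one’ is sequential training with no consolidation at all.

$$
\begin{aligned}
\omega_k &\leftarrow \omega_k - g_k(t)\,\Delta\theta_k(t) \quad \text{(every step of the current task)} \\
\Omega_k &\leftarrow \Omega_k + \frac{\omega_k}{\left(\theta_k^{\text{end}} - \theta_k^{\text{start}}\right)^2 + \xi} \quad \text{(end of each task)} \\
\tilde{L} &= L_{\text{task}} + \lambda \sum_k \Omega_k \left(\theta_k - \tilde{\theta}_k\right)^2
\end{aligned}
$$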

Thank you for reading this far. Kindly help me out.

Hi, we have a Synaptic Intelligence implementation in Avalanche. You can check it out; it is based on PyTorch. You can also check out our reproducibility repository, where we reproduce results from continual learning papers, including Synaptic Intelligence.
