SupSup - Supermasks in Superposition

A conventional artificial neural network has so many excess parameters that you can do things like this.
Personally, I would say you only need a number of parameters per layer that is linear in the width n of the network (c·n, where c = 1 or 2), not the usual n·n. But for a hobbyist to convince anyone of that is very difficult!
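For anyone new to the idea behind the paper: a "supermask" is a binary mask applied to a fixed, randomly initialized network, selecting a subnetwork without ever training the weights themselves. A minimal sketch in plain NumPy (the mask here is random purely for illustration; in the paper each task's mask is learned):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # layer width

# Fixed random weights -- never trained in the supermask setting.
W = rng.standard_normal((n, n))

# A binary supermask selects a subnetwork of W.
# Random here for illustration; learned per task in the paper.
mask = rng.random((n, n)) < 0.5

x = rng.standard_normal(n)
y_full = np.maximum(W @ x, 0)              # full layer with ReLU
y_masked = np.maximum((W * mask) @ x, 0)   # masked subnetwork layer

print(y_masked.shape)
```

Note that a dense mask still has n·n entries (one bit per weight); the c·n claim above would require a much more structured mask, e.g. zeroing whole rows or columns.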


I invited the main author to talk about this paper at one of our RG sessions! :slight_smile: