ReLU as a switch, Associative memory etc

ReLU is a switch:
On=identity function (f(x)=x)=connect.
A ReLU neural network then is a switched composition of dot products (weighted sums.)
The dot product of a number of dot products is still a dot product.

For a particular input vector the state of each switch in the network is known.
The input to each ReLU is composition of dot products that can be condensed down to a single dot product with the input vector. I think that is interesting.

Anyway you can use that understanding of how ReLU neural networks work to make neural networks based on fast transforms like the FFT or Walsh Hadamard transform.

Associative memory based on hashing:
Hashing products random vectors that in higher dimensional space are approximately orthogonal. The weighted sum can easily be used to associate scalar values with a fixed number of those random vectors.
Using milder forms of hashing such as Locality Sensitive Hash (LHS) or reservoirs (as in reservoir computing) allows you to create a more general form of associative memory. And additional mathematics becomes relevant, the variance equation for linear combinations of random variables and the central limit theorem.

1 Like

Thanks for sharing your thoughts on this, it is indeed very interesting! :slight_smile:

I just wrote an ALife combining Fixed Filter Bank (fast transform) neural nets with associative memory. Which makes a Neural Turing Machine.
The example problems are not so great and not so well coded. The neural network and associative memory code are okay though. I did test them.
I’ll try to find other suitable problems for small Neural Turing Machines.

I found this paper on Butterfly Neural Networks:
They use the FFT (WHT) butterfly operator to reduce the computational cost of neural networks.
Conventional neural networks need n*n multiply-adds per layer where n is the width of the network. That is very time consuming and may be an inefficient use of weight parameters.
They have code here:

ReLU is a switch. Switching dot products into compositions, which if all the switch states are known can be condensed back to a single dot product.
The numeric output of a neuron then is I dot C where I is the input vector and C is the condensed vector.
The exact components of the C vector are not particularly relevant. Only the magnitude of C and the angle between I and C govern the numeric output of the given neuron, the magnitude of I usually being fixed.
The magnitude of C is a summary measure where the exact individual components of C only have a limited influence.
Likewise the angle between I and C is a summary measure.
That means there are numerous choices for the components of C that will produce a wanted output value.
Even in the case of a highly restricted choice of dot products it should be possible to find a reasonable composition to produce a particular approximate dot product output. This explains how Fast Transform (aka. Fixed Filter Bank) Neural Networks can still fit the data very well even with only a few layers and limited variability of the dot products involved. You are not relying on specific components of C being specific values, just the summary measures being specific values, allowing a lot of flexibility.

The same idea also applies to pruned conventional neural networks, Lottery Ticket networks etc.

1 Like