I had to read the paper first, but yeah, that diagram is shockingly simple once you get it.
Some annotations:
- The labels in the orange boxes mean "A is initialized with random weights (drawn from a Gaussian distribution); B is initialized with weights set to zero".
- d is the number of values of the layer's input and output. (The width of the input and output vectors, if you will.)
- r is the number of "intermediary values" between A and B, i.e. the rank of the update. It's expected to be a lot smaller than d, hence "Low-Rank" (apparently LoRA even works with r = 3 or so), but it can be equal to d, though you lose some of the perf benefits. A rough sketch of what this looks like in code is below.
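
To make the annotations concrete, here's a minimal sketch of a LoRA-style adapter around a frozen linear layer, assuming PyTorch. The class name, the `alpha` scaling knob, and the init constants are illustrative assumptions, not something taken from the diagram; the point is just the shapes (d and r) and the A/B initializations.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA adapter: y = (W + scale * B @ A) @ x, with W frozen."""

    def __init__(self, d: int, r: int, alpha: float = 1.0):
        super().__init__()
        # Frozen pretrained weight W (d x d): not updated during fine-tuning.
        self.weight = nn.Parameter(torch.empty(d, d), requires_grad=False)
        nn.init.kaiming_uniform_(self.weight)  # stand-in for real pretrained weights
        # A (r x d): random Gaussian init, per the orange box in the diagram.
        self.A = nn.Parameter(torch.randn(r, d) * 0.01)
        # B (d x r): initialized to zero, so B @ A starts as the zero matrix
        # and the adapted layer initially behaves exactly like the original one.
        self.B = nn.Parameter(torch.zeros(d, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original path plus the low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(d=768, r=8)        # r much smaller than d
out = layer(torch.randn(2, 768))      # only A and B receive gradients
print(out.shape)                      # torch.Size([2, 768])
```

The perf benefit comes from only training (and storing) A and B: 2 * d * r values per layer instead of d * d, which is why pushing r up toward d erodes the advantage.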