I had to read the paper first, but yeah, that diagram is shockingly simple once you get it.
Some annotations:
- The labels in the orange boxes mean "A is initialized with random weights (drawn from a Gaussian distribution); B is initialized with weights set to zero".
- d is the number of values of the layer's input and output. (The width of the input and output vectors, if you will.)
- r is the number of "intermediary values" between A and B, i.e. the rank of the update. It's expected to be a lot smaller than d, hence "Low-Rank" (apparently LoRA even works with r = 3 or so), but it can be equal to d, though you lose some of the perf benefits. A rough sketch of what this looks like in code is below.
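
To make the annotations concrete, here's a minimal sketch of a LoRA-style adapter around a frozen linear layer, assuming PyTorch. The class name, the `alpha` scaling knob, and the init constants are illustrative assumptions, not something taken from the diagram; the point is just the shapes (d and r) and the A/B initializations.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA adapter: y = (W + scale * B @ A) @ x, with W frozen."""

    def __init__(self, d: int, r: int, alpha: float = 1.0):
        super().__init__()
        # Frozen pretrained weight W (d x d): not updated during fine-tuning.
        self.weight = nn.Parameter(torch.empty(d, d), requires_grad=False)
        nn.init.kaiming_uniform_(self.weight)  # stand-in for real pretrained weights
        # A (r x d): random Gaussian init, per the orange box in the diagram.
        self.A = nn.Parameter(torch.randn(r, d) * 0.01)
        # B (d x r): initialized to zero, so B @ A starts as the zero matrix
        # and the adapted layer initially behaves exactly like the original one.
        self.B = nn.Parameter(torch.zeros(d, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original path plus the low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(d=768, r=8)        # r much smaller than d
out = layer(torch.randn(2, 768))      # only A and B receive gradients
print(out.shape)                      # torch.Size([2, 768])
```

The perf benefit comes from only training (and storing) A and B: 2 * d * r values per layer instead of d * d, which is why pushing r up toward d erodes the advantage.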