Hacker News

Is it just me, or have they provided graphs of LN input against LN output when the tanh(a*x) is also followed by a weight and bias?

Surely you would want to compare against the output of the LayerNorm without its weight and bias to get an impression of their similarity.

I guess it doesn't matter if the final result works, but I feel like looking at the bit that they are changing in isolation might provide a better insight as to what is happening.
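To make the point concrete, here is a minimal sketch of the comparison being suggested: the normalization core of LayerNorm versus tanh(a*x), with the affine (weight/bias) step left out of both, since that step is identical in the two designs. The function names and the `alpha` value are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def layernorm_core(x, eps=1e-5):
    # LayerNorm without the learned weight/bias: just center and rescale
    # each row to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def tanh_core(x, alpha=0.5):
    # The proposed replacement without the learned weight/bias:
    # an elementwise squashing with an assumed illustrative scale `alpha`.
    return np.tanh(alpha * x)

# Compare the two "cores" in isolation on the same input.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
diff = np.abs(layernorm_core(x) - tanh_core(x)).mean()
print(f"mean absolute difference: {diff:.3f}")
```

Plotting `layernorm_core(x)` against `tanh_core(x)` would isolate the part that actually changed, rather than folding the shared affine transform into both sides of the comparison.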



From their implementation, it looks like they're calculating tanh and then applying a weight and bias.


Exactly, and that's what happens in LayerNorm too. So I figured the best basis for comparison would have been to leave that bit out when looking at their difference or similarity, because obviously the bits that share the same implementation will be the same.



