Hacker News

Is it just me, or have they provided graphs of LN input against LN output when the tanh(a*x) is also followed by a weight and bias?

Surely you would want to compare against the output of the LayerNorm without its weight and bias to get an impression of their similarity.

I guess it doesn't matter if the final result works, but I feel like looking at the bit that they are changing in isolation might provide a better insight as to what is happening.
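To make the point concrete, here is a minimal sketch of the comparison being suggested: the normalization core of LayerNorm versus tanh(a*x), with the affine (weight/bias) step left out of both, since that step is identical in the two designs. The function names and the `alpha` value are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def layernorm_core(x, eps=1e-5):
    # LayerNorm without the learned weight/bias: just center and rescale
    # each row to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def tanh_core(x, alpha=0.5):
    # The proposed replacement without the learned weight/bias:
    # an elementwise squashing with an assumed illustrative scale `alpha`.
    return np.tanh(alpha * x)

# Compare the two "cores" in isolation on the same input.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
diff = np.abs(layernorm_core(x) - tanh_core(x)).mean()
print(f"mean absolute difference: {diff:.3f}")
```

Plotting `layernorm_core(x)` against `tanh_core(x)` would isolate the part that actually changed, rather than folding the shared affine transform into both sides of the comparison.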



From their implementation, it looks like they're calculating tanh and then applying a weight and bias.


Exactly, and that's what happens in LayerNorm too. So I figured the best basis for comparison would have been to leave that bit out when looking at their difference or similarity, because obviously the bits that share the same implementation will be the same.



