Hacker News
gdiamos on March 16, 2025 | on: Transformers Without Normalization
Sure, but why would one prefer tanh over normalization layers if they have the same accuracy?
I suppose normalization kernels contain reductions, but how hard are reductions in 2025?
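For context on the trade-off being questioned: the paper's Dynamic Tanh (DyT) replaces LayerNorm's per-token statistics with a purely elementwise transform, DyT(x) = γ·tanh(αx) + β. A minimal NumPy sketch (function names and shapes are illustrative, not from the paper's code) makes the kernel-level difference concrete — LayerNorm needs a reduction over the feature axis, DyT does not:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Requires a reduction over the feature dimension:
    # mean and variance are computed across the last axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def dynamic_tanh(x, alpha, gamma, beta):
    # Purely elementwise: no cross-feature communication needed,
    # so the kernel is trivially parallel with no reduction step.
    return gamma * np.tanh(alpha * x) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))          # (batch, features)
gamma, beta = np.ones(8), np.zeros(8)

print(layer_norm(x, gamma, beta).shape)         # (2, 8)
print(dynamic_tanh(x, 0.5, gamma, beta).shape)  # (2, 8)
```

Both produce identically shaped outputs; the distinction is only that `layer_norm` must first reduce across the feature axis before the elementwise part, whereas `dynamic_tanh` is a single elementwise pass — which is the kernel-fusion argument the comment is weighing against "reductions are cheap in 2025".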