> What most people fail to realise is that, between each token being generated, black magic is happening inside the transformer layers.
Thank you for saying that. I think most people have an incomplete mental model of how LLMs work, and it's very misleading for understanding what these models really do and can achieve. "Next token prediction" happens only at the output layer; it's not what goes on internally. The secret sauce is in the hidden layers of a very deep neural network. There are no words or tokens inside the network. A transformer is not the simple token estimator that most people imagine.
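A minimal toy sketch of that point (not a real LLM; the sizes, the `tanh` layer stand-ins, and the token ids are all made up for illustration): tokens appear only at the embedding lookup at the start and at the projection to vocabulary logits at the end, while everything in between is plain floating-point vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model, n_layers = 1_000, 64, 4        # hypothetical toy sizes

embedding = rng.normal(size=(vocab_size, d_model))   # token id -> vector
layer_weights = [rng.normal(size=(d_model, d_model)) * 0.1
                 for _ in range(n_layers)]           # stand-ins for transformer blocks
unembedding = rng.normal(size=(d_model, vocab_size))  # vector -> token logits

token_ids = np.array([17, 42, 7])                    # the prompt, as token ids

# Inside the network: nothing but continuous vectors, no words or tokens.
h = embedding[token_ids]                             # shape (seq_len, d_model)
for W in layer_weights:
    h = np.tanh(h @ W)                               # placeholder for attention + MLP

# Only at the output layer do we map back to the token vocabulary.
logits = h[-1] @ unembedding                         # shape (vocab_size,)
next_token = int(np.argmax(logits))                  # "next token prediction" happens here
print(next_token)
```

In a real transformer the blocks are attention plus MLP rather than a single matrix, but the shape of the computation is the same: the discrete token only reappears at the final unembedding step.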