
But wait, if the problem is the final tokenisation, what would happen if we stopped it one or two layers before the final layer? I get that the result would not be as readable to a human as the final layer's output, but would the model no longer be as confused by its own output?

Or would it still be a problem because we're collapsing a distribution of likely responses down to a single response, and it's not happy with that single response even if it is fuzzier than what comes out of the last layer?
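To make the "collapsing" concrete, here is a rough sketch (illustrative sizes, not any particular model) of the usual decoding step: the final hidden state is projected to a distribution over the vocabulary, and sampling/argmax throws most of that distribution away.

    # Hypothetical sketch of a single decoding step.
    import torch

    hidden_dim, vocab_size = 768, 50257          # illustrative, GPT-2-like sizes
    unembed = torch.nn.Linear(hidden_dim, vocab_size, bias=False)

    h_last = torch.randn(hidden_dim)             # final-layer hidden state at the last position
    logits = unembed(h_last)                     # a score for every token in the vocabulary
    probs = torch.softmax(logits, dim=-1)        # full distribution over possible next tokens

    next_token = torch.argmax(probs)             # "collapse" to a single discrete token id
    # Only next_token is fed back on the next step; the rest of `probs` is discarded.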



It's not clear how one could use the output of an embedding layer recursively, so it's a bit ill-defined what you mean by "stopped it" and "confused with its own output" here. You're mixing metaphor and math, which leaves the question unclear.

Yes, the outputs one or two layers before the final layer would be continuous embeddings of sorts, and less lossy than the discretized tokenization at representing the meaning of the input sequence. But you can't "stop" there in a recursive (autoregressive) LLM in any practical sense.
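As a rough sketch (toy model, not any real architecture), the recursion runs through discrete token ids, which is why there is no natural re-entry point for an earlier layer's continuous output:

    # Minimal sketch (hypothetical model) of why generation loops through tokens:
    # a discrete token id is fed back into the input embedding table each step.
    import torch

    class TinyLM(torch.nn.Module):
        def __init__(self, vocab_size=100, dim=32):
            super().__init__()
            self.embed = torch.nn.Embedding(vocab_size, dim)         # input path expects token ids
            self.blocks = torch.nn.GRU(dim, dim, batch_first=True)   # stand-in for the transformer layers
            self.unembed = torch.nn.Linear(dim, vocab_size)

        def forward(self, ids):
            h, _ = self.blocks(self.embed(ids))
            return self.unembed(h)                                   # logits at each position

    model = TinyLM()
    ids = torch.tensor([[1, 5, 7]])              # prompt as token ids
    for _ in range(5):
        logits = model(ids)[:, -1]               # distribution for the next position
        next_id = logits.argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)   # the *token id* is what re-enters the model
        # A pre-final-layer hidden state can't replace `next_id` here: self.embed only
        # accepts integer ids, so "stopping earlier" would require a model trained to
        # consume its own continuous states as inputs.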



