
But wait, if the problem is the final tokenisation, what would happen if we stopped it one or two layers before the final layer? I get that the result would not be as readable to a human as the final layer's output, but would the model no longer be as confused by its own output?

Or would it still be a problem because we're collapsing a distribution of likely responses down to a single response, and it's not happy with that single response even if it is fuzzier than what comes out of the last layer?
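To make the "collapsing" concrete, here is a rough sketch (illustrative sizes, not any particular model) of the usual decoding step: the final hidden state is projected to a distribution over the vocabulary, and sampling/argmax throws most of that distribution away.

    # Hypothetical sketch of a single decoding step.
    import torch

    hidden_dim, vocab_size = 768, 50257          # illustrative, GPT-2-like sizes
    unembed = torch.nn.Linear(hidden_dim, vocab_size, bias=False)

    h_last = torch.randn(hidden_dim)             # final-layer hidden state at the last position
    logits = unembed(h_last)                     # a score for every token in the vocabulary
    probs = torch.softmax(logits, dim=-1)        # full distribution over possible next tokens

    next_token = torch.argmax(probs)             # "collapse" to a single discrete token id
    # Only next_token is fed back on the next step; the rest of `probs` is discarded.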



It's not clear how one could use the output of an embedding layer recursively, so it's a bit ill-defined what you mean by "stopped it" and "confused with its own output" here. You're mixing metaphor and math, which leaves the question unclear.

Yes, the outputs one or two layers before the final layer would be continuous embeddings of sorts, and less lossy than the discretized tokenization at representing the meaning of the input sequence. But you can't "stop" there in a recursive (autoregressive) LLM in any practical sense.
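As a rough sketch (toy model, not any real architecture), the recursion runs through discrete token ids, which is why there is no natural re-entry point for an earlier layer's continuous output:

    # Minimal sketch (hypothetical model) of why generation loops through tokens:
    # a discrete token id is fed back into the input embedding table each step.
    import torch

    class TinyLM(torch.nn.Module):
        def __init__(self, vocab_size=100, dim=32):
            super().__init__()
            self.embed = torch.nn.Embedding(vocab_size, dim)         # input path expects token ids
            self.blocks = torch.nn.GRU(dim, dim, batch_first=True)   # stand-in for the transformer layers
            self.unembed = torch.nn.Linear(dim, vocab_size)

        def forward(self, ids):
            h, _ = self.blocks(self.embed(ids))
            return self.unembed(h)                                   # logits at each position

    model = TinyLM()
    ids = torch.tensor([[1, 5, 7]])              # prompt as token ids
    for _ in range(5):
        logits = model(ids)[:, -1]               # distribution for the next position
        next_id = logits.argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)   # the *token id* is what re-enters the model
        # A pre-final-layer hidden state can't replace `next_id` here: self.embed only
        # accepts integer ids, so "stopping earlier" would require a model trained to
        # consume its own continuous states as inputs.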



