You're right about "reasoning". It's just trying to steer the conversation in a more relevant direction in vector space, hopefully to generate more relevant output tokens. I find it easier to conceptualize this in three dimensions. 3blue1brown has a good video series which covers the overall concept of LLM vectors in machine learning: https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_...
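To make the vector-space picture concrete, here's a minimal sketch with made-up 3-D vectors (real embeddings have hundreds or thousands of dimensions, and every number here is purely illustrative): mixing a context vector into an ambiguous word vector pulls it toward one cluster.

```python
# Toy 3-D "embeddings" -- invented values, not from any real model.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

queen   = np.array([0.5, 0.5, 0.0])   # ambiguous: between both senses
monarch = np.array([1.0, 0.0, 0.0])   # royalty sense
bee     = np.array([0.0, 1.0, 0.0])   # insect sense
hive    = np.array([0.0, 0.9, 0.3])   # bee-related context token

print(cosine(queen, monarch), cosine(queen, bee))   # ~0.71 vs ~0.71: a tie

# Averaging in the context vector steers "queen" toward the bee cluster.
steered = (queen + hive) / 2
print(cosine(steered, monarch), cosine(steered, bee))  # ~0.33 vs ~0.92
```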
To give a concrete example, say we're generating the next token from the word "queen". Is this the monarch, the bee, the playing card, the drag entertainer? By adding more relevant tokens (honey, worker, hive, beeswax) we steer the token generation to the place in the "word cloud" where our next token is more likely to exist.
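If you want to watch that happen with a real model, here's a rough sketch using the Hugging Face transformers library, with GPT-2 purely as a small stand-in (any causal LM would do, and the exact top tokens will vary by model; the point is the shift in the distribution):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def top_next_tokens(prompt, k=5):
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tok.decode(int(i)), round(p.item(), 3))
            for i, p in zip(top.indices, top.values)]

print(top_next_tokens("The queen"))                       # ambiguous
print(top_next_tokens("Honey, worker, hive. The queen"))  # steered toward bees
```

Same word, two different neighborhoods in the distribution, just from a few extra context tokens.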
I don't see LLMs as "lossy compression" of text. To me, that framing implies retrieval, and Transformers are a prediction device, not a retrieval device. If one needs retrieval, use a database.
> You're right about "reasoning". It's just trying to steer the conversation in a more relevant direction in vector space, hopefully to generate more relevant output tokens.
I like to frame it as a theater script cycling through the LLM. The "reasoning" difference is just changing the style so that each character has film noir monologues. The underlying process hasn't really changed, and the monologue text isn't fundamentally different from dialogue or stage direction... but more data still means more guidance for each improv cycle.
> say we're generating the next token from the word "queen". Is this the monarch, the bee, the playing card, the drag entertainer?
I'd like to point out that this scheme can result in things that look better to humans in the end... even when the "clarifying" choice is entirely arbitrary and irrational.
In other words, we should be alert to the difference between "explaining what you were thinking" versus "picking a firm direction so future improv makes nicer rationalizations."
It makes sense if you think of the LLM as building a data-aware model that compresses the noisy data by parsimony (the principle that the simplest explanation that fits is best). Typical text compression algorithms are not data-aware and not robust to noise.
In lossy compression the compression itself is the goal. In prediction, compression is the road that leads to parsimonious models.
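As a quick illustration of the "not robust to noise" half of that claim, using zlib as the stand-in for a typical compressor: corrupting even a small fraction of bytes in highly regular text wrecks the compression ratio, because the compressor matches exact byte patterns rather than modeling the underlying regularity.

```python
import random
import zlib

random.seed(0)
clean = ("the queen bee returned to the hive " * 200).encode()

# Corrupt roughly 5% of the bytes with random printable characters.
noisy = bytearray(clean)
for _ in range(len(noisy) // 20):
    noisy[random.randrange(len(noisy))] = random.randrange(32, 127)

print(len(zlib.compress(clean)))         # tiny: the repetition is exploited
print(len(zlib.compress(bytes(noisy))))  # far larger after a little noise
```

A model that had learned the sentence's structure would shrug off those flipped bytes; the pattern-matching compressor can't.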