
To put it another way: it is not clear when and how they hallucinate. With a person, you can gauge both their competence and their limits. But an LLM can happily give different answers based on trivial changes in the question, with no warning.


In a conversation (conversation and attached pictures at https://bsky.app/profile/liotier.bsky.social/post/3ldxvutf76...), I deleted a spurious "de" ("Produce de two-dimensional chart [..]" to "Produce two-dimensional [..]") and ChatGPT generated a new version of the graph illustrating a different function, although nothing else had changed and the whole preceding conversation suggested that ChatGPT held a firm model of the problem. This confirmed my current doctrine: use the LLM to surface concepts from a huge messy corpus, then check those against sources from said corpus.


LLMs are non-deterministic: they'll happily give different answers to the same prompt based on nothing at all. This is actually great if you want to use them for "creative" content generation tasks, which is IMHO what they're best at. (Along with processing of natural language input.)

Expecting them to do non-trivial amounts of technical or mathematical reasoning, or even something as simple as code generation (other than "translate these complex natural-language requirements into a first sketch of viable computer code") is a total dead end; these will always be language systems first and foremost.


This confuses me. You have your model, you have your tokens.

If the tokens are bit-for-bit-identical, where does the non-determinism come in?

If the tokens are only roughly-the-same-thing-to-a-human, sure I guess, but convergence on roughly the same output for roughly the same input should inherently be a goal of LLM development.


Most any LLM has a "temperature" setting: a knob that controls how much randomness the sampling step injects (the weights themselves stay fixed), intentionally causing exactly this nondeterministic behavior. Good for creative tasks, bad for repeatability. If you're running one of the open models, turn the temperature down to 0 and it suddenly becomes perfectly consistent.
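
Roughly what that knob does, as a toy sketch (not any particular inference engine's actual code; the logits are made up):

  import numpy as np

  def sample_next_token(logits, temperature, rng):
      # Temperature rescales the logits before they become probabilities;
      # the model weights themselves are untouched.
      if temperature == 0:
          return int(np.argmax(logits))          # greedy: always the top token
      scaled = logits / temperature
      probs = np.exp(scaled - scaled.max())
      probs /= probs.sum()
      return int(rng.choice(len(logits), p=probs))

  rng = np.random.default_rng()
  logits = np.array([4.0, 3.5, 1.0])             # made-up scores for 3 candidate tokens
  print([sample_next_token(logits, 0.0, rng) for _ in range(5)])  # always token 0
  print([sample_next_token(logits, 1.5, rng) for _ in range(5)])  # mix of 0s and 1s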


You can get deterministic output even with a high temp.

Whatever "random" seed was used can be reused.


The model outputs probabilities, which you then have to sample from. Always choosing the highest-probability token leads to poor results in practice, such as the model tending to repeat itself. It's a sort of Monte Carlo approach.
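
The repetition problem is easy to see even with a toy model; here a made-up bigram table stands in for a real LLM:

  import random

  # Made-up next-word probabilities given the current word.
  bigram = {
      "the": [("cat", 0.5), ("dog", 0.3), ("mat", 0.2)],
      "cat": [("sat", 0.6), ("ran", 0.4)],
      "dog": [("ran", 0.7), ("sat", 0.3)],
      "sat": [("on", 0.9), ("down", 0.1)],
      "ran": [("to", 0.8), ("away", 0.2)],
      "on":  [("the", 0.9), ("a", 0.1)],
      "to":  [("the", 0.9), ("a", 0.1)],
      "a":   [("cat", 0.5), ("dog", 0.5)],
      "down": [("the", 1.0)],
      "away": [("the", 1.0)],
      "mat": [("the", 1.0)],
  }

  def decode(start, steps, greedy):
      word, out = start, [start]
      for _ in range(steps):
          options = bigram[word]
          if greedy:
              word = max(options, key=lambda o: o[1])[0]        # always the top choice
          else:
              words, weights = zip(*options)
              word = random.choices(words, weights=weights)[0]  # sample instead
          out.append(word)
      return " ".join(out)

  print(decode("the", 12, greedy=True))   # loops: "the cat sat on the cat sat on ..."
  print(decode("the", 12, greedy=False))  # sampled: varied, less repetitive, each run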


The trained model is just a bunch of statistics. To use those statistics to generate text you need to "sample" from the model. If you always sampled by taking the model's #1 token prediction that would be deterministic, but more commonly a random top-K or top-p token selection is made, which is where the randomness comes in.
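
A sketch of what top-k / top-p selection does with a made-up distribution (real implementations work on logits and are fussier about ties and renormalization):

  import numpy as np

  def top_k_top_p_sample(probs, k=None, p=None, rng=None):
      # Restrict sampling to the k most likely tokens and/or to the smallest
      # set of tokens whose cumulative probability reaches p (the "nucleus").
      rng = rng or np.random.default_rng()
      order = np.argsort(probs)[::-1]            # token ids, most likely first
      sorted_probs = probs[order]
      keep = np.ones(len(probs), dtype=bool)
      if k is not None:
          keep &= np.arange(len(probs)) < k
      if p is not None:
          keep &= np.cumsum(sorted_probs) - sorted_probs < p
      kept = sorted_probs * keep
      kept /= kept.sum()                         # renormalize over the survivors
      return int(order[rng.choice(len(probs), p=kept)])

  probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
  print(top_k_top_p_sample(probs, k=3))          # only tokens 0-2 can come back
  print(top_k_top_p_sample(probs, p=0.9))        # the long tail is cut off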


It is technically possible to make it fully deterministic if you have complete control over the model, quantization and sampling processes. The GP probably meant that most commercially available LLM services don't give such control.
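
For the curious, here's roughly what exercising that control looks like when running an open model locally with the Hugging Face transformers library (a sketch; "gpt2" is just an example model, and GPU kernels or quantization can still reintroduce variation, which is part of the point):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

  torch.use_deterministic_algorithms(True)   # fail loudly on nondeterministic kernels
  name = "gpt2"                              # example; any local causal LM works
  tok = AutoTokenizer.from_pretrained(name)
  model = AutoModelForCausalLM.from_pretrained(name)
  ids = tok("The same prompt, twice:", return_tensors="pt").input_ids

  set_seed(0)                                # seeds the Python, NumPy and torch RNGs
  out1 = model.generate(ids, do_sample=True, temperature=0.9, max_new_tokens=20)
  set_seed(0)                                # reset before the second run
  out2 = model.generate(ids, do_sample=True, temperature=0.9, max_new_tokens=20)
  assert torch.equal(out1, out2)             # identical, despite sampling at temp 0.9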


Actually you just have to set temperature to zero.


> If the tokens are bit-for-bit-identical, where does the non-determinism come in?

By design, most LLMs have a randomization factor in how they sample their output. Many use the concept of "temperature", which makes them sometimes choose the 2nd- or 3rd-highest-ranked next token; the higher the temperature, the more often (and the lower-ranked) the non-best token they pick. OpenAI described this in their papers around the GPT-2 timeframe, IIRC.
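
A quick way to see that effect, with made-up logits for the top three candidate tokens:

  import numpy as np

  logits = np.array([5.0, 4.0, 3.0])   # made-up scores: best, 2nd, 3rd ranked token

  for t in (0.2, 0.7, 1.0, 2.0):
      p = np.exp(logits / t)
      p /= p.sum()
      print(f"temperature {t}: P(best)={p[0]:.2f}  P(2nd)={p[1]:.2f}  P(3rd)={p[2]:.2f}")
  # At 0.2 the best token wins ~99% of the time; at 2.0 the 2nd and 3rd
  # together get picked roughly half the time.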


Computers are deterministic. LLMs run on computers. If you use the same seed for the random number generator, you'll see that it produces the same output for a given input.


The unreliability of LLMs is mostly unrelated to their (artificially injected) non-determinism.


There's no need for any change to the question. LLMs have an RNG step built into the sampling algorithm, so one can happily give you the right answer and then the wrong one.


> trivial changes in the question

I love how those changes often amount to nothing more than a different random seed... pure chance.

I ran some repeated tests requiring deeper-than-surface knowledge of some niche subjects and was impressed that it gave the right answer... about 20% of the time.

(on earlier OpenAI models)


Ask survey designers how “trivial” changes to questions impact results from humans. It’s a huge thing in the field.



