Discussions about LLM alignment often overlook data quality and quantity. Current models like Llama 2 use 10K+ prompts and responses for supervised fine-tuning (SFT) and 100K+ human preference pairs. While preferences are relatively easy to annotate, producing a good SFT dataset is hard.
I read here that Yann LeCun claimed that even with RLHF, LLMs will still hallucinate, and that it's an unavoidable consequence of their autoregressive nature.
Moreover, it's all about the use case. If you need a high degree of reliability and reproducibility, don't use LLMs! Not yet, at least. That's fine though, because they offer a ton of value in solving problems where that isn't needed.
> If you need a high degree of reliability and reproducibility, don't use LLMs!
This is true of pretty much all of machine learning. LLMs are just getting singled out because their outputs are not getting the same level of validation that typically occurs with older approaches. BERT models will also spit out wacky stuff, depending on how they're trained/fine-tuned/used/etc.
When the next token is a URL, and the URL does not match the preceding anchor text.
Additional layers of these 'LLMs' could read the responses, determine whether the premises are valid and the logic is sound enough to support the presented conclusion(s), and then suggest a different citation URL for the preceding text.
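A minimal sketch of that kind of procedural check, assuming a simple word-overlap heuristic between the anchor text and the generated URL (the function name, heuristic, and sample citations are purely illustrative, not from any library):

```python
import re
from urllib.parse import urlparse

def url_matches_anchor(anchor_text: str, url: str) -> bool:
    """Heuristic: does the URL plausibly correspond to its anchor text?
    Purely illustrative; it just looks for overlap between anchor words
    and the URL's domain/path."""
    parsed = urlparse(url)
    url_blob = (parsed.netloc + parsed.path).lower()
    anchor_words = re.findall(r"[a-z0-9]+", anchor_text.lower())
    return any(len(w) > 3 and w in url_blob for w in anchor_words)

# A citation whose anchor text overlaps the URL passes the check...
print(url_matches_anchor("Hugging Face Transformers docs",
                         "https://huggingface.co/docs/transformers"))        # True
# ...while a mismatched one gets flagged for review or re-suggestion.
print(url_matches_anchor("Llama 2 technical report",
                         "https://example.com/attention-is-all-you-need"))   # False
```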
For many NLP tasks (which is what I mostly use LLMs for), hallucinations can be prevented with simple, procedural checks against the input or a controlled vocabulary. For example, for NER tasks, you can just check whether the extracted entities are valid relative to either of the two.
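For concreteness, here's a minimal sketch of such a check for NER output, assuming the extracted entities arrive as plain strings from an LLM prompt (the helper and the sample data are hypothetical):

```python
def validate_entities(entities: list[str], source_text: str,
                      controlled_vocab: set[str] | None = None) -> list[str]:
    """Keep only entities that appear in the input text or, if a controlled
    vocabulary is supplied, are members of it. Anything else is treated as a
    likely hallucination and dropped."""
    valid = []
    for ent in entities:
        in_source = ent.lower() in source_text.lower()
        in_vocab = controlled_vocab is not None and ent in controlled_vocab
        if in_source or in_vocab:
            valid.append(ent)
    return valid

# Hypothetical usage: `extracted` would come from an LLM-based NER prompt.
text = "Alice flew from Zurich to Toronto on Monday."
extracted = ["Alice", "Zurich", "Toronto", "Geneva"]   # "Geneva" is hallucinated
print(validate_entities(extracted, text))              # ['Alice', 'Zurich', 'Toronto']
```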
edit: I don't like your linked article at all. It's subtly misleading and/or misinformed, like Yahoo News but for ML.
To clarify: no one (certainly not OpenAI) suggested that RLHF was useful for reducing hallucinations. It's not for that. The insinuation that it was designed (at least partially) for that purpose and yet "failed" is faulty. Hallucinations are a known issue with large language models, and while I appreciate LeCun reiterating that, researchers far less prominent than LeCun are also aware of that fact.
What people in the media are calling 'hallucination' in large language models is really imagination and creativity. It's inherent in the models, and it's why they have been able to unlock so many amazing cognitive capabilities where previous efforts have failed.
That's like saying that a simple linear regression predicting the wrong (x, y) value is just "being creative". The LLM lacks the capacity to make the correct word prediction. It's not being creative; it's just flat-out wrong.
Sure, your linear regression example is great, and it's actually an example of how 'hallucination' in models can be good. Imagine a prediction algorithm that had to fit every point in noisy data: it would be either something overfitted, like an ugly high-order polynomial that passes through every point, or just a database of every (x, y) pair in the training set. Neither of those hallucinates if you ask for data from the training set, but for most purposes they wouldn't be as useful as a fitted linear regression, which 'hallucinates' y values even when you give it an exact x from the training set. Of course, if you just want to memorize and repeat back the training set, a database is the best way!
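A toy sketch of that contrast, with arbitrary made-up data: a lookup table reproduces the training pairs exactly but has no answer for unseen x, while the fitted regression 'hallucinates' a slightly different y even at a training x.

```python
import numpy as np

# Noisy training data around y = 2x + 1 (arbitrary illustrative numbers).
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.shape)

# Fit a linear regression (degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)

# "Memorization": an exact lookup table of the training pairs.
lookup = dict(zip(x.tolist(), y.tolist()))

query = 3.0
print("training y:      ", lookup[query])              # exact, but only works for seen x
print("regression y:    ", slope * query + intercept)  # slightly off even at a training x
print("unseen x = 3.5:  ", slope * 3.5 + intercept)    # the lookup table has no answer here
```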
So by your definition of creativity, the LLM is always being creative because it's always using probability to determine the next word (it's not just spitting out facts from a database). Hallucination is just when that prediction, or creativity as you call it, is wrong.
https://evalovernite.substack.com/p/rlhf-math-aint-enough
https://doi.org/10.5281/zenodo.8186168