Moreover, it's all about use case. If you need a high degree of reliability and reproducibility, don't use LLMs! Not yet, at least. That's fine though, because they offer a ton of value in solving problems where that isn't needed.
> If you need a high degree of reliability and reproducibility, don't use LLMs!
This is true of pretty much all of machine learning. LLMs are just getting singled out because their outputs are not getting the same level of validation that typically occurs with older approaches. BERT models will also spit out wacky stuff, depending on how they're trained/fine-tuned/used/etc.
When the next token is a URL, and the URL does not match the preceding anchor text.
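A crude, purely procedural version of that check might look like this (the regex, the fuzzy-match ratio, and the 0.3 threshold are all just illustrative choices, not anything standard):

```python
import re
from difflib import SequenceMatcher

# Illustrative only: flag markdown links whose URL looks unrelated to the
# anchor text. The token normalization and the 0.3 cutoff are arbitrary
# choices for this sketch.
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")

def suspicious_links(text: str, threshold: float = 0.3):
    flagged = []
    for anchor, url in MD_LINK.findall(text):
        # Strip punctuation and compare the anchor text against the URL.
        anchor_words = re.sub(r"[^a-z0-9]+", " ", anchor.lower())
        url_words = re.sub(r"[^a-z0-9]+", " ", url.lower())
        similarity = SequenceMatcher(None, anchor_words, url_words).ratio()
        if similarity < threshold:
            flagged.append((anchor, url, similarity))
    return flagged

sample = ("See [the Rust book](https://doc.rust-lang.org/book/) and "
          "[the Rust book](https://example.com/totally/unrelated).")
print(suspicious_links(sample))  # only the second, mismatched link gets flagged
```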
Additional layers of these 'LLMs' could read the responses, check whether the premises are valid and the logic is sound enough to support the presented conclusion(s), and then just suggest a different citation URL for the preceding text.
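Sketching that verifier-layer idea very loosely (`call_llm` is a placeholder for whatever client you'd actually wire in, and the prompt is made up for illustration):

```python
# Rough sketch of a second-pass verifier, not a real implementation.
# `call_llm` stands in for whatever LLM client you actually use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client of choice here")

VERIFY_PROMPT = (
    "Claim: {claim}\n"
    "Cited URL: {url}\n"
    "Does the cited URL plausibly support the claim? "
    "Answer 'OK' or suggest a better URL on one line."
)

def review_citation(claim: str, url: str) -> str:
    """Return the original URL if the verifier accepts it, else its suggestion."""
    verdict = call_llm(VERIFY_PROMPT.format(claim=claim, url=url)).strip()
    return url if verdict.upper() == "OK" else verdict
```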
For many NLP tasks (which is what I mostly use LLMs for), hallucinations can be prevented with simple, procedural checks against the input or a controlled vocabulary. For example, for NER tasks, you can just check whether each extracted entity actually appears in the input text or in the controlled vocabulary.
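A minimal post-hoc check for an NER task might look like the following (the vocabulary, names, and data are made up for illustration):

```python
# Keep an extracted entity only if it literally appears in the source text
# or is in a controlled vocabulary; everything else is treated as hallucinated.
CONTROLLED_VOCAB = {"aspirin", "ibuprofen", "paracetamol"}

def validate_entities(source_text: str, extracted: list[str]) -> list[str]:
    lowered = source_text.lower()
    valid = []
    for entity in extracted:
        in_text = entity.lower() in lowered            # grounded in the input
        in_vocab = entity.lower() in CONTROLLED_VOCAB  # or in the allowed list
        if in_text or in_vocab:
            valid.append(entity)
    return valid

doc = "The patient was given ibuprofen and advised rest."
print(validate_entities(doc, ["ibuprofen", "oxycodone"]))  # drops the made-up drug
```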