
Yes, but what if they are very large documents that exceed the maximum context size, say, a 200-page PDF? In that case, won't you be forced to do some form of fine-tuning in order to avoid very slow/computationally expensive on-the-fly retrieval?

Edit: spelling



Typical retrieval methods break documents into chunks, use semantic search to find the chunks most relevant to the question, and pass only those chunks to the model.
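
Roughly, something like this (a sketch using the sentence-transformers library; the model name, file name, and chunk sizes are just placeholders):

    # Chunk a document, embed the chunks, and retrieve the most relevant ones
    # with cosine similarity. A minimal sketch, not production code.
    from sentence_transformers import SentenceTransformer, util

    def chunk(text, size=500, overlap=50):
        # naive fixed-size character chunks with a little overlap
        return [text[i:i + size] for i in range(0, len(text), size - overlap)]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    chunks = chunk(open("big_document.txt").read())
    chunk_embs = model.encode(chunks, convert_to_tensor=True)

    question = "What does the contract say about termination?"
    q_emb = model.encode(question, convert_to_tensor=True)

    # cosine similarity between the question and every chunk; keep the top 3
    scores = util.cos_sim(q_emb, chunk_embs)[0]
    top = scores.topk(3).indices.tolist()
    context = "\n\n".join(chunks[i] for i in top)
    # `context` is what goes into the prompt instead of the whole PDF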


Fine-tuning the LLM in the way you're describing isn't really an option: as a practical rule, fine-tuning the LLM lets you do style transfer, but your knowledge recall won't improve (there are edge cases, but none apply to using ChatGPT).

That being said, you can use fine-tuning to improve retrieval, which indirectly improves recall. You can do things like fine-tune the model you're getting embeddings from (sketched below), fine-tune the LLM to craft queries that better match a domain-specific format, etc.
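
For the embedding-model side, it might look something like this (a hedged sketch using sentence-transformers' MultipleNegativesRankingLoss; the training pairs are made up, and in practice you'd mine query/passage pairs from your own domain):

    # Fine-tune the embedding model on (query, relevant passage) pairs so that
    # domain queries land closer to the passages that answer them.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer("all-MiniLM-L6-v2")
    train_examples = [
        InputExample(texts=["termination clause notice period",
                            "Either party may terminate with 30 days written notice..."]),
        InputExample(texts=["indemnification obligations",
                            "The supplier shall indemnify the customer against..."]),
    ]
    loader = DataLoader(train_examples, shuffle=True, batch_size=2)
    loss = losses.MultipleNegativesRankingLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
    model.save("domain-tuned-embeddings")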

It won't replace the expensive on-the-fly retrieval, but it will let you be more accurate in your replies.

Also, retrieval can be orders of magnitude faster than inference, depending on the domain. In well-defined domains you can run old-school full-text search and leverage the LLM's skill at crafting well-thought-out queries. In that case retrieval runs at the speed of your I/O.
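
Something in this spirit, as a sketch: SQLite's FTS5 from the standard-library sqlite3 module, with a hypothetical llm_rewrite_query helper that asks the LLM to turn the user's question into a full-text query string.

    # Old-school full-text search over PDF pages using SQLite FTS5.
    import sqlite3

    con = sqlite3.connect("docs.db")
    con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(page_no, body)")
    # ... load one row per PDF page into `pages` ...

    def search(fts_query, limit=5):
        # rank orders best matches first; the query string comes from the LLM
        rows = con.execute(
            "SELECT page_no, body FROM pages WHERE pages MATCH ? "
            "ORDER BY rank LIMIT ?", (fts_query, limit))
        return rows.fetchall()

    # fts_query = llm_rewrite_query("When can the contract be terminated?")  # hypothetical
    results = search('"terminate" OR "termination"')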


We have >200-page PDFs at https://docalysis.com/ and retrieval happens on the fly. It's not more computationally expensive than something like searching one's inbox (I'd imagine you have more than 200 pages' worth of emails in your inbox).



