Hacker News

Semi-related, as long as we're requesting things: to @pr337h4m's point above, it would be interesting to have some rough guidance (even a sidebar or single paragraph) on when it makes sense to pre-train a new foundation model vs finetune vs pass in extra context (RAG). Clients of all sizes—from Fortune 100 to small businesses—are asking us this question.


That's a good point. I may briefly mention RAG-like systems and add some literature references on this, but I am a bit hesitant to give general advice because it's heavily project-dependent in my opinion. It usually also comes down to what form the client has the data in and whether referencing from a database or documentation is desired or not. The focus of chapters 6+7 is also instruction-finetuning and alignment rather than finetuning for knowledge. The latter goal is best achieved via pretraining (as opposed to finetuning) imho. In any case, I just read this interesting case study last week on finetuning vs RAG that might come in handy: "RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture" (https://arxiv.org/abs/2401.08406)
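For readers unfamiliar with the "pass in extra context" option mentioned above, here is a minimal, self-contained sketch of the retrieval step in a RAG-style pipeline. It uses a toy bag-of-words cosine similarity instead of a real embedding model, and the corpus, query, and prompt format are all made-up examples, not anything from the book or the linked paper:

```python
# Toy RAG retrieval: pick the most similar document by bag-of-words
# cosine similarity, then prepend it to the prompt as context.
# (A real system would use learned embeddings and a vector database.)
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    # Return the single most similar document (top-1 retrieval).
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

# Hypothetical corpus for illustration only.
docs = [
    "Fine-tuning adapts the model weights on a labeled dataset.",
    "RAG retrieves documents at query time and adds them to the prompt.",
]

context = retrieve("when should I use retrieval at query time", docs)
prompt = f"Context: {context}\n\nQuestion: when should I use RAG?"
```

The point of the sketch is the tradeoff being discussed: retrieval leaves the model's weights untouched and keeps knowledge in an external store, whereas finetuning (or pretraining) bakes it into the parameters.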


