In this post, we explore problems involved in LLM deployment, from GPU shortages to bottlenecks in model performance. These problems have inspired recent developments in distributed training frameworks for LLMs, notably ZeRO-Offload. Here we give an overview of ZeRO-Offload, and in future posts we describe its benefits in depth.
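For readers who want a concrete starting point: ZeRO-Offload ships as part of DeepSpeed and is enabled through its JSON config. Below is a minimal sketch; the model, batch size, and learning rate are placeholders, not recommendations.

```python
import deepspeed
import torch

# Minimal DeepSpeed config enabling ZeRO-Offload: ZeRO stage 2 with
# optimizer states and their updates offloaded to CPU memory.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",     # keep optimizer states in host RAM
            "pin_memory": True,  # pinned buffers speed up CPU<->GPU copies
        },
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
}

model = torch.nn.Linear(1024, 1024)  # stand-in for an actual LLM

# deepspeed.initialize wraps the model in an engine that handles the CPU
# offloading transparently during engine.backward() / engine.step().
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```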
RAG is great for pulling in additional knowledge, but if you combine it with fine-tuning (i.e., the LLM 'understands' the domain-specific terminology better), it becomes a lot more effective.
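As a rough sketch of what that combination looks like: retrieval supplies the facts, and the fine-tuned model consumes them. Everything here is a toy stand-in; the corpus, the keyword-overlap retriever, and `finetuned_generate` are hypothetical, not a real API.

```python
# Toy sketch: a retriever feeding a fine-tuned model's prompt.

CORPUS = [
    "EBITDA: earnings before interest, taxes, depreciation, and amortization.",
    "A covenant is a condition a borrower must comply with under a loan.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank corpus passages by naive keyword overlap with the query."""
    def score(passage: str) -> int:
        return len(set(query.lower().split()) & set(passage.lower().split()))
    return sorted(CORPUS, key=score, reverse=True)[:k]

def finetuned_generate(prompt: str) -> str:
    """Placeholder for the fine-tuned LLM's inference call."""
    return f"<model output for: {prompt[:40]}...>"

def answer(question: str) -> str:
    # RAG supplies the facts; fine-tuning means the model already "speaks"
    # the domain jargon, so it uses the retrieved context more effectively.
    context = "\n".join(retrieve(question))
    return finetuned_generate(f"Context:\n{context}\n\nQuestion: {question}")

print(answer("What does EBITDA stand for?"))
```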
Looks really promising. I wonder if pricing similar to OpenAI's means that Gradient is also(?) bleeding money even if they build a good customer base. Or are these prices sustainable over time?
Yeah, it's even cheaper. Although it looks like it's about the same in proportion to approximate model size/expected quality? They haven't launched any >13B models yet, although they plan to.
This guy used gradient.ai, and he has a Google Colab to try it.