While pretraining a decent-sized LLM from scratch is not financially feasible for the average person, it is very much feasible for the average YC/VC-backed startup (ignoring the fact that it's almost always easier to just take something like Mixtral or LLaMA 2 and fine-tune it as necessary).
>Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k
https://www.databricks.com/blog/mpt-7b