
Could it be that transformer-based solutions come from well-funded organizations that can spend vast amounts of money on training expensive (O(n^3)) models?

Are there any papers that compare predictive power against compute needed?
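For a rough back-of-the-envelope comparison, the scaling-laws literature (e.g. Kaplan et al. 2020) approximates dense-transformer training compute as C ≈ 6·N·D FLOPs for N parameters and D training tokens. A minimal sketch, with purely illustrative model/token sizes:

    # Rough training-compute estimate using the common C ~= 6 * N * D
    # approximation for dense transformers (N parameters, D training tokens).
    # The example sizes below are illustrative, not taken from any paper.

    def train_flops(params: float, tokens: float) -> float:
        """Approximate total training FLOPs for a dense transformer."""
        return 6.0 * params * tokens

    for name, n_params, n_tokens in [
        ("125M model, 300B tokens", 125e6, 300e9),
        ("7B model, 1T tokens",     7e9,   1e12),
    ]:
        print(f"{name}: ~{train_flops(n_params, n_tokens):.2e} FLOPs")

Normalizing reported results by an estimate like this is one way to compare predictive power per unit of compute when papers don't report it directly.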



You're onto something. The BabyLM competition had caps on training data, whereas many LLMs have been trained on the order of 1 TB of data for some time now.

In many cases I can't even find out how many GPU hours the pretraining required, or what size cluster of which GPUs. If I can't afford it, then it doesn't matter what it achieved; what I can afford is what I have to choose from.
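When the paper doesn't report cluster details, you can at least sanity-check a compute estimate against your own budget. A sketch under assumed numbers (the ~300 TFLOP/s sustained throughput and the 256-GPU cluster are assumptions, not measurements):

    # Sketch: convert a training-FLOPs estimate into GPU-hours under an
    # assumed sustained throughput. 3e14 FLOP/s (~300 TFLOP/s sustained with
    # mixed precision) is an assumed figure, not a measurement.

    def gpu_hours(total_flops: float, flops_per_gpu_per_s: float = 3e14) -> float:
        return total_flops / flops_per_gpu_per_s / 3600.0

    budget_flops = 6.0 * 7e9 * 1e12   # e.g. 7B params, 1T tokens (illustrative)
    hours = gpu_hours(budget_flops)
    print(f"~{hours:,.0f} GPU-hours on one assumed-300-TFLOP/s GPU")
    print(f"~{hours/256:,.0f} hours wall-clock on a 256-GPU cluster (ideal scaling)")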



