
Creating better datasets would also help to improve the performance of the models, I would assume. Unfortunately, the costs to produce high-quality datasets of a sufficient size seem prohibitive today.

I'm hopeful this will become possible in the future, though, perhaps through some mix of 1) using existing LLMs to help humans filter the existing internet-scale datasets, and/or 2) finding new breakthroughs that make model training more data-efficient.


