
Languages, sizes, and degrees of openness.


There's some more commentary on their openness in this blog post too: https://www.interconnects.ai/p/olmo


That post also very helpfully links to another paper they published alongside the OLMo paper, focused just on the dataset:

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

https://arxiv.org/abs/2402.00159



