It's an old problem, and it, along with many of the answers are in many recent d... | Hacker News

lxe 18 days ago | on: Kimi K2 Thinking, a SOTA open-source trillion-para...

It's an old problem, and it, along with many of the answers, is in many recent data sets.

riku_iki 17 days ago

I assume training set components also have priorities: low-priority data is trained on only a few times at the beginning of pretraining, while higher-priority data is trained on multiple times until the end.
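To make that guess concrete, here is a minimal sketch of a priority-based sampling schedule. It is purely illustrative and not taken from any lab's actual pipeline: the `Source` class, the `"low"`/`"high"` labels, the `sampling_weights` function, and the `low_cutoff` fraction are all hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    priority: str  # hypothetical label: "low" or "high"


def sampling_weights(step, total_steps, sources, low_cutoff=0.2):
    """Return normalized per-source sampling weights at a training step.

    Assumption: low-priority sources are sampled only during the first
    `low_cutoff` fraction of training; high-priority sources are sampled
    throughout, so they end up being seen more often.
    """
    progress = step / total_steps
    raw = {}
    for s in sources:
        if s.priority == "low" and progress >= low_cutoff:
            raw[s.name] = 0.0  # dropped after the early phase
        else:
            raw[s.name] = 1.0
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no source is active at this step")
    return {name: w / total for name, w in raw.items()}


sources = [Source("web_scrape", "low"), Source("curated", "high")]
early = sampling_weights(0, 100, sources)
late = sampling_weights(50, 100, sources)
```

Under these assumptions, both sources are drawn equally at step 0, and by step 50 only the high-priority source is still sampled. A real schedule would more likely decay the weights smoothly than cut them off.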
