
> Pre-training is crucial for this mechanism: models trained from scratch to add numbers only exploit low-frequency features, leading to lower accuracy.

What's the convention on the meaning of "pre-training" vs. "training from scratch"?

Is this a nomenclature shift?



A pre-trained model would mean first training a language model to predict text, then starting from those weights and training it to add numbers.

Training from scratch would mean initializing a neural network with random weights and training it to add numbers directly.
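To make the distinction concrete, here is a minimal sketch assuming the Hugging Face transformers library; the "gpt2" checkpoint and the addition-task framing are illustrative placeholders, not details from the paper:

    # Minimal sketch of the two setups (assumes the Hugging Face
    # `transformers` library; "gpt2" is an illustrative checkpoint).
    from transformers import AutoConfig, AutoModelForCausalLM

    # Pre-trained: start from weights already learned on next-token
    # text prediction, then continue training (fine-tune) on addition.
    pretrained = AutoModelForCausalLM.from_pretrained("gpt2")
    # ... fine-tune `pretrained` on strings like "12+35=47" ...

    # From scratch: identical architecture, but randomly initialized
    # weights, trained on the addition task directly.
    config = AutoConfig.from_pretrained("gpt2")
    scratch = AutoModelForCausalLM.from_config(config)
    # ... train `scratch` on the same addition examples ...

On the quoted claim, only the first setup inherits features learned during text pre-training; the second must learn everything from the addition data alone.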



