
> Pre-training is crucial for this mechanism: models trained from scratch to add numbers only exploit low-frequency features, leading to lower accuracy.

What's the convention on the meaning of "pre-training" vs. "training from scratch"?

Is this a nomenclature shift?



A pre-trained model would mean first training a language model to predict text, then starting from those weights and training it to add numbers.

Training from scratch would mean initializing a neural network with random weights and training it to add numbers directly.
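To make the distinction concrete, here is a minimal sketch assuming the Hugging Face transformers library; the "gpt2" checkpoint and the addition-task framing are illustrative placeholders, not details from the paper:

    # Minimal sketch of the two setups (assumes the Hugging Face
    # `transformers` library; "gpt2" is an illustrative checkpoint).
    from transformers import AutoConfig, AutoModelForCausalLM

    # Pre-trained: start from weights already learned on next-token
    # text prediction, then continue training (fine-tune) on addition.
    pretrained = AutoModelForCausalLM.from_pretrained("gpt2")
    # ... fine-tune `pretrained` on strings like "12+35=47" ...

    # From scratch: identical architecture, but randomly initialized
    # weights, trained on the addition task directly.
    config = AutoConfig.from_pretrained("gpt2")
    scratch = AutoModelForCausalLM.from_config(config)
    # ... train `scratch` on the same addition examples ...

On the quoted claim, only the first setup inherits features learned during text pre-training; the second must learn everything from the addition data alone.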



