Sometimes training from scratch can match the results of pre-training, given roughly 5X more time to converge. Other times, though, it never catches up, converging to a worse final result than the pre-trained model.
This isn't too surprising -- the whole point of the method is to be able to learn from experience.