
Fine-tuning by pretraining over an RL-tuned model is dumb AF. RL task tuning works quite well.


You may have no choice in how the model you are fine-tuning was trained, and you may have no interest in the verticals it was RL-tuned for.

In any case, platforms like tinker.ai support both SFT and RL.
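For illustration, here's roughly what the SFT side looks like, as a sketch with plain Hugging Face transformers rather than any particular platform's SDK; the model name and the single toy trace are placeholders:

    # Minimal SFT sketch: continue training a causal LM on a chat-style trace.
    # Assumes transformers + torch are installed; the model name is a placeholder
    # small instruct model, not an endorsement or any platform's actual API.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.train()

    # One toy trace; real SFT needs many traces close to the target behavior.
    trace = [
        {"role": "user", "content": "Summarize our refund policy."},
        {"role": "assistant", "content": "Refunds are issued within 30 days of purchase."},
    ]
    text = tok.apply_chat_template(trace, tokenize=False)
    batch = tok(text, return_tensors="pt")

    optim = torch.optim.AdamW(model.parameters(), lr=1e-5)
    out = model(**batch, labels=batch["input_ids"])  # standard causal-LM loss
    out.loss.backward()
    optim.step()
    optim.zero_grad()

A real run would mask the prompt tokens out of the loss and use far more than one trace; this only shows the mechanics.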


Why would you choose a model whose trained-in priors don't match your use case? Also, keep in mind that RL'd-in behavior includes things like reasoning and how to answer questions correctly, so you're literally taking smart models and making them dumber by doing SFT. To top it off, SFT only produces really good results when you have traces that closely model the actual behavior you're trying to get the model to display. If you're just trying to fine-tune in a knowledge base, a well-tuned RAG setup + better prompts wins every time.
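For what I mean by RAG + prompting instead of touching the weights, a rough sketch (TF-IDF stands in for a real embedding model, and the docs and question are made up):

    # Minimal RAG sketch: retrieve the most relevant doc, then put it in the prompt.
    # TF-IDF is a stand-in for a proper embedding model; docs and query are invented.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "Refunds are issued within 30 days of purchase.",
        "Support is available Monday through Friday, 9am-5pm.",
        "Enterprise plans include a dedicated account manager.",
    ]
    question = "How long do customers have to request a refund?"

    vec = TfidfVectorizer().fit(docs)
    scores = cosine_similarity(vec.transform([question]), vec.transform(docs))[0]
    best = docs[scores.argmax()]

    # The retrieved passage goes into the prompt; no weights are touched.
    prompt = (
        "Answer using only the context below.\n"
        f"Context: {best}\n"
        f"Question: {question}"
    )
    print(prompt)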


Because you need a solution to your problem, the available tools are what they are, and you don't have the resources to train your own model.



