While the fine-tuning pipeline is fairly straightforward for building custom models, the RLHF pipeline doesn't seem as straightforward. Creating a dataset for RLHF looks like a fairly labour-intensive exercise, especially if your model is tuned for work like code generation?
What about the Replit Ghostwriter? Did it have a RLHF phase?