Hacker News

While the fine-tuning pipeline for building custom models is fairly straightforward, the RLHF pipeline doesn't look nearly as simple. Creating a dataset for RLHF seems like a fairly labour-intensive exercise, especially if your model is tuned for work like code generation?

What about Replit Ghostwriter? Did it have an RLHF phase?


