
> DPO is pretty much strictly better than RLHF + PPO

Out of genuine curiosity, do you have any pointers or evidence to support this? I know that some of the industry-leading research labs haven't switched over to DPO yet, even though DPO is significantly faster than RLHF. It might just be organizational inertia, but I do not know. I would be very happy if simpler alternatives like DPO were as good as RLHF or better, but I haven't seen that proof yet.



I can second that. From what I’ve heard from people at leading labs, it’s not clear that DPO is worth switching to from RLHF.



