
Indeed, fine-tuning with either synthetic data (as you are proposing) or human review works like that. You can read more here: https://huggingface.co/blog/rlhf

