
They do two things: RLHF to align the model itself with human preferences, and a separate small external model, text-moderation-001, that classifies outputs into a few problematic categories and triggers a warning message on the screen.
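A minimal sketch of how that second, external check could be wired up using OpenAI's public Moderations endpoint (which, per the comment, is backed by a small classifier like text-moderation-001). The model name, input text, and warning behavior here are illustrative assumptions, not the exact production pipeline:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def check_and_warn(text: str) -> bool:
        """Run the external moderation classifier on a model output.

        Returns True if the text was flagged (i.e. a warning should be shown).
        """
        # The default moderation model is used here; the specific small
        # classifier (e.g. text-moderation-001) sits behind this endpoint.
        resp = client.moderations.create(input=text)
        result = resp.results[0]

        if result.flagged:
            # List which problematic categories fired.
            flagged = [name for name, hit in result.categories if hit]
            print(f"Warning: content flagged for: {', '.join(flagged)}")
        return result.flagged

    check_and_warn("some model-generated output")

Note the two mechanisms are independent: RLHF changes what the model generates, while this classifier only screens outputs after the fact and surfaces a UI warning.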

