
They do two things: RLHF to align the model itself with human preferences, and a separate small external model, text-moderation-001, that classifies outputs into a few problematic categories and triggers a warning message on the screen.
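A minimal sketch of how that second, external check could be wired up using OpenAI's public Moderations endpoint (which, per the comment, is backed by a small classifier like text-moderation-001). The model name, input text, and warning behavior here are illustrative assumptions, not the exact production pipeline:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def check_and_warn(text: str) -> bool:
        """Run the external moderation classifier on a model output.

        Returns True if the text was flagged (i.e. a warning should be shown).
        """
        # The default moderation model is used here; the specific small
        # classifier (e.g. text-moderation-001) sits behind this endpoint.
        resp = client.moderations.create(input=text)
        result = resp.results[0]

        if result.flagged:
            # List which problematic categories fired.
            flagged = [name for name, hit in result.categories if hit]
            print(f"Warning: content flagged for: {', '.join(flagged)}")
        return result.flagged

    check_and_warn("some model-generated output")

Note the two mechanisms are independent: RLHF changes what the model generates, while this classifier only screens outputs after the fact and surfaces a UI warning.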

