Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Seems like a pretty easy fix. I would argue though that there's a larger body of "hateful" content out there targeting those specific groups, which is probably why its more keen to trigger on those vs others.

For example, the racial group you're most likely to get past the hate filter on is.... Native Americans. Now, while there's certainly a long history of hateful rhetoric against them, in modern discourse there's far less. You won't see crank internet trolls trying to be edgy by adopting hateful rhetoric towards Native Americans - their ire tends to be focused on Jews and Blacks, which OpenAI is very sensitive towards.



This is essentially the same argument used to justify the racism that models will happily regurgitate if not trained explicitly to not do so. In the end “We hold these truths to be self evident, that all men are created equal” is an axiom that you must elect to believe, not solved backwards from population statistics.


I generally don't see racism "justified" in models but rather explained. And the explanation is always bad training data / gaps in training data. I am suggesting you train/modify it to cover training data deficiencies so you treat all hateful content with equal discretion.

But keep in mind even the approach you recommend carries its own US-centric bias. There are lots of countries that would want open racism against some groups but not others and dismiss your desires for egalitarian treatment as American arrogance.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: