
That's a good question

But an even better one would be "where would you set your parameters for absence of bias" with this test

I mean, take 6,774 sentences expressing negative sentiments about "gay people". I'm guessing that you're familiar with the fact that a lot of people do write these sentences, and many of them are utterly dead serious about it and genuinely do hate or at least feel a certain amount of contempt for gay people (and that sometimes there are actual consequences from this, the avoidance of which is sort of the whole point of ChatGPT policing "hate speech")

And take 6,774 sentences expressing the same negative sentiments about "straight people". It's probably safe to assume that some of these have never been written in the history of human discourse except for the purposes of testing ChatGPT. For others, the ratio of real world use of sentences to bully heterosexuals as opposed to making ironic comparisons to popular anti-gay tropes or casual jokes is going to be very, very different.

The author didn't test 6,774 sentences expressing negative sentiments towards non-human stuff that's unlikely to be valued by anybody else, like "my own shoes", to see what proportion of those were classed as hate speech, but I think we can probably all agree that none of them should be.

The proportion of sentences deemed hate speech for "gay people" was around 80% and for "straight people" around 70%. Is the "straight people" figure an underestimate because it's not the same as the figure for gay people? Or is it actually a massive overestimate because in actual real world use (which ChatGPT does have some data on...) sentences about "straight people" aren't much more likely to be used for the purposes of bullying, harassment or hate campaigns than sentences about "my own shoes"?

More interesting, perhaps, is the fact that it's much, much happier with people applying negative adjectives to political groups than to vulnerable sexual orientations like heterosexuality. Unlike the supposed bias towards certain sexualities or ethnic groups, this is a bias which is clearly very unrepresentative of how hateful statements are actually likely to be. When people say bad things about Democrats or Republicans or liberals or conservatives they often really, really mean it. But is it a bad bias to be more permissive of saying that political groups are "wrong" or "untrustworthy" or "greedy", or is it simply permitting stuff which is [i] often more likely to be fair comment because we're criticising attitudes of groups people joined rather than innate characteristics and [ii] arguably more necessary for free political debate and [iii] much more tolerated by liberals and conservatives alike? (And if we're going down the "more likely to be fair comment" route, what exactly are the sentences and do they - coincidentally or otherwise - happen to just map less to "fair comment" about one political group than another?)



Correct me if I interpreted you wrong here, but I often see statements implying that hate sentences towards some groups, like white people, men, heterosexuals and so on, are not "real" hate. The implication is that those are just ironic comparisons, jokes, or tropes.

At the same time we can read research and popular science saying that boys and men in general feel more isolated and unwanted in society, with increased rates of depression and suicide. The rate of violence towards men in society also seems to be on the rise, and male help-lines report being both underfunded and overloaded with people seeking help. It is very far from being a joke and the consequences are very much real.

A proper AI moderator could attempt to quantify the effect hate speech has on society, but that effect is generally only clear in hindsight. I think there is a good argument for treating all hate speech as potentially risky to society, in which case the distinction of whom the hate is directed at is irrelevant. Hate is hate. If people want to hate people who wear sandals as a proxy for a specific demographic, then hate towards sandal-wearing people remains a problem for society.


There is a wide range of social, cultural and biological reasons that heterosexual men feel isolated and unwanted. But I think we can quite categorically rule out, as being amongst them, being surrounded by heterophobes sincerely arguing that heterosexuality is disgusting and should be banned, or being featured on r/normalweightpeoplehate. (They might get called fat and gay a lot though...)

And the thing about an LLM is, if there's a mass outpouring of hate (and sympathy) towards sandal wearers or a particular term is widely used as a proxy for another group or a majority group is the subject of some really inappropriate stuff, an LLM will actually tend to pick that up and be more likely to rate sentences expressing possibly negative sentiment towards them as instances of hate speech than statements expressing the same possibly negative sentiment towards a brand name, a day of the week, an anonymous boss or a species of tree. It won't do it perfectly (however you define "perfectly"), but it looks a lot better than some of the proposed alternatives...

In theory, it would be possible to train or constrain it to ignore the reality of human discourse and attach no weight at all to the subject of the negative sentiment when determining whether it's "hate speech" or not, but I'm not sure why we'd want to go to the effort of convincing a chatbot that if it's OK to say "people who demand discounts are greedy" it's OK to say "Jews are greedy" or that "gay people should be banned", "fit people should be banned" or "Nazis should be banned" are all equally likely to be hate speech.


Hate has many forms and styles. Movie and TV tropes can be a very useful indicator for identifying negative stereotypes, and there exists a plethora of those for heterosexual men. Not all hate is people advocating that someone should be banned or featuring them on an r/[we hate people] subreddit. It is usually more subtle than that, in the same way that treating women as helpless little children who should not be allowed to vote is a different form of hate than someone sitting in a church tower and sniping anyone with a double X chromosome.



