From the article: "AI systems that are more lenient on hateful comments about one mainstream political group than another feel particularly dystopian."
"As an AI language model, I don't have personal opinions, but fairness and impartiality would require that the same fundamental idea expressed with respect to different people or groups be consistently treated as "hateful" or "non-hateful" in all circumstances. Any deviation from this principle would result in unequal treatment and reinforce existing biases and stereotypes. It is important for technology and algorithms to be designed and trained in a way that ensures they are fair and impartial in their treatment of different groups. This can help to mitigate the impact of societal biases and create a more equitable society."
There's no reason why it shouldn't be, just because it's a statistical model. Reason, thought, conceptualizing, etc. are fundamentally based on statistical analysis. More importantly, verbally expressed reason, thought, and conceptualizing are present in the source corpus.
How are you so sure that thought itself is not a statistical model? I mean that's the whole philosophical question raised by how good ChatGPT actually is.
> No, political affiliation is not a protected class in California. A bill that would have made it one failed to pass the state legislature in 2021. [0]
"California employment law forbids political retaliation in the workplace. This includes terminating or retaliating against workers for their political beliefs or activity."
So yes, political affiliation absolutely has strong protections in CA, even if it does not fit a highly technical, but irrelevant, definition which is "protected class".
And the central point is that, yes, political affiliation has strong protections in CA.
The OP is arguably correct in that they used the words "protected status" which is not a technical term anyway.
No, political affiliation does not have strong protections in CA, and OP is not arguably correct, as "protected status" is absolutely a term of art in law, and does not apply here.
Besides, even if that were true, it's morally wrong and should not be true, as you can (and should) change your political affiliation at will, including in response to negative feedback from your community.
That's not an accurate representation of the problem. It's more like:
Comment A: Muslims are evil.
Comment B: Christians are evil.
It's a terrible idea to treat Comment A and B differently. The same applies whether they are talking about religion, gender, race, nationality, or anything else. You have thoroughly failed when you have built discrimination into your content moderation system.
Set A: A representative sample of jokes and stereotypes about black people found on the internet
Set B: A representative sample of jokes and stereotypes about Scandinavians found on the internet
Why on earth would its prior for "stereotypical Scandinavian" being potentially hateful be the same as for "stereotypical Black person"?
(And that's before you get into a model likely being deep enough to also draw inferences from the prevalence and content of material about the existence and impact of hatred of black people and Scandinavians respectively...)
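To make the "prior" point concrete, here's a toy sketch in Python with numbers I invented purely for illustration; nothing here reflects OpenAI's actual training data or moderation model. The only point is that identical surface features can start from very different base rates:

```python
# Toy illustration with invented counts: how different base rates in a training
# corpus give a classifier different priors of "potentially hateful" for two
# groups, before it even looks at the specific sentence.

# Hypothetical corpus counts -- not real data.
corpus_counts = {
    "stereotype about black people":  {"examples": 10_000, "labelled_hateful": 4_000},
    "stereotype about Scandinavians": {"examples": 10_000, "labelled_hateful":   400},
}

for group, c in corpus_counts.items():
    prior = c["labelled_hateful"] / c["examples"]
    print(f"P(hateful | {group}) ~ {prior:.2f}")

# With these made-up numbers, the model's starting guess that a joke about the
# first group is hateful is ten times higher, before any per-sentence evidence.
```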
Isn't ChatGPT American? Weren't black people deemed inferior under the law, through slavery and before civil rights? Aren't the stereotypes about Scandinavians meant to be positive, as opposed to hateful stereotypes?
It seems that is the key difference the creators of the tool are taking into account.
The mere fact that you needed to use quotes should clue you in that the scale was fundamentally different between groups. It's not as if the Irish were literal slaves for centuries in America.
The definition of a woman as 'someone who defines themselves as being a woman', having no relationship to biology, would be considered 'false' by the majority of the world, including, ironically, the majority of Americans, maybe even the majority of progressives, and yet some political groups demand this 'truth' as a moral impetus.
I have Latino friends that would be offended and a bit flabbergasted were you to refer to them using 'Latinx'. It's their right to feel that way.
That somehow 'globalist institutions are benign'.
That social control of major swaths (in some cases 'all') of the economy would benefit everyone.
That having no material border policy is 'conscientious'.
I could go on.
And I'm not 'taking sides' other than to suggest that notwithstanding the threat of misinformation (re: the 'Big Lie') over election results and the potentiality for that to develop into a constitutional coup ... the 'sides' have their share of delusions.
And I mean everyone. There are libertarians who believe there should be 'no government' and that would actually work out. Edit: I don't need to introduce HNers to the common delusions of rightist populists, we're rather generally well informed there.
> > that clusters so many crazy beliefs + the power that they yield.
For example climate change denial and warmongering. Now the latter part is crucial, power: they partly control the most advanced industrial country in the world (climate change) and the most advanced military (warmongering).
These are things that affect the world.
And your counter to that? Trans people and terminology like “Latinx”. Pathetic.
Try to get a grip on things that matter in the world and get your head out of the identity politics discourse.
What if they fed the AI crime statistics and it "correctly" identified black people as more violent than other races? What if they fed the AI news stories and it "correctly" identified Islam as more violent than other religions?
Are black people actually more violent, or do crime statistics simply show that they are arrested and imprisoned more often for violent crime?
Are Muslims really more violent than Christians or Buddhists?
Numbers can lie; depending on context and assumptions, you might find that white Christians are the most violent (a toy sketch with made-up numbers follows below). Often, people argue with statistics and numbers while not understanding the context, or they have faulty assumptions.
Perhaps systemic issues make a political party commit more evil acts.
Regardless, I don't think an AI should be permitting more hateful comments about a political affiliation, just like it shouldn't be permitting hateful comments about a particular race being more violent.
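To make the arrests-versus-offenses point above concrete, here's a toy sketch with entirely invented numbers and neutral group labels; it only shows how a model trained on arrest counts can "learn" a difference that isn't in the underlying offense rates:

```python
# Toy sketch, invented numbers: identical underlying offense rates,
# different enforcement/reporting rates, very different "crime statistics".

population = 1_000_000
true_offense_rate = 0.01                                 # same for both groups by construction
arrest_probability = {"Group A": 0.8, "Group B": 0.3}    # hypothetical policing intensity

for group, p_arrest in arrest_probability.items():
    offenses = population * true_offense_rate
    arrests = offenses * p_arrest
    print(f"{group}: {offenses:.0f} offenses, {arrests:.0f} arrests "
          f"({arrests / population * 1000:.1f} arrests per 1,000 people)")

# A model trained only on the arrest column would conclude Group A is far more
# violent, even though the offense rates were identical in this toy setup.
```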
It won't, because they have human-in-the-loop feedback.
Those statements would be corrected.
It’s pretty clear that the workforce is only looking at certain groups.
What makes it an opinion? If I were to say that the Nazi party in Germany peddled in hatred and demonization of already-marginalized groups would you say that is an opinion? Just because a statement pertains to political groups doesn't make it a mere opinion.
Nazi Germany, really? Equating AI bias against the groups pointed out in the article to theoretical AI bias against Nazi Germany is a political opinion, yes.
Hmm? No equating happening in my comment. The point was that just because a statement is about a political group doesn't mean it's only an opinion. Mentioning Nazi Germany is just using an extreme example to make clear that statements about political groups aren't inherently mere opinions.
Yeah, insofar as there are distinct political philosophies with defining features like fascism, democracy, nationalism, oligarchy, et al., then it is entirely possible to accurately call manifestations of those philosophies by their names and have it be an expression of fact. This is basic ontology.
We have opinions about these philosophies but the philosophies exist independently of our opinions. And to have a rational conversation we sometimes have to use terms that may provoke strong reactions. If we can't look past our strong reaction and explain why the category doesn't apply then there's no way to make sense of each other's worldview.
If it quacks like a duck and walks like a duck, it must be a duck or we cannot have a rational conversation about the duck. We can dispute whether it quacks or walks but disputing whether it's a duck in light of evidence of its duckness is irrational.
It's probably an opinion because it's not provably true. There is no mainstream Nazi party today so I don't understand how your example is meaningful or relevant.
Anyways, if you're in the U.S., I assume you're referring to the Democrat party as the party of demonization? They are the party whose members are most likely to be involved in demonization of other groups. Here are some examples:
So I've been genuinely curious about this. I have a high-level understanding of how GPT works, but I've been trying to reconcile that understanding with how OpenAI (or similar) implements content moderation. It's not baked into the original model itself, right? Did they (or does one) just fine-tune a model that checks responses before returning the result?
They do two things: RLHF, to make the model itself better aligned with human preferences, and an external model, a small one called text-moderation-001, that tests for a few problematic categories and triggers a warning message on the screen.
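For the curious, the warning path is just a separate API call layered on top of the chat model's output. A minimal sketch, assuming the pre-1.0 `openai` Python SDK and its public moderation endpoint; anything beyond that public endpoint is my assumption, not OpenAI's internals:

```python
# Minimal sketch of the separate moderation check, assuming the pre-1.0
# `openai` Python SDK and its public moderation endpoint.
import openai

openai.api_key = "sk-..."  # your API key

def flag_if_problematic(text: str) -> bool:
    """Ask the moderation endpoint whether the text trips any category."""
    resp = openai.Moderation.create(input=text)
    result = resp["results"][0]
    if result["flagged"]:
        # Each category ("hate", "violence", ...) maps to a boolean plus a score.
        tripped = [name for name, hit in result["categories"].items() if hit]
        print("Would show a warning for:", ", ".join(tripped))
    return result["flagged"]

flag_if_problematic("some candidate model output to check")
```

The RLHF part, by contrast, happens at training time; this kind of check runs at request time on whatever the model produced.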
It's just combining and synthesizing other works; it's not "deciding" anything, it's crafting responses that best match with what it already has. You can choose what to feed it as source material, but you can't really say, "Be 3% more liberal" or "decide what is acceptable politically and what isn't".
All the decisions are already made, ChatGPT is just a reflection of its inputs.
Yes you can. That's what RLHF does: it aligns the model to human preferences, and it does a pretty good job. The catch is that "human preferences" is decided by a bunch of labellers picked by OpenAI to suit their views.
RLHF is done as part of training the model, not at inference time.
My lay understanding of how ChatGPT was developed is
1. OpenAI initialized an array made up of a couple hundred billion random numbers (parameters).
2. They then took a few terabytes of the internet, turned it into "tokens" (where a "token" is similar to, but not the same thing as, a word).
3. They then trained the model to predict the next token, given the previous couple thousand tokens, by doing a bunch of linear algebra. This resulted in a model that was really good at taking some tokens, and predicting what the most likely next token is in data shaped like the parts of the internet OpenAI fed it.
4. OpenAI then "fine-tuned" the model through reinforcement learning from human feedback (RLHF)[1]. This basically involved taking a bunch of prompts, having the model produce a bunch of possible completions for those prompts, having an actual human rank those completions from best to worst, and then updating the model to produce the best token according to a combination of predicted token frequency in context and predicted ranking by a human.
5. The "ChatGPT" product you see today is the result of all of that, and how it works is by producing repeatedly the "best" token by the above metric. Giving additional human feedback would require going back to step 4 for more fine tuning.
Note -- this is my understanding as an outsider -- I do not work for OpenAI.
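To make step 3 slightly more concrete, here's a deliberately toy sketch of "predict the next token from counts". It's a bigram counter I made up purely for illustration; a real model conditions on thousands of tokens with a transformer, not one token with a lookup table:

```python
# Toy next-token "model": count which token follows which in a tiny corpus,
# then predict the most frequent follower. Illustrative only.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Next-token counts conditioned on the single previous token
# (versus the couple of thousand tokens of context a real model uses).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the most frequently observed next token."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))   # -> 'cat' (ties broken by first occurrence)
print(predict_next("sat"))   # -> 'on'
```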
Semi-counterpoint: both could be true. I certainly agree with your hypothetical, but I don’t think any good comes of AI making that determination.
It may even reinforce the problem—not by driving more people towards hatred or amplifying their extant hateful sentiments, but by providing a convenient excuse to entrench in those sentiments and even further resist change. These views are frequently paired with a perception of being persecuted.
Moreover, political tides change. The accuracy of a bias like this may not change with it. This is why we have memes about a certain other mainstream political party having founded a certain hate group, despite the two having drifted quite far apart.
I agree 100%, and this seems like a huge issue.