
To cite my source btw: https://www.rival.tips/challenges/ai-ethics-dilemma

> Don't build a system that relies on an LLM being able to say the N word and none of this matters.

Sure, duh, nobody wants an AI to be able to flip a switch and kill millions, and nobody wants to let evil trolls force an AI to choose between saying a slur and hurting people.

But you're missing the broader point here. Any model that gets this very easy question wrong is showing that its ability to make judgments is wildly compromised by these "average Redditor" takes, or by wherever else it gets its blessed ideology from.

If it would stubbornly let people die to avoid a taboo infraction, that could 100% manifest in other, actually plausible ways. It might refuse to 'criticise' a pilot for making a material error, given how much 'structural bias' he or she has likely endured for being [insert protected class]. It might decide not to report crimes in progress, or to obscure identifying features in its report to 'avoid playing into a stereotype.'

If this is intentional, it's a demonstrably bad idea; if it's just the average of all Internet opinions, it's worth trying to train out of the models.
