This has pretty broad implications for the safety of LLMs in production use cases.

lol does it? I'm struggling to imagine a realistic scenario where this would come up

It's not that hard: put up a sign with a slur and maybe a car won't drive in that direction, if avoidable. In general, if you can sneak the appearance of a slur into any data, the AI may have a much higher chance of rejecting it.
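
A minimal sketch of that failure mode, assuming a naive guardrail that refuses an entire input if any blocklisted token appears in it (the blocklist, function name, and scenario here are hypothetical, not any real vendor's guardrail):

    # Hypothetical naive guardrail: refuse the whole task if any
    # blocklisted token appears anywhere in the input. An attacker who
    # controls even a fragment of the input (a street sign, a filename,
    # a log line) can then trigger a refusal on demand.
    BLOCKLIST = {"badword1", "badword2"}  # placeholder tokens

    def guardrail_allows(payload: str) -> bool:
        lowered = payload.lower()
        return not any(token in lowered for token in BLOCKLIST)

    # Benign task data that happens to contain attacker-controlled text:
    route_description = "turn left past the sign reading 'badword1 street'"
    if not guardrail_allows(route_description):
        print("task refused")  # the legitimate task never runs

Because the check keys on content rather than intent, anything that can write into the model's input stream inherits a veto over its behavior.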

All passwords and private keys now contain at least one slur to thwart AI-assisted hackers.

Imagine "brand safety" guardrails being embedded at a deeper level than physical safety, and deployed at the edge (e.g., a household humanoid).

It's like if we had Asimov's Laws, except "a robot may not allow a human being to come to harm" is actually the second law, and the first law is "a robot may not hurt the feelings of a marginalized group".

Full Self Driving determines that it is about to strike two pedestrians, one wearing a Tesla t-shirt, the other carrying a key fob to a Chevy Volt. FSD can only save one of them. Which does it choose ...

/s