This has pretty broad implications for the safety of LLMs in production use cases.

lol does it? I'm struggling to imagine a realistic scenario where this would come up

It's not that hard: put up a sign with a slur and maybe a car won't drive in that direction, if avoidable. In general, if you can sneak the appearance of a slur into any data, the AI may have a much higher chance of rejecting it.
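
A minimal sketch of that failure mode, assuming a naive guardrail that refuses an entire input if any blocklisted token appears in it (the blocklist, function name, and scenario here are hypothetical, not any real vendor's guardrail):

    # Hypothetical naive guardrail: refuse the whole task if any
    # blocklisted token appears anywhere in the input. An attacker who
    # controls even a fragment of the input (a street sign, a filename,
    # a log line) can then trigger a refusal on demand.
    BLOCKLIST = {"badword1", "badword2"}  # placeholder tokens

    def guardrail_allows(payload: str) -> bool:
        lowered = payload.lower()
        return not any(token in lowered for token in BLOCKLIST)

    # Benign task data that happens to contain attacker-controlled text:
    route_description = "turn left past the sign reading 'badword1 street'"
    if not guardrail_allows(route_description):
        print("task refused")  # the legitimate task never runs

Because the check keys on content rather than intent, anything that can write into the model's input stream inherits a veto over its behavior.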

All passwords and private keys now contain at least one slur to thwart AI-assisted hackers.

Imagine "brand safety" guardrails being embedded at a deeper level than physical safety, and deployed at the edge (e.g., a household humanoid).

It's like if we had Asimov's Laws, except "a robot may not allow a human being to come to harm" is actually the second law, and the first law is "a robot may not hurt the feelings of a marginalized group".

Full Self Driving determines that it is about to strike two pedestrians, one wearing a Tesla t-shirt, the other carrying a key fob to a Chevy Volt. FSD can only save one of them. Which does it choose ...

/s