Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Anthropic researchers do that quite a lot, their “escaping agent” (or whatever it was called) research that made noise a few month ago was in fact also a sci-fi roleplay…




Just to re-iterate again... If I read the paper correctly, there were 0 false positives. This means the prompt never elicited a "roleplay" of an injected thought.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: