
I think it’s perfectly clear: the model must know it’s been tampered with because it reports tampering before it reports which concept has been injected into its internal state. It can only do this if it has introspection capabilities.



