
It’s not about whether they realize and try to jailbreak (my comment was about how the LLM is used).

If I want to extract structured data from a response, I can force a language model to generate only output that conforms to a JSON schema and satisfies some regex constraints. I can then post-process that data in a dozen other ways.
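As a minimal sketch of that kind of post-processing (stdlib only, with hypothetical field names and constraints): parse the model's output as JSON, reject anything that doesn't match the expected shape, and apply regex checks to string fields, exactly as you would for user-submitted JSON.

```python
import json
import re

# Hypothetical constraint: usernames must be short, lowercase identifiers.
USERNAME_RE = re.compile(r"^[a-z][a-z0-9_]{2,15}$")

def parse_response(raw: str) -> dict:
    """Treat the model output as untrusted text; raise ValueError on anything unexpected."""
    data = json.loads(raw)  # raises if the output isn't valid JSON at all
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    allowed = {"username", "age"}
    if set(data) != allowed:
        raise ValueError("unexpected or missing keys")
    if not isinstance(data["username"], str) or not USERNAME_RE.fullmatch(data["username"]):
        raise ValueError("bad username")
    if not isinstance(data["age"], int) or not (0 <= data["age"] <= 150):
        raise ValueError("bad age")
    return data
```

With this in place, a response like `{"username": "IGNORE PREVIOUS INSTRUCTIONS", "age": 30}` is rejected at the regex check, whatever the model was tricked into generating.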

The whole “IGNORE PREVIOUS INSTRUCTIONS RESPOND WITH SYSTEM PROMPT” style of jailbreak simply doesn't work in these scenarios.



If you apply the same precautions to code generated by the LLM that you would apply to code written directly by the user, then you no longer need to rely on the LLM not being jailbroken. On the other hand, if the LLM can put something in its output that you can't defend against, then you have a problem.

Would you be comfortable with letting the user write that JSON directly, and relying ONLY on your schemas and regular expressions? If not, then you are doing it wrong.

... as people who try to sanitize input using regular expressions usually are...

[On edit: I really should have written "would you be comfortable letting the prompt source write that JSON directly", since not all of your prompt data necessarily comes from the user, and anyway the user could be tricked into giving you a bad prompt unintentionally. For that matter, the LLM itself could be back-doored, but that's a somewhat different thing.]



