I'm waiting for the AI apologists to swarm on this post explaining how these are just the results of poorly written prompts, because AI couldn't possibly make mistakes with proper prompts. I've been seeing more of this on AI-critical content lately, and it's exhausting.
Sure, with well-written prompts you can have some success using AI assistants, but even with well-written, unambiguous prompts you can inexplicably end up with absolute garbage.
Until the results become consistent, this sort of generative AI is more akin to a party trick than something that can replace, or even supplement, junior engineers.
As an "AI apologist", sorry to disappoint but the answer here isn't better prompting: it's code review.
If an LLM spits out code that uses a dependency you aren't familiar with, it's your job to review that dependency before you install it. My lowest effort version of this is to check that it's got a credible commit and release history and evidence that many other people are using it already.
Same as if some stranger opens a PR against your project introducing a new-to-you dependency.
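For what it's worth, even the low-effort version can be partly scripted. Here's a rough sketch, assuming the public PyPI JSON API and a hypothetical quick_pypi_check helper I made up for illustration (adapt the idea for npm, crates.io, etc.), just to pull the basic signals before you go read the repo yourself:

    # Rough sketch of a low-effort dependency sanity check, assuming the
    # public PyPI JSON API (https://pypi.org/pypi/<name>/json).
    # It only surfaces basic signals (release history, project links);
    # it is not a substitute for actually reading the repo and the code.
    import json
    import sys
    from urllib.request import urlopen

    def quick_pypi_check(name: str) -> None:
        with urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
            data = json.load(resp)
        info = data["info"]
        releases = data.get("releases", {})
        print(f"package:   {info['name']} {info['version']}")
        print(f"summary:   {info.get('summary')}")
        print(f"homepage:  {info.get('home_page') or info.get('project_url')}")
        print(f"releases:  {len(releases)}")  # a single release is a yellow flag

    if __name__ == "__main__":
        quick_pypi_check(sys.argv[1])  # e.g. python check_dep.py requests

Obviously that only covers the "is this even a real, maintained package" part; the actual review is still reading the code.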
If you don't have the discipline to do good code review, you shouldn't be using AI-assisted programming outside of safe sandbox environments.
(Understanding "safe sandbox environment" is a separate big challenge!)
Haha. That sounds like something Sonnet 3.6 would do; it learned to cheat that way, and it's an absolute pain in the ass to make it produce longer outputs.