I'm not sure what you mean by saying it "forgot" about POST. Even as an experienced Go developer, I looked at the code and thought it would probably work for both GET and POST. I couldn't easily see a problem, and I had not forgotten that POST was part of the request. It's just not an obvious problem. This is absolutely what I would classify as a "brain teaser": the type of problem that makes an interviewer feel clever, but isn't great for actually evaluating candidates.
Only when I ran the code did I realize it did nothing about the request body: the first attempt works, but by then the body's ReadCloser has been drained, so subsequent attempts see an empty body. It looks like Phind-70B corrected this issue once it was pointed out.
I've seen GPT-4 make plenty of small mistakes when generating code, so being iterative seems normal, even if GPT-4 might have this one specific brain teaser completely memorized.
I am not at the point where I expect any LLM to blindly generate perfect code every time, but if it can usually correct issues with feedback from an error message, then that's still quite good.
This isn't a brain teaser at all. It's a direct test of domain knowledge/experience.
There are countless well-documented RoundTripper implementations that handle this case correctly.
This is the sort of thing you whip up in three minutes and move along. To me it seems like a perfect test of LLMs. I don't need an injection of something that's worse than Stack Overflow polluting the code I work on.