Extensible coding agent written in TypeScript. It’s exactly what you (I’m projecting) want out of Claude Code if you’re okay investing time into building your harness or prompting an agent to build it.
Literally everyone is desperately trying to figure out why it's so bad and how to make it work consistently using harnesses, etc. But despite this massive effort, things always go awry after a while. Maybe in a year or two someone will figure it out.
Frankly, I created a dozen such projects in the last few weeks. Recently I deleted them all. I feel like there's no point. I cancelled my Claude subscription, too.
I got back to learning from books, and occasionally use LLMs for "review my code in depth and show me its weak points".
I can't comment about the quality of the code you delivered for your client so I checked your side project. Unfortunately it looks like there is only a landing page (very nice!) but the way from a vibe-coded project to production is usually quite long.
Not wrong at all, that’s why I’m building my own platform for this. That’s also why I haven’t publicly done much on First Cut yet. I’m using my platform to actually build the product, so the intent is that I use my expertise and oversight to ensure it’s not just slop code. So most of the effort has gone into building that platform, which has made building First Cut itself slower. But I’ve actually got my platform running well-enough that now my team is able to get involved, and I can start to work on First Cut again, which means that I should be able to answer your “concern” definitively. I share it.
> A smarter model would be great but there are bigger productivity gains to be had with a good set up, a faster model, and abstracting away the need to think about agents or context usage. I’m still figuring out a good set up. Something with the speed of Haiku with the reasoning of Opus without the overhead of having to think about the management of agents or context would be sweet.
I was thinking about this recently. This kind of setup is the Holy Grail everyone is searching for: make the damn tool produce the right output more of the time. And yet, despite testing the methods provided by people who claim they get excellent results, I still reach the point where it goes off the rails. Nevertheless, since practically everybody is working on this particular issue, and huge amounts of money have been poured into getting it right, I hope that in the next year or so we will finally have something we can reliably use.
> If you aren't doing this level of work by now, you will be automated soon.
It's harder and harder to detect sarcasm these days but in case you're being serious, I've tested a similar setup and I noticed Claude produces perfectly plausible code that has very subtle bugs that get harder and harder to notice. In the end, the initial speedup was gone and I decided to rewrite everything by hand. I'm working on a product where we need to understand the code base very well.
When you write the code yourself you are slowly building up a mental model of how said thing should work. If you end up introducing a subtle bug during that process, at least you already have a good understanding of the code, so it shouldn't be much of an issue to work backwards to find out what assumptions turned out to be incorrect.
But now with Claude, the mental model of how your code works is not in your head, but resides behind a chain of reasoning from Claude Code that you are not privy to. When something breaks, you either have to spend much longer trying to piece together what your agent has made, or continue throwing Claude at it and hope it doesn't spiral into more subtle bugs.
Everybody produces bugs, but Claude is good at producing code that looks like it solves the problem but doesn't. Developers worth working with grow out of this in a new project. Claude doesn't.
An example I have of this is when I asked Claude to copy some functionality from a front-end application to a back-end application. It got all of the function signatures right but then hallucinated the contents of the functions. Part of this functionality included a lookup map for some values. The new version had entirely hallucinated keys and values, but the values sounded correct if you didn't compare them with the original. A human would have literally copied the original lookup map.
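To make the failure mode concrete, here is a hypothetical sketch of that kind of lookup map (all names are invented, not from the original project). The safe "port" to the back end is a byte-for-byte copy of the object; a regenerated-from-memory version is where plausible-but-wrong entries creep in, and a simple structural comparison catches it.

```typescript
// Hypothetical front-end lookup map (names invented for illustration).
// Porting it safely means copying this object verbatim, not re-deriving it.
const STATUS_LABELS: Record<string, string> = {
  draft: "Draft",
  in_review: "In review",
  approved: "Approved",
  archived: "Archived",
};

// Checks that two maps agree on every key and value exactly.
// A hallucinated copy may "sound right" while failing this check.
function sameMap(
  a: Record<string, string>,
  b: Record<string, string>,
): boolean {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length && keysA.every((k) => a[k] === b[k]);
}
```

A check like this (or a snapshot test against the original map) is cheap insurance when asking an agent to duplicate data structures across codebases.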
I asked Claude to help me figure out some statistical calculation in Apple Numbers. It helpfully provided the results of the calculation. I ignored it and implemented it in the spreadsheet myself, and got completely different (correct) results. Claude did help me figure out how to do it correctly, though!
> Developers worth working with grow out of this in a new project. Claude doesn't.
There is no way this is true. People make fewer bugs with time and guidance, but no human makes zero bugs. Also, bugs are not planned; it's always easy to say in hindsight "A human would have literally copied the original lookup map," but every bug stems from some mistake that deviates from what was expected. That's why it's a bug.
No, it's broadly true. Also, that's why we have code review and tests, so that it has to pass a couple of filters.
LLMs don't make mistakes like humans make mistakes.
If you're a SWE at my company, I can assume you have a baseline of skill and you tested the code yourself, so I'm trying to look for any edge cases or gaps or whatever that you might have missed. Do you have good enough tests to make both of us feel confident the code does what it appears to do?
With LLMs, I have to treat its code like it's a hostile adversary trying to sneak in subtle backdoors. I can't trust anything to be done honestly.
Sorry, perhaps I should have been clearer. They don't grow completely out of making bugs (although they do tend to make fewer over time), they grow out of making solutions that look right but don't actually solve the problem. This is because they understand the problem space better over time.
Claude will happily generate tons of useless code and you will be charged appropriately. The output of LLMs has nothing to do with payment rates; otherwise you end up with absurdities like valuating useless CCC that was very expensive to build using LOCs as a metric, whereas in reality it is a toy product nobody in their right mind would ever use.
My metrics are really simple - I don’t do staff augmentation. I get a contract (SOW) with a known set of requirements and acceptance criteria.
The only metrics that matter: is it done on time, on budget, and does it meet the requirements?
But if Claude Code is generating “useless code” for you, you’re doing it wrong.
And I assure you that my implementations from six years of working with consulting departments/companies (including almost four as blue badge, RSU earning consultant at AWS ProServe) have never gone unused.
Using a mix of models - GLM5, MinMax 2.5, and Claude Sonnet/Opus - since they find different issues.
Spending a fair bit of time spec'ing things out and running all three models over the spec to suggest improvements and flaws, iterating until all three are happy. Same at the end: look at the code and suggest stability improvements. The actual code-writing is GLM5 - once things are properly spec'd out, it can generally just hammer away until it's done.
And doing a lot of microservice-style architecture. Think chains of containers talking to each other over APIs.
Sorry, what is pi and how are you using it with ChatGPT for agentic coding?