Yes, but you need to set up quite a bit of tooling to provide feedback loops.
It's one thing to get an LLM to do something unattended for long durations; it's another to give it the means of verification.
For example, I'm busy upgrading a 500k LoC Rails 1 codebase to Rails 8 and built several DSLs that give it properly authorised sessions in a headless browser, with basic HTML parsing tooling, so it can "see" what effect its fixes have. Then you somehow need to also give it a reliable way to keep track of the past and its own learnings, which sounds simple, but I have yet to see any tool or model solve it at this scale. I'll give Sonnet 4.5 a try this weekend, but yeah, none of the models I've tried are able to produce meaningful results over long periods on this upgrade task without good tooling and strong feedback loops.
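To give a rough sense of the shape of it (this is a simplified sketch, not the actual DSL; it assumes Playwright for Python, a Rails app on localhost:3000, a plain form login, and the selectors, credentials and URLs are placeholders): the agent gets one small verification command that opens an authenticated headless session, loads the page it just touched, and hands back a stripped-down view of what a user would actually see instead of raw HTML.

```python
# Simplified sketch, NOT the actual DSL. Assumptions: Playwright for Python,
# a Rails app on localhost:3000, a form-based login; selectors and credentials
# below are placeholders for whatever the real app uses.
from playwright.sync_api import sync_playwright

BASE_URL = "http://localhost:3000"

def verify_page(path: str, must_contain: list[str]) -> dict:
    """Open `path` in an authenticated headless session and report what is visible."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Log in once per check (field names are assumptions).
        page.goto(f"{BASE_URL}/login")
        page.fill("input[name='email']", "agent@example.com")
        page.fill("input[name='password']", "password")
        page.click("button[type='submit']")

        # Load the page the agent just changed and reduce it to visible text,
        # which is far easier for a model to reason about than raw HTML.
        page.goto(f"{BASE_URL}{path}")
        visible_text = page.inner_text("body")
        browser.close()

    return {
        "path": path,
        "missing": [s for s in must_contain if s not in visible_text],
        "looks_like_error_page": "something went wrong" in visible_text.lower(),
    }

# The agent calls this after every fix, e.g.:
# verify_page("/orders/42", must_contain=["Order #42", "Total"])
```

The point is that the model never drives the browser freely; it only gets a small, deterministic verification primitive it can call after each change.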
Btw, I have upgraded the app and am taking it to alpha testing now, so it is possible.
I've tried asking it to log every request and response to a project_log.md but it routinely ignores that.
I've also tried using Playwright for testing in a headless browser and taking screenshots for a blog that can effectively act as a log, but it just seems like too tall an order for it.
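Roughly the kind of thing I was attempting, in case it makes the question clearer (a minimal sketch, assuming Playwright for Python; the URL, directories and log format are placeholders rather than my real setup):

```python
# Minimal sketch: screenshot a page headlessly and append a dated entry to a
# markdown log. URL, paths and format are placeholders.
import datetime
import os
from playwright.sync_api import sync_playwright

def snapshot(url: str, log_path: str = "project_log.md") -> None:
    """Take a full-page screenshot and record it in the log file."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    os.makedirs("screenshots", exist_ok=True)
    shot = f"screenshots/{stamp}.png"

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=shot, full_page=True)
        browser.close()

    with open(log_path, "a") as f:
        f.write(f"\n### {stamp} {url}\n![screenshot]({shot})\n")

# e.g. snapshot("http://localhost:3000/dashboard")
```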
It sounds like you're streets ahead of where I am. Could you give me some pointers on getting started with a feedback loop, please?
But then that goes back to the original question: considering my own experience of the damage CC or Codex can do in a working codebase from a couple of tiny initial mistakes or confusion about intent while left unattended for ten minutes, let alone 30 hours...
If you had used any of those, you'd know they clearly don't work well enough for such long tasks. We're not yet at the point where we have general-purpose fire-and-forget frameworks. There have been a few research examples, but only in constrained environments with a complex custom setup.
Your average dev can't just use those.