Funny, everyone is trying to do the same thing. IMHO no one has nailed it yet.

vivzkestrel · 2025-12-31T12:06:37 1767182797

That is because they are all using non-deterministic approaches, i.e. expecting a single detailed 10,000-word prompt to generate a stable application. Because prompts don't have replay value (the same prompt can produce different output on each run), you have to split the work into one microtask per agent and validate each output, with a deterministic fallback as and when required.
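To make that concrete, here is a rough sketch of what I mean, not a real framework. Everything in it is a made-up name for illustration: `Microtask`, `run_microtask`, `is_json_with_keys`, and `make_scripted_agent` are hypothetical, and the "agent" is just any callable that takes a prompt string and returns a string, so you would swap in your own model call.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Microtask:
    """One small unit of work handed to a single agent (hypothetical type)."""
    name: str
    prompt: str
    validate: Callable[[str], bool]  # deterministic check on the agent output
    fallback: Callable[[], str]      # deterministic result if every attempt fails


def run_microtask(task: Microtask, agent: Callable[[str], str],
                  max_attempts: int = 3) -> str:
    """Ask the agent up to max_attempts times; use the fallback if nothing validates."""
    for _ in range(max_attempts):
        output = agent(task.prompt)
        if task.validate(output):
            return output
    return task.fallback()


def is_json_with_keys(*keys: str) -> Callable[[str], bool]:
    """Build a validator that accepts only a JSON object containing all given keys."""
    def check(text: str) -> bool:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return False
        return isinstance(data, dict) and all(k in data for k in keys)
    return check


def make_scripted_agent(replies: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a model call: returns the scripted replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)
```

The point of the design is that the validator and the fallback are plain code, so they behave the same on every run even when the agent does not. The retry count of 3 is arbitrary.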