
> Using the develop web game skill and preselected, generic follow-up prompts like "fix the bug" or "improve the game", GPT‑5.3-Codex iterated on the games autonomously over millions of tokens.

I wish they would share the full conversation, token counts, and more. I'd like a better sense of how they normalize these comparisons across versions. Is this a 3-prompt, 10M-token game? A 30-prompt, 100M-token game? Are both models using similar prompts and token counts?

I vibe-coded a small Factorio web clone [1] that got pretty far using the models from last summer. I'd love to compare against this.

[1] https://factory-gpt.vercel.app/





I just wanted to say that's a pretty cool demo! I hadn't realised people were using it for things like this.

Thank you. There's a demo save to get the full feel of it quickly. There are also 2D ASCII and 3D renderers you can hotswap between. The 3D models are generated with Meshy. The entire game is 'AI slop': I intentionally did no code reviews to see where that would get me. Some prompts were very specific, but others were just 'add a research of your choice'.

This was built using old versions of Codex, Gemini, and Claude. I'll probably work on it more soon to try the latest models.


Any estimates on how much it cost you? In terms of total real-world time, money, and time spent by the agents.

About $300: $200 for the Claude Max subscription, $20 for Vercel, $20 for Codex, and $20 for Meshy.

I think these days the $200 Max subscription wouldn't be needed. I bet with these latest models you could make do with mixing two $20/mo subscriptions.

Real time was about 2 weeks of keeping an eye on the agents while watching TV and playing games, waiting for limit resets, etc. Very little dedicated, focused time.




