Hacker News | embirico's comments

(I work at OpenAI) Heya, in reality it's much more organic than that. We build stuff, ship it internally, then work crazy hard to quickly ship it externally. When we put something out on a given day, it's usually been in the works and scheduled for a while.

One concrete example: a launch like today's, where press, influencers, etc. all came out at 10a PT, is coordinated well in advance!


We cannot trust identity like we used to here on HN (even pre-LLM AI, I thought we seemed naive). Unfortunately, we live in a world where anyone, or any AI, can claim almost anything plausible-sounding.

Where do we go from here? (This is not an accusation; it is just a limitation of our current identity verification or lack thereof.)


You can confirm that the people who say things are in a position to know.


> You can confirm that the people who say things are in a position to know.

What is the above commenter's sense of how well one can 'confirm' such a thing?

Looking at an HN account and its comment history provides some signal, but this doesn't satisfy me, given the incentives at play here. We're talking about OpenAI, a ~$800B company. Reputation matters a lot. The stakes are higher than, e.g., "does so-and-so really work at Mozilla and know the details of a messy Rust governance issue?" (to pick a deliberately lower-stakes example).

When we decide what to let into our brains around OpenAI, Anthropic, etc., the bar needs to be higher than e.g. "does an HN account seem to be consistent with someone who works at OpenAI?" (I'm not sure if this is the above commenter's position or close to it?)

We need to be able to have stronger proofs, preferably ones with cryptography and credibility rooted in a legitimate trust model. In 2026, this is certainly possible technically, if a platform made this a priority. The barriers are largely social, cultural, and economic.

HN does not make real-world identity a priority. There might be some workarounds for posting information in one's profile, but practically speaking, I'm not seeing how this would work or what level of identity assurance it would provide. Am I missing something?

If I start hand-waving I might dream up something like the following ... Maybe someone could stitch something together with a trusted content time-stamping server and prove they control an OpenAI email address and also provide that cryptographic evidence on their HN profile. It sounds ... practically unappealing at best. I haven't seen this done. Maybe I'm overlooking a good way. I'm all ears. We're going to need better solutions.


They work at OpenAI, what more do you want? For what it’s worth, I can independently corroborate that the announcement was planned in advance.


So, it's a whole lot more than "YOLO - let's launch this!"


(I work on Codex) We have a robust sandbox for macOS and Linux. Not quite yet for Windows, but working on that! Docs: https://developers.openai.com/codex/security


(I work on Codex) I think for us the big unlock was GPT-5.2 and GPT-5.2-Codex, where we found ourselves needing to make many fewer manual edits.


I find that to be the case too. For more complex things, my future ask would be something that formalizes verification/testing into the AI dev cycle. My confidence in not needing to see code is directly proportional to my level of comfort in test coverage (even if it's quite high-level UI/integration mechanisms rather than 1 != 0 unit stuff).


Only thing i'd add re windows is it's taking us some time to get really solid sandboxing working on Windows, where there are fewer OS-level primitives for it. There's some more at https://developers.openai.com/codex/windows and we'd love help with testing and feedback to make it robust.


(I work on Codex) One detail you might appreciate is that we built the app with a ton of code sharing with the CLI (as core agent harness) and the VSCode extension (UI layer), so that as we improve any of those, we polish them all.


Any chance you'll enable remote development on a self-hosted machine with this app?

I.e., I think the Codex webapp on a self-hosted machine would be great. This is important when you need a beefier machine (potentially with a GPU).


Not going to solve your exact problem, but I started this project with this approach in mind:

https://github.com/jgbrwn/vibebin


This should be table stakes by now. That's the beauty of these cli tools and how they scale so well.


Working remotely with the app would truly be great


What are the benefits of using the codex webapp?


Interested in this as well.


Any reason to switch from vscode with codex to this app? To me it looks like this app is more for non-developers but maybe I’m missing something


Good question! VS Code is still a great place for deep, hands-on coding with the Codex IDE extension.

We built the Codex app to make it easier to run and supervise multiple agents across projects, let longer-running tasks execute in parallel, and keep a higher-level view of what’s happening. Would love to hear your feedback!


I already have multiple projects that I manage full-screen via VS Code; I just move from one to the other using "cmd" + "->". You should be aware that the Claude Code extension for VS Code is way better than the Codex extension, so perhaps you should work a bit on that as well. Even if the agents do 80% of the work, I still need to check what they do, and a familiar IDE seems the first choice of an existing/old-school developer.


OK, 'projects', but this would make a lot more sense if we could connect remotely to the projects, which works without a problem using the IDE plugin. So right now I don't see any advantage of using this.


Awesome. Any chance we will see a phone app?

I know coding on a phone sounds stupid, but with an agent it’s mostly approvals and small comments.


The ChatGPT app on iOS has a Codex page, though it only seems to be for the "cloud" version.


(Disclaimer: Am on the Codex team.) We're basically trying to build a teammate that can do both short, iterative work with you, then as you build trust (and configuration), you can delegate longer tasks to it.

The "# of model-generated tokens per response" chart in [the blog introducing gpt-5-codex](https://openai.com/index/introducing-upgrades-to-codex/) shows an example of how we're making the model good at both.


I really wish model performance messaging and benchmarks were more focused on perfecting short, iterative tasks instead of long-running work.

As a startup founder and engineer, I'm not constrained by the number of 10000+ line diff, 0->1 demos I can ship. I'm constrained by quality of the 100 -> 101, tight 150 line feature additions / code cleanups I can write.

It feels like the demos, funding, and hype all want to sell me entire PR rewrites, but what I need is the best possible iterative work model that will keep me in the loop.

I still use codex - but I use codex incredibly iteratively (give it very narrowly scoped tasks, and I watch it like a hawk, giving tons of feedback). I don't use it because of its ability to code for 24 hours. I use it because when I give it those narrowly scoped tasks, it is better at writing good code than any other model. (Because of its latency, I have 2-4 of these conversations going on at the same time).

But there is a lot of friction the codex product + model adds to this process. I have to prompt aggressively to override whatever "be extremely precise" prompting the model gets natively so that it doesn't send me 20+ bullet points of extraordinarily dense prose on every message. I have to carefully manage its handling of testing; it will widen any DI + keep massive amounts of legacy code to make sure functionality changes don't break old tests (rather than updating them) and to make sure any difficult tests can have their primary challenges mocked away.

In general, codex doesn't feel like an amazing tool that I have sitting at my right hand. It feels like a teenage genius who has been designed to do tasks autonomously, and who I constantly have to monitor and rein in.


My experience is completely different.

Codex(-cli) is an outsourced consultant who refuses to say "I can't do that" and will go to extreme lengths to complete a task fully before reporting anything. It's not a "teammate".

It also doesn't communicate much while it's working compared to Claude. So it's really hard to interrupt it while it's making a mistake.

Also, as a Go programmer, the sandbox is completely crazy. Codex can't access any of the Go module caches (in my home directory) and has to resort to crazy tricks to bring them INSIDE the project directory, which it keeps forgetting to do (as the commands have to run with specific env vars) and then just ... doesn't run tests, for example, because it couldn't.

The only way I've found to make that problem go away is run it with the --omg-super-dangerous-give-every-permission-ever switch just so that it can do the basic work I need it to do.

Maybe give us an option between the ultra-safe sandbox that just refused to run "ps" 15 minutes ago to check if a process is running and the "let me do anything anywhere" option. Some sane defaults please.


When was that, @sergiotapia? Just last week we upped the base rate limit for new API accounts.


This was September 11th, 2025.

    gpt-5-2025-08-07
    38.887K input tokens
That was my usage, and I got rate limited. Thank you for your tips!


Hey, I work on Codex. There's absolutely no way that a user on a Pro plan would silently be moved to token-based billing. You just hit a limit and have to wait for the reset. (Which also sucks, and for which we're also improving early warnings.)


Thanks for that, appreciate the clarification. I’ll check with my colleague and report back on his experience. Certainly don’t want to misrepresent.


I do love the vibe of that license


If it weren't for it claiming not to be a license in the preceding sentence, it would be pretty good. Reminds me of the WTFPL. :)

https://directory.fsf.org/wiki/License:WTFPL


How do you compare the pros/cons of having the summarization built into the call tool, like with Vowel, vs having more control but in separate tools?


More work to build your own and more friction too since we don’t “own” the voice channel like vowel did. We used gmeets transcription but we have to remember to record all meetings, we ended up ditching it tbh

