I'm in that camp -- I have the max-tier subscription to pretty much all the services, and for now Codex seems to win. Primarily because 1) long horizon development tasks are much more reliable with codex, and 2) OpenAI is far more generous with the token limits.
Gemini seems to be the worst of the three, and some open-weight models are not too bad (like Kimi k2.5). Cursor is still pretty good, and copilot just really really sucks.
I've done this kind of thing many times with codex and sqlite, and it works very well. It's one prompt that looks something like this:
- inspect and understand the downloaded data in directory /path/..., then come up with an sqlite data model for doing detailed analytics and ingest everything into an sqlite db in data.sqlite, and document the model in model.md.
Then you can query the database ad hoc pretty easily with codex prompts (and also generate PDF graphs as needed.)
I typically use the highest reasoning level for the initial prompt, and as I get deeper into the data, continuously improve on the model, indexes, etc., and just have codex handle any data migration.
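For concreteness, here's a minimal hand-written sketch of the kind of schema, ingest, and ad-hoc analytics query codex ends up generating. The `events` table and the sample rows are hypothetical; the real model depends entirely on your data (and the real workflow writes `data.sqlite`, whereas this sketch uses an in-memory db):

```python
import sqlite3

# In-memory db for the sketch; the actual workflow targets data.sqlite.
conn = sqlite3.connect(":memory:")

# Hypothetical data model -- codex derives the real one from your data.
conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id    INTEGER PRIMARY KEY,
        ts    TEXT NOT NULL,
        kind  TEXT NOT NULL,
        value REAL
    )
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_events_kind ON events(kind)")

# Ingest step (codex typically writes a loader over the raw files).
conn.executemany(
    "INSERT INTO events (ts, kind, value) VALUES (?, ?, ?)",
    [("2024-01-01T00:00:00", "click", 1.0),
     ("2024-01-01T00:05:00", "view", 2.5)],
)
conn.commit()

# Ad-hoc analytics query of the kind you'd prompt for later.
rows = conn.execute(
    "SELECT kind, COUNT(*), AVG(value) FROM events GROUP BY kind ORDER BY kind"
).fetchall()
print(rows)
```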
Expiries are a defence-in-depth measure that exists primarily for crypto hygiene, for example to protect against compromised keys. If the private key material is well protected, the risk is very low.
However, an org (particularly a .mil) not renewing its TLS certs screams of extreme incompetence (which is exactly what expiries are meant to protect you from.)
> Also MCP is very obviously dead, as any of us doing heavy agentic coding know.
As someone who does heavy agentic coding (using basically all the tools), this is so far from the truth. People claiming this have probably never worked in large enterprise environments where things like authentication, RBAC, rate limiting, abuse detection, centralized management/updates/ops, etc. are a huge part of the development and deployment workflow.
In these situations you can't just use skills and cli tools without a gigantic amount of retooling and increased operational and security complexity. MCP is really useful here, and allows centralized eng and ops teams to manage their services in a way that aligns with the organization's overall posture, policies, and infrastructure.
> Google is so far behind agentic cli coding. Gemini CLI is awful.
On this part I totally agree. It's really hard to express how bad it is (and it's really disappointing.)
> you can't just use skills and cli tools without a gigantic amount of retooling and increased operational and security complexity
You're describing MCP. After all, MCP is just reinventing the OpenAPI wheel. You can just have a self-documenting REST API using OpenAPI. Put the spec in your context and your model knows how to use it. You can have all the RBAC and rate limiting and auth you want. Heck, you could even build all that complexity into a CLI tool if you want. MCP the protocol doesn't actually enable anything. And implementing an MCP server is exactly as complex as using any other established protocol if you're using all those features anyway.
Ya, if you just use OpenAPI. That's why I'm saying MCP adds nothing. It's just another standard for documenting APIs. There are many that have been around for a long time and that are better integrated with existing ecosystems. There's also gRPC reflection. I'm sure there are others. LLMs can use them all equally effectively.
Given MCP is supposed to just be a standardised format for self-describing APIs, why are all the features you listed MCP related things? It sounds more like it's forced the enterprise to build such features which cli tooling didn't have?
Mostly by virtue of being a common standard. MCP servers are primarily useful in a remote environment, where centralized management of cross-cutting concerns matters. Also, it's really useful for integrating existing distributed services, e.g., internal data lakes.
I think it's clear a self-describing CLI is optimal for local-first tooling and portability. I personally view remote MCP servers as complementary in the space.
> At some point you need to treat people as adults, which includes letting them make very bad decisions if they insist on doing so.
The world does not consist entirely of rational actors, and this opens the door to all kinds of exploitation. The attacks today are very sophisticated, and I don't trust my 80-year-old dad to be able to detect them, nor many of my non-tech-savvy friends.
> any more than it would be acceptable for a bank to tell an alcoholic "we aren't going to let you withdraw your money because we know you're just spending it at the liquor store".
It's not a false equivalence at all. Both situations are taking away someone's control of something that they own, borne from a paternalistic desire to protect that person from themselves. If one is acceptable, the other should be. Conversely if one is unacceptable, the other should be unacceptable as well. Either paternalistic refusal to let people do as they wish is ok, or it isn't.
Maybe not, but I think that overextending any idea like that in the opposite direction of whatever point you are trying to make at least devolves into a "slippery slope" argument. For instance, is your point that all security on phones that impede freedom of the user (for instance, HTTPS, forced password on initial startup, not allowing apps to access certain parts of the phone without user permissions, verifying boot image signatures) should be removed as well?
No, that's not my point at all. Measures such as that are a tool which is in the hands of the user. There is a default restriction which is good enough for most cases, but the user has the ability to open things up further if he needs. What Google is proposing takes control out of the user's hands and makes Google the sole arbiter of what is and is not allowed on the device.
None of the measures I mentioned are changeable by the user, except possibly sideloading an HTTPS certificate. That's the only way any of those measures even work; if they weren't set as invariants by the OS, they would be bypassable.
>There is a default restriction which is good enough for most cases, but the user has the ability to open things up further if he needs.
But this is what the other guy's point is. You are defining "good enough for most cases" in a way that he is not, then making the argument that what he says is equivalent to not allowing an alcoholic to buy beer. Why can you set what level is an acceptable amount of restriction, but he can't?
The alcoholic knows the bad outcomes, and chooses to ignore them. The hapless Android user does not understand the negative consequences of sideloading. I think this makes for a substantial difference between those two.
> The hapless Android user does not understand the negative consequences of sideloading.
Then make sideloading disabled by default but enable it when the users tap 7 times on whatever settings item. At that time, explain those "negative consequences" to them, explain them real good, don't spare anything and if they still hit "Yes, continue to enable sideloading" you do that immediately in order to avoid increasing their haplessness with other made-up excuses.
I don't see how people are against this. Especially tech-savvy people who browse HN. It really seems to me like everyone here who's on Google's side is just a bot in a botfarm somewhere; they can't possibly be real.
Protecting from scams isn't protection from the victim themselves. That should be obvious from the fact that very intelligent and technologically literate people too can fall for phishing attacks. Tell me for example, how many people in your life know how a bank would ACTUALLY contact you about a suspected hijacking and what the process should look like? And how about any of the dozens of other cover stories used? Not to mention the situations where the scammers can use literally the same method of first contact as the real thing (eg. spoofed).
...And the fact that for example email clients do their best to help them by obscuring the email address and only showing the display name, because that's obviously a good idea.
> Protecting from scams isn't protection from the victim themselves.
That is where we differ. It is, ultimately, the victim of a scam who makes the choice of "yes, this person is trustworthy and I will do what they say". The only way to prevent that is to block the user from having the power to make that decision, which is to say protecting them from themselves.
But the proposal here, requiring developers to register their identities, doesn't actually impact consumers at all. They still have the ability to make the decision about whether or not to trust someone.
Yes it does, especially when you remember the fact that developers are also consumers. But even if they (we) weren't, it would still impact consumers. I, an Android user who's completely ignorant when it comes to Android development or even mobile in general, would be heavily impacted by this.
My custom YouTube clients would never be approved by Google. My (free) apps for watching anime and reading manga would never get approved by Google.
And something that's approved today could stop being approved tomorrow. It's up to Google / Microsoft / Apple to decide, after all; they're the ones in control of our devices. If they stop liking my open-source ad-free minesweeper game, then I can't play it anymore. I'll have to download their bloated proprietary version with ads and a subscription to keep playing.
> My custom YouTube clients would never be approved by Google. My (free) apps for watching anime and reading manga would never get approved by Google.
Google isn't approving apps though. A developer provides identity verification and a set of apps (apk names & keys) they are responsible for. There is no verification process or approval from google. The entire process as outlined in https://developer.android.com/developer-verification is that you prove you own signing keys for an apk name.
None of these things requires "locking down phones." Every single thing you've mentioned can be done in a smarter way that doesn't involve "individuals aren't allowed to modify the devices they purchase."
What's bullshit about it? This is how TRNGs in security enclaves work. They collect entropy from the environment, and use that to continuously reseed a PRNG, which generates bits.
If you're talking "true" in the philosophical sense, that doesn't exist -- the whole concept of randomness relies on an oracle.
What PRNGs lack compared to TRNGs is security (i.e. preventing someone from being able to use past values to predict future values). It's not that they somehow produce statistically invalid results (e.g. they generate 3s more often than 2s or something). Unless they're very poorly constructed.
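A quick sketch of that distinction: a plain LCG is statistically fine for casual use, yet cryptographically worthless, because a single observed output reveals the full internal state. The constants below are the classic Numerical Recipes parameters, used here purely for illustration:

```python
# A full-output LCG: every output IS the state, so one observed value
# lets an attacker predict every future value.
A, C, M = 1664525, 1013904223, 2**32

def lcg(seed):
    x = seed
    while True:
        x = (A * x + C) % M
        yield x

gen = lcg(42)
observed = next(gen)                  # attacker sees a single output...
predicted = (A * observed + C) % M    # ...and computes the next one
assert predicted == next(gen)         # prediction matches the generator
```

The outputs would still pass basic frequency tests; the failure is purely one of unpredictability, not of statistics.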
While LCGs are bad by themselves, they (together with Galois field counters, which have a large number of possible implementations, e.g. LFSRs, GFSRs, xorshift, etc.) have some very desirable properties for a PRNG: a known period, the ability to jump through the sequence, and the ability to extract sub-sequences that are guaranteed not to overlap, e.g. for a multithreaded simulation.
Because of this, the best non-cryptographic PRNGs are made from either an LCG or a GFC that ensures the properties mentioned above, together with a non-linear mixing function that scrambles the output, for much better statistical properties than a linear generator would have alone.
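As a sketch of that structure (loosely modeled on PCG's design, not its official reference implementation): a 64-bit LCG advances the state, a non-linear output function scrambles it, and the linear state update is what makes O(log n) jump-ahead possible:

```python
MASK64 = (1 << 64) - 1
MULT, INC = 6364136223846793005, 1442695040888963407  # Knuth MMIX constants

class MixedLCG:
    def __init__(self, seed):
        self.state = seed & MASK64

    def next32(self):
        # linear state transition (the LCG part)
        self.state = (self.state * MULT + INC) & MASK64
        s = self.state
        # non-linear output scrambler: xorshift, then rotate by top bits
        x = (((s >> 18) ^ s) >> 27) & 0xFFFFFFFF
        r = s >> 59
        return ((x >> r) | (x << ((-r) & 31))) & 0xFFFFFFFF

    def jump(self, n):
        # jump ahead n steps in O(log n): compose the affine map
        # x -> MULT*x + INC with itself by repeated squaring
        a, c = 1, 0
        mult, inc = MULT, INC
        while n:
            if n & 1:
                a = (a * mult) & MASK64
                c = (c * mult + inc) & MASK64
            inc = ((mult + 1) * inc) & MASK64
            mult = (mult * mult) & MASK64
            n >>= 1
        self.state = (self.state * a + c) & MASK64
```

The jump function is what enables non-overlapping sub-streams: give each thread the same seed and jump each one to a different offset in the sequence.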
The good cryptographic RNGs have the same kind of structure, but a one-way hash function or a block cipher is used to scramble the output of a counter. The counter ensures, in a simpler way, the same properties as an LCG or GFC. A simple counter can be used here because the output mixing function is much more complex.
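A toy illustration of that counter-plus-scrambler structure (illustrative only; real CSPRNGs use vetted constructions like AES-CTR or ChaCha20, not raw SHA-256 over a key):

```python
import hashlib

class CounterHashRNG:
    """Toy CSPRNG sketch: a plain counter provides the period and
    jump-ahead properties; a one-way hash provides all the scrambling."""

    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0

    def next_block(self) -> bytes:
        # output = H(key || counter); trivially jumpable by setting counter
        block = hashlib.sha256(
            self.key + self.counter.to_bytes(16, "big")
        ).digest()
        self.counter += 1
        return block

rng = CounterHashRNG(b"secret seed material")
a, b = rng.next_block(), rng.next_block()
assert a != b and len(a) == 32
```

Note how the division of labor is inverted relative to the non-cryptographic case: the state transition is as simple as possible, and all the complexity lives in the output function.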
I don't think hardware random number generators are bullshit, but it's easy to overstate their importance. Outside of cryptography, there aren't a whole lot of cases that truly require that much care in how random numbers are generated. For the kind of examples the article opens with (web page A/B testing, clinical trials, etc.) you'll never have sample sizes large enough to justify worrying about the difference between a half-decent PRNG and a "true" random number generator.
It's not a waste of time, it's a responsibility. All things need steering, even humans -- there's only so much precision that can be extrapolated from prompts, and as the tasks get bigger, small deviations can turn into very large mistakes.
There's a balance to strike between micro-management and no steering at all.
Most prompts we give are severely information-deficient. The reason LLMs can still produce acceptable results is because they compensate with their prior training and background knowledge.
The same applies to verification: it's fundamentally an information problem.
You see this exact dynamic when delegating work to humans. That's why good teams rely on extremely detailed specs. It's all a game of information.
Having prompts be information deficient is the whole point of LLMs. The only complete description of a typical programming problem is the final code or an equivalent formal specification.
I've used both gVisor and microvms for this (at very large scales), and there are various tradeoffs between the two.
The huge gVisor drawback is that it _drastically_ slows down applications (despite startup time being faster.)
For agents, the startup latency is less of an issue than the runtime cost, so microvms perform a lot better. If you're doing this in kube, there are a bunch of other challenges to deal with if you want standard k8s features, but if you're just looking for isolated sandboxes for agents, microvms work really well.
It seems to work with OpenCode, but I can't tell exactly what's going on -- I was super impressed when OpenCode presented me with a UI to switch the view between different sub-agents. I don't know if OpenCode is aware of the capability, or if the model is really good at telling the harness how to spawn sub-agents or execute parallel tool calls.