I believe it's the way the HN algorithm works. To give new and obscure posts a shot, it adds them to people's front-page feeds and sees how they perform. Otherwise new posts wouldn't get seen and the flywheel would never get started.
So everyone acts as a sort of beta tester for obscure posts.
A Figma-like dashboard for turning Claude Code, Gemini CLI, and Codex into an OpenClaw, but with security measures to break the lethal trifecta while running on a VM.
But it's not quite there in terms of usability. I agree that is the hardest part of the equation. It's something I'm constantly experimenting with and haven't found the solution to it yet. Open to feedback!
I don't think that is entirely fair. I don't see them stating anywhere that they are measuring coding capabilities... "Using complex games to probe real intelligence."
And this seems very much in line with the methodology in ARC-AGI-3.
The results here, in the OP article and in https://www.designarena.ai all tell a similar story: Kimi K2.6 is up and in the SOTA mix.
The task was writing a "bot" to play the game. The title is "Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge." How does that not imply measuring coding capabilities?
You could have the actual output of the agent turned into TTS using the model of your choice with TalkiTo… or listen to whatever weird sounds this makes. Seems like this is copying that viral Mac moan app. 2026 is weird.
"Let's be honest here: there is no benefit to alcohol (for example wine) and is only detrimental." - That is a pretty extreme statement and easily falsifiable.
There are many studies a quick Google search away that show a much more nuanced take, e.g. [0] and [1]. But the strongest evidence is that our most successful societies and civilizations have been intentionally drinking alcohol for ~10000 years [2]. If it were only detrimental then I'm pretty sure it would have worked its way out by now. I acknowledge there are negative issues.
OrcaBot does this with the VM, but where the author mentions the risk of GitHub keys being leaked, OrcaBot uses a key broker to ensure the LLM doesn’t have access to any keys. It even works for the API keys to the LLMs themselves.
https://orcabot.com/blog#breaking-the-lethal-trifecta
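To make the key broker idea concrete, here is a minimal sketch of the general pattern, not OrcaBot's actual implementation. It assumes the agent only ever sees a placeholder string, and a broker running outside the sandbox swaps in the real key on outbound requests to an allow-listed host. The `KeyBroker` class, the `{{GITHUB_TOKEN}}` placeholder format, and the host allow-list are all hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse


class KeyBroker:
    """Hypothetical broker: holds real keys outside the agent's sandbox."""

    def __init__(self, secrets, allowed_hosts):
        self._secrets = secrets        # placeholder -> real key
        self._allowed = allowed_hosts  # placeholder -> set of permitted hosts

    def inject(self, url, headers):
        """Return a copy of headers with placeholders replaced by real keys.

        Raises PermissionError if a placeholder is used for a host that is
        not on that key's allow-list, so a prompt-injected request cannot
        send the key to an arbitrary server.
        """
        host = urlparse(url).hostname
        out = {}
        for name, value in headers.items():
            for placeholder, secret in self._secrets.items():
                if placeholder in value:
                    if host not in self._allowed.get(placeholder, set()):
                        raise PermissionError(
                            f"{placeholder} may not be sent to {host}"
                        )
                    value = value.replace(placeholder, secret)
            out[name] = value
        return out


# The agent writes requests using only the placeholder.
broker = KeyBroker(
    secrets={"{{GITHUB_TOKEN}}": "real-secret-value"},
    allowed_hosts={"{{GITHUB_TOKEN}}": {"api.github.com"}},
)
agent_headers = {"Authorization": "Bearer {{GITHUB_TOKEN}}"}
real_headers = broker.inject("https://api.github.com/user", agent_headers)
```

The point of the design is that the real key never appears in the model's context, so there is nothing for the model to leak even if it is tricked.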
Really cool. A tangential task that seems to be coming up more and more is masking sensitive data in these calls for security and privacy. Is that something you considered as a feature?
The SQLite database is ephemeral — stored in the OS temp directory (/tmp/context-mode-{pid}.db) and scoped to the session process. Nothing persists after the session ends. For sensitive data masking specifically: right now the raw data never leaves the sandbox (it stays in the subprocess or the temp SQLite store), and only stdout summaries enter the conversation. But a dedicated redaction layer (regex-based PII stripping before indexing) is an interesting idea worth exploring. Would be a clean addition to the execute pipeline.
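For what it's worth, a regex-based redaction pass like the one floated above could look roughly like this. It is only a sketch: the patterns, the labels, and the `redact` function name are assumptions for illustration, not part of the existing execute pipeline, and a real PII filter would need far broader coverage.

```python
import re

# Hypothetical patterns; a real deployment would need a much larger set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SECRET": re.compile(r"\b(?:sk|ghp|xoxb)[-_][A-Za-z0-9_-]{16,}\b"),
}


def redact(text):
    """Replace each match with a bracketed label before the text is indexed."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "mail bob@example.com from 10.0.0.1"
print(redact(sample))
```

Running the redaction before indexing, rather than at output time, means the sensitive values never land in the SQLite store in the first place.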
Yes — the database is tied to the MCP server process, so it's created fresh on each claude launch and lost when you exit; resuming a session starts a new process with a new empty database.
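A rough sketch of that per-process lifecycle, assuming the file name pattern mentioned above (`context-mode-{pid}.db` in the OS temp directory). The function names and the `chunks` table are hypothetical, and deleting the file on close is one assumed way to get the "nothing persists" behavior.

```python
import os
import sqlite3
import tempfile


def open_session_db(pid=None):
    """Create a fresh database file scoped to this process ID."""
    pid = os.getpid() if pid is None else pid
    path = os.path.join(tempfile.gettempdir(), f"context-mode-{pid}.db")
    if os.path.exists(path):
        os.remove(path)  # never reuse data left behind by an earlier process
    conn = sqlite3.connect(path)
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, body TEXT)")
    return conn, path


def close_session_db(conn, path):
    """Close the connection and delete the file so nothing persists."""
    conn.close()
    if os.path.exists(path):
        os.remove(path)
```

Because the path is keyed on the process ID and the file is recreated on open, a resumed session in a new process always starts from an empty database.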
“so the API use is in addition to the subscription, but it can't be helped.” - I beg to differ. OrcaBot.com is a claw that runs using vanilla Claude Code so you can do all that with your regular subscription. Disclosure: I’m the author. The only reason these other claws can’t offer that is because they front it with their own AI.
That's pretty cool. And when I first tried this, I tried to do it with a bash loop around `claude -p` and you can get quite far with that! But overall, I think I'd rather use their tools the way they've set them up to be used and pay them their $500/month total or whatever. I'm probably going to stick to this approach, but your thing is pretty neat so thank you for sharing.
Just a heads up, I tried to use the "Continue with Google" button on your site, but I'm running into "Bot verification failed". Using a stock Chrome browser, not running a VPN either.
Thanks for mentioning that. The bot filter has been causing trouble so I def need to go and look at it. Debated disabling it but any basic bot that starts a dashboard is spinning up a VM I pay for! Changing browser might be a workaround?