robbomacrae · 2026-05-31T16:11:51 1780243911

I believe it's the way the HN algorithm works. In order to give new and obscure posts a shot, it will add them to people's front pages and see how they perform. Otherwise new posts wouldn't get seen and the flywheel would never get started.

So everyone acts as a sort of beta tester for obscure posts.
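
A toy sketch of that idea, not HN's actual ranking code (which is not described here): most front-page slots go to the top-ranked posts, and a few are reserved for randomly chosen new ones. The function name, the slot counts, and the split between `ranked` and `fresh` are all assumptions made for illustration.

```python
import random


def build_front_page(ranked, fresh, slots=30, explore_slots=3, rng=None):
    """Fill most slots by rank and reserve a few for randomly picked new posts.

    All parameters are hypothetical; this only illustrates the explore/exploit idea.
    """
    rng = rng or random.Random()
    # Exploration: sample a handful of new posts that have no track record yet.
    explore = rng.sample(fresh, min(explore_slots, len(fresh)))
    # Exploitation: fill the remaining slots with the best-ranked posts.
    top = [post for post in ranked if post not in explore][: slots - len(explore)]
    return top + explore
```

Votes on the sampled posts would then feed back into the ranking, which is the "beta tester" effect described above.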

robbomacrae · 2026-05-14T06:54:58 1778741698

I'm trying to do this with orcabot.com

A Figma-like dashboard for turning Claude Code, Gemini CLI, and Codex into an OpenClaw, but with security measures to break the lethal trifecta while running on a VM.

But it's not quite there in terms of usability. I agree that is the hardest part of the equation. It's something I'm constantly experimenting with and haven't found the solution to it yet. Open to feedback!

robbomacrae · 2026-05-03T09:25:21 1777800321

I don't think that is entirely fair. I don't see them stating anywhere that they are measuring coding capabilities: "Using complex games to probe real intelligence."

And this seems very much in line with the methodology in ARC-AGI-3.

The results here, in the OP article and in https://www.designarena.ai all tell a similar story: Kimi K2.6 is up and in the SOTA mix.

tgv · 2026-05-03T10:33:44 1777804424

The task was writing a "bot" to play the game. The title is "Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge." How does that not imply measuring coding capabilities?

robbomacrae · 2026-04-24T16:26:48 1777048008

You could have the actual output of the agent turned into TTS using the model of your choice with TalkiTo… or listen to whatever weird sounds this makes. Seems like this is copying that viral Mac moan app. 2026 is weird.

robbomacrae · 2026-04-17T09:50:25 1776419425

You sound like you’ve never been disdainfully stared at by a cat.

Really interesting article though. I’m very hopeful AI can help work out how all these things interact.

robbomacrae · 2026-04-12T11:53:34 1775994814

"Let's be honest here: there is no benefit to alcohol (for example wine) and is only detrimental." - That is a pretty extreme statement and easily falsifiable.

There are many studies a quick google away that show a much more nuanced take, e.g. [0] and [1]. But the strongest evidence is that our most successful societies and civilizations have been intentionally drinking alcohol for ~10,000 years [2]. If it were only detrimental, I'm pretty sure the habit would have died out by now. I acknowledge there are negative issues.

[0]: https://www.webmd.com/diet/ss/slideshow-skinny-cocktails
[1]: https://nutritionsource.hsph.harvard.edu/healthy-drinks/drin...
[2]: https://en.wikipedia.org/wiki/History_of_alcoholic_beverages

thebigspacefuck · 2026-04-13T14:26:05 1776090365

Your first link is to “10 skinny cocktails”. As far as I know there is no safe amount of alcohol.

https://www.who.int/europe/news/item/04-01-2023-no-level-of-...

robbomacrae · 2026-04-08T05:58:03 1775627883

OrcaBot does this with the VM, but whereas the author mentions the risk of GitHub keys being leaked, OrcaBot uses a key broker to ensure the LLM doesn’t have access to any keys. It even works for the API keys to the LLMs themselves. https://orcabot.com/blog#breaking-the-lethal-trifecta
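
For anyone unfamiliar with the pattern, here is a minimal sketch of how a key broker can work in general. It is not OrcaBot's implementation: the `KeyBroker` class, its method names, and the `broker-` placeholder format are all hypothetical. The agent only ever sees an opaque placeholder, and a trusted component outside the agent swaps in the real secret when a request leaves the sandbox.

```python
import secrets


class KeyBroker:
    """Hypothetical broker: holds real secrets and hands out opaque placeholders."""

    def __init__(self):
        self._real_keys = {}  # placeholder -> real secret, never shown to the agent

    def register(self, real_key):
        """Store a real key and return the placeholder the agent is allowed to see."""
        placeholder = "broker-" + secrets.token_hex(8)
        self._real_keys[placeholder] = real_key
        return placeholder

    def sign_headers(self, headers):
        """Return a copy of the headers with placeholders replaced by real keys."""
        signed = {}
        for name, value in headers.items():
            for placeholder, real_key in self._real_keys.items():
                value = value.replace(placeholder, real_key)
            signed[name] = value
        return signed
```

In a setup like this, `sign_headers` would run in a proxy outside the VM, so even a fully compromised agent has nothing to leak except the placeholder.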

robbomacrae · 2026-03-10T07:36:36 1773128196

Genuinely surprised they didn't try to get away with department of peace.

robbomacrae · 2026-02-25T08:33:15 1772008395

Really cool. A tangential task that seems to be coming up more and more is masking sensitive data in these calls for security and privacy. Is that something you considered as a feature?

mksglu · 2026-02-25T08:55:35 1772009735

Good question.

The SQLite database is ephemeral: it is stored in the OS temp directory (/tmp/context-mode-{pid}.db) and scoped to the session process. Nothing persists after the session ends.

For sensitive data masking specifically: right now the raw data never leaves the sandbox (it stays in the subprocess or the temp SQLite store), and only stdout summaries enter the conversation. But a dedicated redaction layer (regex-based PII stripping before indexing) is an interesting idea worth exploring. Would be a clean addition to the execute pipeline.
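
A rough sketch of what such a redaction layer could look like, assuming a per-process SQLite file named after the pattern above. This is not the project's code: the `chunks` table, the function names, and the two regexes (emails and `sk-` style keys) are illustrative assumptions, and real PII detection would need far more patterns.

```python
import os
import re
import sqlite3
import tempfile

# Illustrative patterns only; a real redaction layer would cover many more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
API_KEY = re.compile(r"\bsk-[A-Za-z0-9_-]{8,}\b")


def redact(text):
    """Replace emails and key-like tokens with fixed markers."""
    text = EMAIL.sub("[EMAIL]", text)
    return API_KEY.sub("[KEY]", text)


def open_session_db():
    """Open a per-process database in the OS temp directory (hypothetical schema)."""
    path = os.path.join(tempfile.gettempdir(), f"context-mode-{os.getpid()}.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS chunks (body TEXT)")
    return conn, path


def index(conn, raw):
    """Redact first, then store, so unmasked text never reaches the index."""
    conn.execute("INSERT INTO chunks (body) VALUES (?)", (redact(raw),))
    conn.commit()
```

Redacting before the insert rather than at query time means the temp file itself never contains the raw values.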

virgilp · 2026-02-25T09:17:40 1772011060

> Nothing persists after the session ends.

Does that mean that if I exit claude code and then later resume the session, the database is already lost? When exactly does the session end?

mksglu · 2026-02-25T09:18:25 1772011105

Yes — the database is tied to the MCP server process, so it's created fresh on each claude launch and lost when you exit; resuming a session starts a new process with a new empty database.

robbomacrae · 2026-02-25T08:16:34 1772007394

“so the API use is in addition to the subscription, but it can't be helped.” - I beg to differ. OrcaBot.com is a claw that runs using vanilla Claude Code so you can do all that with your regular subscription. Disclosure: I’m the author. The only reason these other claws can’t offer that is because they front it with their own AI.

arjie · 2026-02-25T08:48:26 1772009306

That's pretty cool. And when I first tried this, I tried to do it with a bash loop around `claude -p` and you can get quite far with that! But overall, I think I'd rather use their tools the way they've set them up to be used and pay them their $500/month total or whatever. I'm probably going to stick to this approach, but your thing is pretty neat so thank you for sharing.
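
For reference, the loop idea can be sketched roughly like this, in Python rather than bash. It assumes the `claude` CLI is on the PATH and uses its `-p` print mode as mentioned above; the `DONE` stop word, the prompt wording, and the function names are made up for this sketch.

```python
import subprocess


def run_claude(prompt):
    """Call the Claude Code CLI in print mode and return its stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def agent_loop(task, max_turns=5, run=run_claude):
    """Feed each reply back in as context until the reply contains DONE.

    `run` is injectable so the loop can be exercised without the real CLI.
    """
    transcript = []
    prompt = task
    for _ in range(max_turns):
        reply = run(prompt)
        transcript.append(reply)
        if "DONE" in reply:  # hypothetical stop convention
            break
        prompt = (
            f"{task}\n\nPrevious output:\n{reply}\n\n"
            "Continue, and say DONE when finished."
        )
    return transcript
```

The `max_turns` cap is there so a reply that never says `DONE` cannot loop forever.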

ffb7c5 · 2026-02-25T08:50:13 1772009413

Just a heads up, I tried to use the "Continue with Google" button on your site, but I'm running into "Bot verification failed". Using stock Chrome, not running a VPN either.

robbomacrae · 2026-02-25T08:56:17 1772009777

Thanks for mentioning that. The bot filter has been causing trouble so I def need to go and look at it. Debated disabling it but any basic bot that starts a dashboard is spinning up a VM I pay for! Changing browser might be a workaround?

arjie · 2026-02-25T08:54:14 1772009654

Seems like a reCAPTCHA failure. FWIW, I was able to sign in and everything. I didn't actually use the service though.