I'm looking forward to trying this. I've had a positive but high-variance experience with Gastown[1], which is in the same genre. I hope that Scion does better.
My main complaints with Gastown are that (1) it's expensive, partly because (2) it refuses to use anything but Claude models, in spite of my configuration attempts, (3) I can't figure out how to back up or add a remote to its beads/dolt bug database, which makes me afraid to touch the installation, and (4) upgrading it often causes yak shaving and lost context. These might all be my own skill issues, but I do RTFM.
But wow, Gastown gets results. There's something magic about the dialogue and coordination between the mayor and the polecats that leads to an even better experience than Claude Code alone.
I'm trialing it on very silly things, like an economic simulator game in Rust/Bevy. I put in an entire roadmap document with inline specs and goals, wild milestones, with tasks like "working bid/ask spread when factories buy or sell on the market to make pricing dynamic and realistic", "political entities can set work conditions", "international trade has pricing dynamics that take into account currency interchange and tariff rates", "infrastructure for trade improves as trade volumes increase across given tiles".
Out the other end, over about 3-4 five-hour sessions, comes about 85% functional code for every single listed thing. I'd guess you'd be looking at a team for months, give or take, without the automation. Total cost was around $50 in VM time (not counting Claude, since I would be subscribed anyway). I'm not letting that thing anywhere near a computer I care about, and Rust compiles are resource intensive, so I paid for a nice VM that I could smash the abort button on if it started looking at me funny.
So I liken it to buying an enormous bulldozer. If you're a skilled operator you can move mountains, but there'll still be a lot of manual work and planning involved. It's very clearly directionally where the industry will go once the models are improved and the harnesses and orchestration are more mature than "30% of the development effort is fixing the harness and orchestration itself", plus an additional "20% of your personal time will be knocking two robots' heads together and getting them to actually do work".
Edit: some more details of other knock-on work: I asked for a complexity metadata field to automatically dispatch work to cheaper/faster models, set up harnesses to make opencode and codex work similarly to how Claude works, and troubleshot some bugs in the underlying Gastown system. The Gastown fork is public if you'd like to have a look.
>working bid/ask spread when factories buy or sell on the market to make pricing dynamic and realistic
Does it deliver on the "realistic" part? My experience with most models is they make something that technically fulfills the ask, but often in a way that doesn't really capture my intent (this is with regular Claude Code though).
Yep, garbage in, garbage out. I had some additional specs beyond the summary above, and everything requires refinement as well, but honestly I never thought I was going to have a reasonably playable SimCity/Civ-like clone in a couple of weekends.
We ended up adding workflows with deterministic paths that can use raw API calls, CLIs, and agents. I think that was a big differentiator.
We also added pi-mono, and started using more and more other models for different tasks (Gemini, K2.5, GLM-5, you name it).
I think the problem is that most are building solutions that rely on one provider, instead of focusing self-learning capabilities on improving the cost-quality-speed ratio.
I made a similar harness; mine does lightweight sandboxing with Seatbelt on Mac and Bubblewrap on Linux. I initially used Docker too, but abandoned it. I like how these two sandboxes allow me to make the whole file system r/o except the project folder, which is r/w (along with a few other config folders). This means my code runs inside the sandbox just like outside: same paths, same file system. The .git folder is also r/o inside the sandbox, so only the outside agent can commit. Sandboxing was intended to enable --yolo mode; I wanted to maximize autonomous time.
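For the Linux half, the layout described above can be sketched as a Bubblewrap command line. This is only an illustration, not the harness's actual code: the function name `bwrap_argv` is hypothetical, the extra config folders are left out, and the Seatbelt side is not shown. The flags used (`--ro-bind`, `--bind`, `--dev`, `--proc`, `--tmpfs`, `--chdir`, `--die-with-parent`) are standard `bwrap` options.

```python
from pathlib import Path


def bwrap_argv(project: Path, command: list[str]) -> list[str]:
    """Build a bwrap command: read-only root, writable project, read-only .git.

    Hypothetical helper for illustration; it only builds the argument list
    and does not launch anything.
    """
    project = project.resolve()
    git_dir = project / ".git"
    return [
        "bwrap",
        "--ro-bind", "/", "/",                    # whole file system visible, read-only
        "--dev", "/dev",                          # fresh minimal /dev
        "--proc", "/proc",                        # fresh /proc
        "--tmpfs", "/tmp",                        # private scratch space
        "--bind", str(project), str(project),     # project is read-write at the same path
        "--ro-bind", str(git_dir), str(git_dir),  # later mount wins: .git is read-only inside
        "--chdir", str(project),                  # start in the project folder
        "--die-with-parent",                      # kill the sandbox if the harness exits
        *command,
    ]
```

On a Linux machine with Bubblewrap installed, passing the result to `subprocess.run`, for example `subprocess.run(bwrap_argv(Path("."), ["cargo", "test"]))`, would run the command with the same paths as outside the sandbox. Mount order matters here: the read-only `.git` bind has to come after the read-write project bind.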
Work is divided into individual tasks. I could have used Plan Mode or TodoWriter tool to implement tasks; all agents have them nowadays. But instead I chose to plan in task.md files because they can be edited iteratively: a file starts as a user request and develops into a plan with checkbox-able steps, the plan is reviewed by a judge agent (in yolo mode, with fresh context), and then a worker agent works through the gates. The gates enforce a workflow of testing early and testing extensively. There is another judge for the implementation, again in yolo mode. And at the end we update the memory/bootstrap document.
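To make the "checkbox-able steps" idea concrete, here is a minimal sketch of how a harness could find the next open gate in a task.md. The file format is an assumption (GitHub-style `- [ ]` / `- [x]` lines), and the names `parse_steps` and `next_gate` are hypothetical, not taken from the harness described above.

```python
import re
from typing import List, Optional, Tuple

# Assumed format: Markdown checkbox lines such as "- [ ] step" or "- [x] step".
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def parse_steps(task_md: str) -> List[Tuple[bool, str]]:
    """Return (done, text) for every checkbox line, in document order."""
    steps = []
    for line in task_md.splitlines():
        match = CHECKBOX.match(line)
        if match:
            steps.append((match.group(1) != " ", match.group(2).strip()))
    return steps


def next_gate(task_md: str) -> Optional[str]:
    """Return the first unchecked step, or None when every gate is done."""
    for done, text in parse_steps(task_md):
        if not done:
            return text
    return None


EXAMPLE = """\
# Task: add tariff support
- [x] Write a failing test for tariff lookup
- [ ] Implement tariff lookup
- [ ] Run the full test suite
"""
```

With the example above, `next_gate(EXAMPLE)` returns `"Implement tariff lookup"`. Because the plan is plain text, the same file can be edited by the user, reviewed by a judge agent, and committed to git without any extra tooling.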
Task files go into the git repo. I also log all user messages and implement intent validation with the judge agents. The judges validate intent along the chain "chat -> task -> plan -> code -> tests". Nothing is lost, the project remembers and understands its history. In fact I like to run retrospective tasks where a task.md 'eats' previous tasks and produces a general project perspective not visible locally.
In my system everything is a md file, logged and versioned on git. You have no issue extracting your memories, in fact I made reflection on past work a primitive operation of this harness. I am using it for coding primarily, but it is just as good for deep research, literature reviews, organizing subject matter and tutoring me on topics, investment planning and orchestrating agent experiment loops like autoresearch. That is because the task.md is just a generic programming pipeline, gates are instructions in natural language, you can use it for any cognitive work. Longest task.md I ran was 700 steps, took hours to complete, but worked reliably.
Scion looks interesting, as a “hypervisor for agents”. It has Kubernetes influences, and a substrate for agent execution is a useful primitive.
Gastown goes further than Scion in that it chains agents together into an ecosystem. My sense is that Gastown or similar could be built as a layer on top of Scion.
Dan Shapiro helped shape my thinking that the two most important capabilities for agent orchestration are concurrency and loops. Scion provides only concurrency at present, and Gastown is also more oriented toward concurrency than loops.
Fabro is a new OSS project I am working on which attempts to do both loops and concurrency well: https://github.com/fabro-sh/fabro (Maybe someday it should be built on top of Scion.)
The article is a bit of a strawman, and a bit of an advertisement for a security consultancy. If you ask someone else to pick a password for you, then it's a secret known by two people. So don't do that. That was true a thousand* years ago. It's still true today.
*I know, I know, hash functions didn't exist on Earth a thousand years ago. Still true.
I urge you to actually read the article, because it doesn't say anything about the risks of the LLM knowing your password (e.g., stored in server-side logs); it talks about LLMs generating predictable passwords because they are deterministic pattern-following machines.
While the loss of secrecy between you and the LLM provider is a legitimate risk, the point of the article was that you should only use vetted RNGs to generate passwords, because LLMs will frequently generate identical secure-looking passwords when asked to do so repeatedly, meaning that all a bad actor has to do is collect the most frequent ones and go hunting.
The loss of secrecy between you and the LLM only poses a risk if the LLM logs are compromised, exposing your generated passwords. The harvesting of commonly-generated passwords from LLMs poses a much broader attack surface for anyone who uses this method, because any attacker with access to publicly available LLMs can start mining commonly generated passwords and using them today without having to compromise anything first.
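For contrast, generating a password from a vetted RNG takes only a few lines. This is a minimal sketch using Python's standard `secrets` module, which draws from the operating system's cryptographically secure random source; the function name `generate_password`, the alphabet, and the default length are my own illustrative choices.

```python
import secrets
import string

# 94 printable ASCII characters: letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Pick each character independently and uniformly from ALPHABET."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Each character is chosen independently of the others, so there is no "most frequent" output for an attacker to harvest. With 94 symbols and 20 characters that works out to roughly 131 bits of entropy.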
You're right; I could have phrased the issue better, though I certainly did read the article. Let me try again: letting someone else pick a password for you requires you to trust that they did it well, and you get no benefit in exchange for that trust. That's true for other humans, websites, and now LLMs.
You could zoom out a bit and rephrase the question.
Your great-aunt Ida died and left you a consulting team of ten pretty good software engineers. The team's contracts all just ended, so starting tomorrow they'll be idle. Ida said you must run the business for at least two years (fortunately, overhead is already paid for), or forfeit your share of the inheritance. After that you can keep going or liquidate it.
There’s been some success training models on top of differential privacy.
I imagine that with live requests it would be quite challenging but not impossible, assuming you could somehow sanitize all sorts of private data that people throw at these prompts.
This is indeed interesting because rotating a 2D screen is not necessarily the same type of brain processing as experiencing things fly around you. Even VR is not necessarily the same, because knowing you're safe may be different from taking the situation seriously. Could be the same, could be completely different.
But the first massively popular 3D games arrived at the end of the 90s, which means Alzheimer's cases among those players will start showing up only around 2060 or later (someone who was a 15-year-old kid in the late 90s reaches the average onset age of about 75 around then).
Besides safety, there is also the cognitive complexity angle.
Plus, digital environments are explicitly designed to be engaging: authors are putting intentional thought into making the virtual space easy to navigate so that the player doesn't get frustrated and go do something else.
Meanwhile, the physical world is something we're pretty much stuck in, and material spaces tend to be optimized not so much to be engaging to navigate and explore - more to be comfortable to inhabit, etc.
Besides, physical spaces - e.g. cities - tend to be iteratively developed over generations, bearing the hallmarks of many different thinking minds, and not optimized for any one particular user flow.
> ...if you explicitly state that you want to take part in a demonstration against the elected government.
Cambridge Dictionary's definition of a free country: a country where the government does not control what people say or do for political reasons and where people can express their opinions without punishment.
Nowhere in the definition of a free country does it state that you have to be a citizen.
Even in the US Constitution that is not the case. Unalienable rights extend to everyone under the Constitution's jurisdiction, which includes people who are not citizens. Even aliens get due process in the US. Or should, anyway, if we didn't have anti-American leadership.
When defining a Constitution for a country, to whom would you direct the constitutional precepts? Surely it would only be for people that were to be governed by the constitutional government. China, for example, would not cover American citizens in their Constitution.