I do think Cloudflare probably institutes a similar manual review process as well. I have a handful of fairly vocal and supportive engineers I stay in contact with around https://plannotator.ai (there is an integrated code review surface that creates a feedback loop with your local agent).
> agents do a good job of looping over PR comments
This is the easy part. Most harnesses enable some sort of integration now, so you can actually create a smooth local experience around this as well - better code before it ships to more costly review or bloats PR threads.
> guided, educational code review tool
This is a bit tougher, and I find the main harness chat tends to work best. I learn better when I'm more engaged and aware of what I'm asking. It's easy to stick a code tour type of thing on a screen. It's hard to really nail the right attention and learning mechanism around it.
> If I had to roll out such a development process today, I’d make a standardized Markdown specification the new unit of knowledge for the software project. Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules. Those should be checked into the project repositories along with the implementing code. There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec. This specification, and not the code that materializes it, is what the team would need to understand, review, and be held accountable for.
The constant urge I have today is for some sort of spec or simpler facts to be continuously verified at any point in the development process, something agents would need to be aware of. I agree with the blog and think it's going to become a team sport to manage these requirements. I'm going to try this out by evolving my open source tool [1] (used to review specs and code) into a bit more of a collaborative & integrated plane for product specs/facts - https://plannotator.ai/workspaces/
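To make "continuously verified" a bit more concrete, here is a rough sketch of the kind of check a PR pipeline could run. Everything in it is an assumption for illustration: the `[BR-001]`-style rule IDs in the Markdown spec, the convention that tests mention the rule ID they enforce, and the function names. It only checks that every rule has at least one test pointing at it, which is much weaker than proving the code conforms to the spec.

```python
import re

# Hypothetical convention: each business rule in the spec carries an ID like [BR-001],
# and each test that enforces a rule mentions the same ID in its source.
RULE_ID = re.compile(r"\[(BR-\d+)\]")


def rule_ids(text):
    """Return the set of rule IDs mentioned anywhere in `text`."""
    return set(RULE_ID.findall(text))


def uncovered_rules(spec_markdown, test_sources):
    """Return, sorted, the spec rule IDs that no test source refers to."""
    covered = set()
    for source in test_sources:
        covered |= rule_ids(source)
    return sorted(rule_ids(spec_markdown) - covered)


spec = """
# Checkout spec
- [BR-001] Orders over 100 EUR ship for free.
- [BR-002] A cart may hold at most 50 items.
"""

tests = [
    "def test_free_shipping():  # enforces [BR-001]\n    ...",
]

missing = uncovered_rules(spec, tests)
print(missing)
```

A CI job could fail the PR whenever `missing` is non-empty, which at least keeps the spec and the tests from drifting apart silently.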
What we really need is some kind of more detailed spec language that doesn't have edge cases, where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement. It'd be super helpful to have something more formal with no ambiguity, especially because the English language tends to be pretty ambiguous in general, which can result in spec problems.
I also tend to find especially that there's a lot of cruft in human written spec languages - which makes them overly verbose once you really get into the details of how all of this works, so you could chop a lot of that out with a good spec language
I nominate that we call this completely novel, evolving discipline: 'programming'
There are languages like Dafny that permit you to declare pre- and post-conditions for functions. Dafny in particular tries to automatically verify or disprove these claims with an SMT solver. It would be neat if LLMs could read a human-written contract and iterate on the implementation until it's provably correct. I imagine you'd have much higher confidence in the results using this technique, but I doubt that available models are trained appropriately for this use case.
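For anyone who hasn't seen contracts before, here is a minimal Python sketch of the pre/post-condition idea. Big caveat: this only checks the conditions at runtime on the inputs you actually call it with, whereas Dafny tries to prove them for all inputs ahead of time. The `contract` decorator and `integer_sqrt` are names I made up for the example, not part of any library.

```python
from functools import wraps


def contract(pre, post):
    """Hypothetical decorator: check `pre(*args)` before the call and
    `post(result, *args)` after it. Runtime checks only, not a proof."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args):
            if not pre(*args):
                raise ValueError(f"precondition failed for {fn.__name__}{args}")
            result = fn(*args)
            if not post(result, *args):
                raise AssertionError(
                    f"postcondition failed for {fn.__name__}{args} -> {result}"
                )
            return result
        return wrapper
    return decorate


@contract(
    pre=lambda n: n >= 0,                                # requires n >= 0
    post=lambda r, n: r * r <= n < (r + 1) * (r + 1),    # ensures r is floor(sqrt(n))
)
def integer_sqrt(n):
    # Deliberately naive implementation; the contract is what pins down the behavior.
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


print(integer_sqrt(10))
```

The loop an LLM would run is roughly: keep the contract fixed, regenerate the body until nothing fails. With runtime checks you only get confidence on the cases you tried; the appeal of a verifier is that a passing result covers every input.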
> where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement.
In ancient times we had tech to do exactly that: Programming languages and tests.
> What we really need is some kind of more detailed spec language that doesn't have edge cases, where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement.
That's theorem provers and they're awful for anything of any reasonable complexity.
That's any programming language, really [1]. Any website contains millions of "proofs", not all of them useful. Choosing what needs to be proven is hard. And the spectrum of languages/type systems and their usability as either is more explored nowadays than it used to be. If you don't like Coq, you can look at Agda. If Agda is too far for you, you can look at Haskell. If that's still impractical, there's Rust or F#, etc. The tradeoff between "convenient for expressing proofs" and "convenient for programming" has many options.
Shame us all for moving away from something so perfect, precise, and that "doesn't have edge cases."
Hey - if you invent a programming language that can be used in such a way and create guaranteed deterministic behavior based on expressed desires as simple as natural language - I'll pay a $200/m subscription for it.
As people are discovering, natural language is insufficiently precise to be able to specify edge cases. Any language precise enough to be formally verified against is a programming language.
We're going to end up speaking past each other - but generally I do agree with you and am not denouncing the importance of formal verification methods. I do think abstractions are going to dominate the human UX above them.
Is Claude good at working with XML prompts, or is XML good at convincing users to write more Claude-able specs? I am intensely skeptical that you could write an XML document describing a nontrivial web application in full detail, but I could easily imagine someone who thinks they have to stripping out important details because they don't really map to XML.
I haven't done it professionally, but my understanding is that this kind of work is much more in the second category, where you have to understand the closest approximation to what you want that the LLM can reliably produce or the training won't work at all.
I don’t know why, but I get this feeling whenever someone uses “insanely” or “shockingly” along with AI: I think they’re a bot or are writing based on a guideline! No offense, btw, I’m not saying you’re a bot.
I'm prepared to excise the word "genuinely" from my vocabulary after working with Claude.
One of my biggest fears with using AI at work is that I will subconsciously start talking and writing like a bot, despite making conscious efforts to do the opposite. Just like how when you read a lot of books by one author, their style infects your own writing style.
We've been through this so many times. First UML arrived, along with the ALM tool suites that IBM and Borland were trying to sell (all that fancy and expensive StarTeam, Caliber and Together software); then BPML and its friends, Business Rule Management Systems (BRMS), Drools in the Java world, etc.
It all failed. For a simple reason, popularized by Joel Spolsky: if you want to create a specification that describes precisely what the software is doing and how it does its job, then, well, you need to write that damn program using MS Word or Markdown, which is neither practical nor easy.
The new buzzword is "spec driven development", maybe it will work this time, but I would not bet on that right now.
BTW: once we are at that point, it no longer makes sense to generate code in the programming languages we have today; an LLM can simply generate binaries, or at least some AST that will be directly translated to binary. That way LISP would, eventually, take over the world!
I called it gates on mine. I loved Beads but it closed tasks without any validation steps. Beads also had other weird issues, so I made my own alternative. Weirdly enough, I think "Gates" is also used by other projects that took on the same challenge I did in mine.
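Roughly, the idea of a gate is that a task can't be closed until its validation steps pass. A toy sketch of that shape, and to be clear this is not how Beads or any particular tool implements it; `Task`, `GateFailed`, and the gate names are all made up for illustration:

```python
class GateFailed(Exception):
    """Raised when a task is closed while one or more gates still fail."""


class Task:
    def __init__(self, title, gates):
        self.title = title
        # `gates` is a list of (name, check) pairs; each check is a
        # zero-argument callable returning True when the gate passes.
        self.gates = gates
        self.closed = False

    def close(self):
        # Run every gate so the error lists all failures, not just the first.
        failed = [name for name, check in self.gates if not check()]
        if failed:
            raise GateFailed(f"cannot close {self.title!r}: failed gates {failed}")
        self.closed = True


# Stand-ins for real checks such as "test suite is green" or "linter is clean".
task = Task("add login form", [
    ("tests_pass", lambda: True),
    ("lint_clean", lambda: False),
])

try:
    task.close()
except GateFailed as err:
    print(err)
```

In a real harness the checks would shell out to the test runner or linter, so the agent has to produce evidence before the task state can change.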
I’ve been considering this as well, and trying to get my colleagues to understand and start doing it. I use it to pretty decent effect in my vibe coded slop side projects.
In the new world of mostly-AI code that is mostly not going to be properly reviewed or understood by humans, one potential answer is increasingly robust expression, enforcement, and regeneration of the specs via the coding harness configuration, combined with good old fashioned deterministic checks.
Taken to an extreme, the code doesn’t matter, it’s just another artifact generated by the specs, made manifest through the coding harness configuration and CI. If cost didn’t matter, you could re-generate code from scratch every time the specs/config change, and treat the specs/config as the new thing that you need to understand and maintain.
> If cost didn’t matter, you could re-generate code from scratch every time the specs/config change, and treat the specs/config as the new thing that you need to understand and maintain.
The critical insight is that this is not true. When people depend on your software, replacing it with an entirely different program satisfying all of your specs and configurations is a large, months-long project requiring substantial effort and coordination even after the new program is written. It seems to work in vibe coded side projects because you don't have those dependencies; if you got an angry email from a CEO saying that moving a critical button ruined their monthly review cycle, and demanding 7 days notice before you move any buttons going forwards, you'd just tell them no.
I was about to comment: "HTML creates too much friction after doing all sorts of visual explainers" ... thanks for articulating it well.
As a layer of abstraction, it also creates more requirements: need a browser, likely need includes/cdn libs to avoid bloat, all sorts of other things. Markdown is consumable, diffable, shareable in raw form - and you can add enrichment layers on top without much effort.
- a tiny DSL for rendering anything custom, where every markdown renderer potentially introduces its own unique bit of syntax that's not transferable (example: frontmatter in Obsidian where you can put tags, that's not vanilla markdown)
- a note taking / viewing app, of which we now have dozens, where moving notes from one app to another creates friction, because of the custom "enrichment" layer each of those apps have (example: any popular plugin in Obsidian, where your notes are now littered with that plugin's tags)
HTML has this type of "enrichment" built-in.
Anyway, I am not trying to convince anyone. This is me working through this in my head. I have a large vault of Obsidian notes that I want to make more useful. And I figure, HTML is the standard-issue tool for producing beautiful-looking and functional text documents, so it's worth thinking about.
A non-technical friend of mine has just won some hospital contracts after vibecoding w/ Claude an inventory management solution for them. They gave him access to IT dept servers and he called me extremely lost on how to deploy (can't connect Claude to them) and also frustrated because the app has some sort of interesting data/state issues.
What concerns me about this is that as these stories multiply and circulate, people will just completely stop buying software/SaaS from startups, because 90% or more will be this same thing. It will completely kill the market.
Those are custom software or heavily customized implementations of ERP and similar systems for very large organizations. I’m talking more about the SMB market where today it’s possible for a small team to carve out a niche and make a nice living or even bootstrap a venture that competes with a large player that has poor UX or antiquated feature designs.
The reason Oracle can continue failing at those massive projects is simple: everyone fails at them routinely and often it’s the customer's fault.
>The reason Oracle can continue failing at those massive projects is simple: everyone fails at them routinely and often it’s the customer's fault.
It's even simpler. You're not paying Oracle for some dilapidated HR system. You're paying for the legion of accountability that is their on-site engineers to fix stuff for you when things screw up. You're essentially subscribing to a team of engineers you don't need to directly pay salary and benefits to.
People who think you can out efficiency that kind of accountability don't understand how large orgs think.
I used to gripe about various ERP companies but after having dealt with enough, yeah, that's just what the world of ERP systems is like. You will spend your time even with the best of them desiring to scream endlessly at everyone who works there. And they also know your pain but are powerless to help.
But the Torment Nexus is such an interesting technical challenge! and I don’t personally torment people: I just move protobufs around! - Software Engineer #1 and #2 excuses
> On January 3, 2022, the jury found Holmes guilty on four of the seven counts related to defrauding investors: three counts of wire fraud, and one of conspiracy to commit wire fraud. She was found not guilty on four counts related to defrauding patients
Or you end up with a certification process, which will of course introduce its own problems, but startups doing things the right way and not just "moving fast and breaking things" can thrive.
As a SWE that has only ever worked for an employer or on his own projects, this makes me wonder: how would someone even get such a contract? Did this person already have a consulting business? Do you just call up random hospitals and ask if you can demo an inventory management system for them? Did this person already know people at the hospital? I know technical folks that do independent consulting, but even with a vibecoded product, how is it that anyone can just get such a contract?
People really have a misconception about the sums of money that companies operate with on a regular basis. If you are a people person and know essentially how to sell yourself, you can "scrape" money on the fact that nobody is going to look or think too hard about some contract that represents a tiny fraction of the year's budget.
That still leaves the question of how one gets their foot in the door. Lots of us are aware of the budgets but we don't get how sales work at that level.
That's what it means to be a "people person" in the context of trying to sell a product, yes. Getting within 2 degrees of a decision maker can open up millions for you, while being a rounding error for every company you work with.
This hospital will learn some hard lessons. I hope their backup strategy is good. I'm surprised they can field software from an entity that isn't SOC2 & HIPAA certified.
No worries! At worst, the contractor can just tell Claude to make sure the hospital knows they're appropriately certified. And the hospital can use Claude to make sure the certs are valid. Everybody wins, except the ones who end up dead. Or with their health destroyed.
As a cybersecurity IR professional as much as I hate to see this happen to a hospital this kind of thing is responsible for essentially tripling my income over the last 3 years.
This is going to happen all over. Company I'm currently contracting with has gone AI everything (aka technical debt hell), and they're gonna suffer for it. I'm glad my consulting contract ends in 2 months. I don't want to be around for the crash
I work at a university and we still have some workstations that need IE as well, for a healthcare vendor app that needs ActiveX. Up until recently we even had some machines running Windows 7.
I was going to say to open yourself up as a contractor and scrape some of the money off the top. But it sounds like you don't need that opportunity.
That sadly does seem to be the trajectory of 5-10 years from now, though. I can't speak to whether "AI is the future" of 30+ years from now, but these coming years sound ripe for "janitors" to clean up all the slop being produced by newly empowered idea guys.
I'd really like to know how he won contracts, just in general. Did he have some connections? And he doesn't even know how to get it to run on a server by himself? There are millions of people who can do that; if he can win contracts, why worry about vibe coding at all? Just hire someone to do it. Winning contracts is the challenge in my view.
This is much of a something burger for users who actively use `claude -p` under their subscription. Users will have to do their own math, but that 200 could come and go quite fast and then you're hitting normal API rates versus what you had previously been hitting.
Sure, but if you spend that many tokens you're probably a pretty big net loss for the company anyways. No company will subsidize that level of compute for you indefinitely.
Happy to change the title if there is explicit clarity on what my `claude -p` usage costs me today & it not differing from what it'll cost me once the change is in effect.
I wasn't able to post the official links because it was already posted by simple reposters who didn't understand, and titled their HN posts as "you get credits" basically
Ya but that monthly usage doesn't refill every five hours or every week the way your subscription usage does. Once you're through the $200, you're done until next month.
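If you want to do the math for your own usage, the arithmetic is simple enough to sketch. The $200 monthly figure is the one mentioned above; the usage numbers are invented, and this assumes usage beyond the credit is billed one-for-one at normal API rates, which you'd want to confirm against the official announcement.

```python
def overage(monthly_usage_usd, credit_usd=200.0):
    """Dollars billed at normal API rates once the monthly credit is used up."""
    return max(0.0, monthly_usage_usd - credit_usd)


def days_until_exhausted(daily_usage_usd, credit_usd=200.0):
    """How many days the credit lasts at a steady daily spend (None if nothing is spent)."""
    if daily_usage_usd <= 0:
        return None
    return credit_usd / daily_usage_usd


# Hypothetical heavy user: $25/day of API-priced usage, 30 days a month.
daily = 25.0
print(days_until_exhausted(daily))   # credit is gone after this many days
print(overage(daily * 30))           # billed on top of the credit that month
```

Since the credit doesn't refill every five hours or every week, the first number is the one to watch: anything after that day is paid usage until the next month.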
Visual explainers certainly come in handy. You need to have both the extra time and cost of tokens accounted for when you ask for these. As well as maintainability - you lose things like vcs/diffs over plans that are much more readable as markdown.
The more interactivity on the surface, the better. Esp if they create feedback loops with the agent. Simple annotations to start.
Markdown still goes a long way. I’ve effectively been enriching plan rendering in Plannotator. Now supporting GFM, custom svgs, etc.