Hacker News | ramoz's comments

I do think Cloudflare probably institutes a similar manual review process as well. I have a handful of fairly vocal and supportive engineers I stay in contact with around https://plannotator.ai (there is an integrated code review surface that creates a feedback loop with your local agent).

> agents do a good job of looping over PR comments

This is the easy part. Most harnesses enable some sort of integration now, so you can actually create a smooth local experience around this as well - better code before it ships to more costly review or bloats PR threads.

> guided, educational code review tool

This is a bit tougher, and I find the main harness chat tends to work best. I learn better when I'm more engaged and aware of what I'm asking. It's easy to stick a code tour type of thing on a screen. It's hard to really nail the right attention and learning mechanism around it.


The vast enterprise industry (non-technical) is now aware of Claude/Anthropic.

good feedback, thanks!

> If I had to roll out such a development process today, I’d make a standardized Markdown specification the new unit of knowledge for the software project. Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules. Those should be checked into the project repositories along with the implementing code. There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec. This specification, and not the code that materializes it, is what the team would need to understand, review, and be held accountable for.

The constant urge I have today is for some sort of spec or simpler facts to be continuously verified at any point in the development process, something agents would need to be aware of. I agree with the blog and think it's going to become a team sport to manage these requirements. I'm going to try this out by evolving my open source tool [1] (used to review specs and code) into a bit more of a collaborative & integrated plane for product specs/facts - https://plannotator.ai/workspaces/

[1] https://github.com/backnotprop/plannotator


What we really need is some kind of more detailed spec language that doesn't have edge cases, where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement. It'd be super helpful to have something more formal with no ambiguity, especially because the English language tends to be pretty ambiguous in general, which can result in spec problems.

I also tend to find especially that there's a lot of cruft in human written spec languages - which makes them overly verbose once you really get into the details of how all of this works, so you could chop a lot of that out with a good spec language

I nominate that we call this completely novel, evolving discipline: 'programming'


There are languages like Dafny that permit you to declare pre- and post-conditions for functions. Dafny in particular tries to automatically verify or disprove these claims with an SMT solver. It would be neat if LLMs could read a human-written contract and iterate on the implementation until it's provably correct. I imagine you'd have much higher confidence in the results using this technique, but I doubt that available models are trained appropriately for this use case.
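A rough runtime analogue of that idea, sketched in Python rather than Dafny. The `contract` decorator and `integer_sqrt` are hypothetical names made up for illustration, and the checks only run when the function is called, whereas Dafny tries to prove them statically with an SMT solver:

```python
from functools import wraps


def contract(pre, post):
    """Attach a precondition and a postcondition to a function.

    Both are plain predicates checked at call time (an assumption of this
    sketch; this is not static verification).
    """
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args):
            if not pre(*args):
                raise ValueError(f"precondition failed for {fn.__name__}{args}")
            result = fn(*args)
            if not post(result, *args):
                raise AssertionError(f"postcondition failed for {fn.__name__}{args}")
            return result
        return wrapper
    return decorate


@contract(
    pre=lambda n: n >= 0,                              # "requires n >= 0"
    post=lambda r, n: r * r <= n < (r + 1) * (r + 1),  # "ensures r is floor(sqrt(n))"
)
def integer_sqrt(n):
    # Deliberately naive implementation; the contract is the interesting part.
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r
```

An agent loop could, in principle, keep regenerating the body of `integer_sqrt` until the contract stops failing on a set of inputs, but that only gives test-level confidence, not a proof.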

Ask it to do so, show it how, and it will do it.

> where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement.

In ancient times we had tech to do exactly that: Programming languages and tests.


> What we really need is some kind of more detailed spec language that doesn't have edge cases, where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement.

That's theorem provers and they're awful for anything of any reasonable complexity.


That's any programming language, really [1]. Any website contains millions of "proofs"; not all of them are useful. Choosing what needs to be proven is hard. And the spectrum of languages/type systems and their usability as either is more explored nowadays than it used to be. If you don't like Coq, you can look at Agda. If Agda is too far for you, you can look at Haskell. If that's still impractical, there's Rust or F#, etc. The tradeoff you have between "convenient for expressing proofs" and "convenient for programming" has many options.

[1]: https://www.youtube.com/watch?v=IOiZatlZtGU


Check out Allium, posted here recently.

https://juxt.github.io/allium/


yes, am familiar with the "code is spec" trope.

Shame us all for moving away from something so perfect, precise, and that "doesn't have edge cases."

Hey - if you invent a programming language that can be used in such a way and create guaranteed deterministic behavior based on expressed desires as simple as natural language - I'll pay a $200/m subscription for it.


As people are discovering, natural language is insufficiently precise to be able to specify edge cases. Any language precise enough to be formally verified against is a programming language

One agent generates : Spec -> Code then

Another agent: Code -> Inverted Spec

then compare Spec and Inverted Spec.

If there is a Gap, a Human fixes and clarifies the Gap.

This is like Generator and Discriminator aspects of GAN models or Autoencoder models.
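A minimal sketch of just the comparison step, assuming both agents emit their specs as lists of one-line requirements. The function names and the exact-match comparison are assumptions for illustration; the two agents are not modeled, and a real version would need semantic rather than string matching:

```python
def normalize(requirement: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(requirement.lower().split()).rstrip(".")


def spec_gap(spec, inverted_spec):
    """Return the requirements that did not survive the round trip.

    spec:          requirements the human wrote (input to the Spec -> Code agent)
    inverted_spec: requirements a second agent recovered from the code
    """
    original = {normalize(r) for r in spec}
    recovered = {normalize(r) for r in inverted_spec}
    return {
        # In the spec, but the code does not appear to do it.
        "missing_from_code": sorted(original - recovered),
        # The code does it, but nobody asked for it.
        "unspecified_in_spec": sorted(recovered - original),
    }


spec = ["Users must log in.", "Passwords expire after 90 days."]
inverted = ["users must  log in", "Sessions time out after 30 minutes."]
gap = spec_gap(spec, inverted)
```

Anything left in either list of `gap` is what the human would be asked to fix or clarify.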


we're going to end up speaking past each other - but generally I do agree with you and am not denouncing the importance of formal verification methods. I do think abstractions are going to dominate the human ux above them

XML will do it very well.

Hate it all you want, but XML is genuinely a good fit there, and Claude is apparently insanely good at working with XML prompts.

Is Claude good at working with XML prompts, or is XML good at convincing users to write more Claude-able specs? I am intensely skeptical that you could write an XML document describing a nontrivial web application in full detail, but I could easily imagine someone who thinks they have to stripping out important details because they don't really map to XML.

They train it with XML; even the system prompts that Claude reads are formatted with it.

I haven't done it professionally, but my understanding is that this kind of work is much more in the second category, where you have to understand the closest approximation to what you want that the LLM can reliably produce or the training won't work at all.

I don’t know why, but I get this feeling whenever someone uses “insanely” or “shockingly” along with AI, I think they’re a bot or are writing based on a guideline! No offense, btw, I’m not saying you’re a bot.

I'm prepared to excise the word "genuinely" from my vocabulary after working with Claude.

One of my biggest fears with using AI at work is that I will subconsciously start talking and writing like a bot, despite making conscious efforts to do the opposite. Just like how when you read a lot of books by one author, their style infects your own writing style.


You’re absolutely right!

Kidding, nah no worries. I do worry people become overly paranoid of bots as time passes.



like declarative vs imperative?

We've been through that so many times. When UML arrived (and ALM tools suites, IBM was trying to sell it, Borland was trying to sell it, all those fancy and expensive StarTeam, Caliber and Together soft), then BPML and its friends arrived, Business Rule Management System (BRMS), Drools in Java world, etc.

It all failed. For a simple reason, popularized by Joel Spolsky: if you want to create a specification that describes precisely what software is doing and how it is doing its job, then, well, you need to write that damn program using MS Word or Markdown, which is neither practical nor easy.

The new buzzword is "spec driven development", maybe it will work this time, but I would not bet on that right now.

BTW: once we are at that point, it no longer makes sense to generate code in the programming languages we have today; an LLM can simply generate binaries, or at least some AST that will be directly translated to binary. In this way LISP would, eventually, take over the world!


I called it gates on mine. I loved Beads but it closed tasks without any validation steps. Beads also had other weird issues, so I made my own alternative. I think "Gates" is also used by other projects that took on the same challenge I did in mine, weirdly enough.

https://github.com/Giancarlos/guardrails


I’ve been considering this as well, and trying to get my colleagues to understand and start doing it. I use it to pretty decent effect in my vibe coded slop side projects.

In the new world of mostly-AI code that is mostly not going to be properly reviewed or understood by humans, having increasingly robust manifestation, enforcement, and regeneration of the specs via the coding harness configuration, combined with good old-fashioned deterministic checks, is one potential answer.

Taken to an extreme, the code doesn’t matter, it’s just another artifact generated by the specs, made manifest through the coding harness configuration and CI. If cost didn’t matter, you could re-generate code from scratch every time the specs/config change, and treat the specs/config as the new thing that you need to understand and maintain.

“Clean room code generation-compiler-thing.”


> If cost didn’t matter, you could re-generate code from scratch every time the specs/config change, and treat the specs/config as the new thing that you need to understand and maintain.

The critical insight is that this is not true. When people depend on your software, replacing it with an entirely different program satisfying all of your specs and configurations is a large, months-long project requiring substantial effort and coordination even after the new program is written. It seems to work in vibe coded side projects because you don't have those dependencies; if you got an angry email from a CEO saying that moving a critical button ruined their monthly review cycle, and demanding 7 days notice before you move any buttons going forwards, you'd just tell them no.


I was about to comment: "HTML creates too much friction after doing all sorts of visual explainers" ... thanks for articulating it well.

As a layer of abstraction, it also creates more requirements: need a browser, likely need includes/cdn libs to avoid bloat, all sorts of other things. Markdown is consumable, diffable, shareable in raw form - and you can add enrichment layers on top without much effort.


To me, the "enrichment" layer means 2 things:

- a tiny DSL for rendering anything custom, where every markdown renderer potentially introduces its own unique bit of syntax that's not transferable (example: frontmatter in Obsidian where you can put tags, that's not vanilla markdown)

- a note taking / viewing app, of which we now have dozens, where moving notes from one app to another creates friction, because of the custom "enrichment" layer each of those apps have (example: any popular plugin in Obsidian, where your notes are now littered with that plugin's tags)

HTML has this type of "enrichment" built-in.

Anyway, I am not trying to convince anyone. This is me working through this in my head. I have a large vault of Obsidian notes that I want to make more useful. And I figure, HTML is the standard-issue tool for producing beautiful-looking and functional text documents, so it's worth thinking about.


A non-technical friend of mine has just won some hospital contracts after vibecoding w/ Claude an inventory management solution for them. They gave him access to IT dept servers and he called me extremely lost on how to deploy (can't connect Claude to them) and also frustrated because the app has some sort of interesting data/state issues.


What concerns me about this is that as these stories multiply and circulate people will just completely stop buying software/SAAS from startups, because 90% or more will be this same thing. It will completely kill the market.


Oracle have routinely had multimillion pound contract failures and people keep buying from them. Big vendors are too big to fail.


Those are custom software or heavily customized implementations of ERP and similar systems for very large organizations. I’m talking more about the SMB market where today it’s possible for a small team to carve out a niche and make a nice living or even bootstrap a venture that competes with a large player that has poor UX or antiquated feature designs.

The reason Oracle can continue failing at those massive projects is simple: everyone fails at them routinely and often it’s the customer’s fault.


> The reason Oracle can continue failing at those massive projects is simple: everyone fails at them routinely and often it’s the customer’s fault.

It's even simpler. You're not paying Oracle for some dilapidated HR system. You're paying for the legion of accountability that is their on-site engineers to fix stuff for you when things screw up. You're essentially subscribing to a team of engineers you don't need to directly pay salary and benefits to.

People who think you can out efficiency that kind of accountability don't understand how large orgs think.


I used to gripe about various ERP companies but after having dealt with enough, yeah, that's just what the world of ERP systems is like. You will spend your time even with the best of them desiring to scream endlessly at everyone who works there. And they also know your pain but are powerless to help.


there are no 2 identical deployed ERP systems.

It's just an umbrella term for "weak process glue code".


Same with Deloitte


no one's getting fired for hiring either one.


> It will completely kill the market.

it will kill all the people in that hospital too


What is this, Humanitarian News?


The real Hackers were the ones actually trying to minimize suffering all along. Not reproduce it at scale.


But the Torment Nexus is such an interesting technical challenge! and I don’t personally torment people: I just move protobufs around! - Software Engineer #1 and #2 excuses


thank you


Yeah but only one of those actually puts those responsible in prison https://en.wikipedia.org/wiki/Elizabeth_Holmes

> On January 3, 2022, the jury found Holmes guilty on four of the seven counts related to defrauding investors: three counts of wire fraud, and one of conspiracy to commit wire fraud. She was found not guilty on four counts related to defrauding patients


Those patients weren't hurt. Totally different from the post you're replying to.


I mean, the stories about how stuff was getting built in the late 90s/early 2000s aren’t much worse.


Or you end up with a certification process, which will of course introduce its own problems, but startups doing things the right way and not just "moving fast and breaking things" can thrive.


As a SWE that has only ever worked for an employer or on his own projects, this makes me wonder: how would someone even get such a contract? Did this person already have a consulting business? Do you just call up random hospitals and ask if you can demo an inventory management system for them? Did this person already know people at the hospital? I know technical folks that do independent consulting, but even with a vibecoded product, how is it that anyone can just get such a contract?


Frictional money.

People really have a misconception about the sums of money that companies operate on, on a regular basis. If you are a people person and know essentially how to sell yourself, you can "scrape" money on the fact that nobody is going to look or think too hard about some contract that represents a tiny fraction of the year's budget.


That still leaves the question of how one gets their foot in the door. Lots of us are aware of the budgets but we don't get how sales work at that level.


The only way something like this would work is through "networking", and trust that you are capable of delivering.


In practical terms, go to where the decision makers are and schmooze with them. It's a numbers game. Eventually someone will say yes.


That's what it means to be a "people person" in the context of trying to sell a product, yes. Getting within 2 degrees of a decision maker can open up millions for you, while being a rounding error for every company you work with.


He's already in the whole consulting sphere around these hospitals in his area.


This hospital will learn some hard lessons. I hope their backup strategy is good. I'm surprised they can field software from an entity that isn't SOC2 & HIPAA certified.


No worries! At worst, the contractor can just tell Claude to make sure the hospital knows they're appropriately certified. And the hospital can use Claude to make sure the certs are valid. Everybody wins, except the ones who end up dead. Or with their health destroyed.


> from an entity that isn't SOC2 & HIPAA certified

What do you think the fake Delve attestation scandal was about? https://news.ycombinator.com/item?id=47444319


As a cybersecurity IR professional as much as I hate to see this happen to a hospital this kind of thing is responsible for essentially tripling my income over the last 3 years.


Have you tried to talk him out of it, and have you considered blowing the whistle on him? He could kill people!


Wow. This is like every other gold rush. Millions will walk into the ice and snow, somehow not questioning that their ability to dig is not unique.


Well, selling shovels has always been a good way to deal with that problem


The shovel sellers are ringing the cash register.


This is going to happen all over. Company I'm currently contracting with has gone AI everything (aka technical debt hell), and they're gonna suffer for it. I'm glad my consulting contract ends in 2 months. I don't want to be around for the crash


Don't help him. Let him figure it out by himself, else they (he and hospital) will never learn.


A hospital could not learn a bigger lesson from this person than their existing big players.

(Screams in "deployed in 2026 a new product that only works in internet explorer" in healthcare).


I work at a university and we still have some workstations that need IE as well, for a healthcare vendor app that needs ActiveX. Up until recently we even had some machines running Windows 7.


I don't have time for that. I just told him he needs to hire somebody


I was going to say to open yourself up as a contractor and scrape some of the money off the top. But it sounds like you don't need that opportunity.

That sadly does seem to be the trajectory of 5-10 years from now, though. I can't speak to whether "AI is the future" of 30+ years from now, but these coming years sound ripe for "janitors" to clean up all the slop being produced by newly empowered idea guys.


Or, "help" by asking questions, or otherwise by sharing an AI review/analysis/suggestions, since they're into that kind of thing.

Definitely cleaning up other people's AI mess for them for free is not a good use of time.


I'd really like to know how he won contracts, just in general. Did he have some connections? And he doesn't even know how to get it to run on a server by himself? There are millions of people who can do that; if he can win contracts, why worry about vibe coding at all, just hire someone to do it. Winning contracts is the challenge in my view.


He's already within the consulting sphere around hospitals in his area.


I hope you have quoted him a very very high hourly rate.


Did he lie about HIPAA compliance?


Heaven help us.


Hospitals? Vibe code?

Dear Lord. Respect to your friend for mad marketing skills, however. Selling slop to mission-critical sectors is next level.


jfc lmao


This is very much a something burger for users who actively use `claude -p` under their subscription. Users will have to do their own math, but that $200 could come and go quite fast, and then you're hitting normal API rates versus what you had previously been hitting.

https://x.com/jeremyphoward/status/2054682882753597603?s=20


Sure, but if you spend that many tokens you're probably a pretty big net loss for the company anyways. No company will subsidize that level of compute for you indefinitely.


Yea that's fine and understandable.

But like I said, this is something for those users. I launch many review processes locally with `claude -p`


No. I just use claude -p to review some Codex output.

The Pro plan is barely enough for this after the doubling of usage limits.


Happy to change the title if there is explicit clarity on what my `claude -p` usage costs me today & it not differing from what it'll cost me once the change is in effect.


The wording in the press is very confusing. Not sure if deliberate.

If you're an active `claude -p` user, it will now cost you API rates vs being able to use your subscription.

UPDATE: https://x.com/lydiahallie/status/2054650920768807313?s=20

Still confused.


Where are you seeing that info?? The link you posted says nothing about -p


I wasn't able to post the official links because it was already posted by simple reposters who didn't understand, and titled their HN posts as "you get credits" basically

this is the original thread https://x.com/ClaudeDevs/status/2054610152817619388?s=20


It's literally there.


You also get $200 in monthly credits so it's not bad.


Ya but that monthly usage doesn't refill every five hours or every week the way your subscription usage does. Once you're through the $200, you're done until next month.


You can't seriously be expecting between $800 and $28,800 worth of API credits on a $200/mo subscription.
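For what it's worth, those two figures appear to come from refilling the $200 credit on the subscription's weekly and five-hour schedules. A quick check, assuming a 30-day month and 4 weeks per month (both assumptions, not stated in the thread):

```python
MONTHLY_CREDIT_USD = 200
DAYS_PER_MONTH = 30   # assumption
WEEKS_PER_MONTH = 4   # assumption

# Number of refills per month under each schedule.
weekly_refills = WEEKS_PER_MONTH
five_hour_refills = DAYS_PER_MONTH * 24 // 5  # 720 hours / 5 = 144 windows

low = MONTHLY_CREDIT_USD * weekly_refills      # credit refilled every week
high = MONTHLY_CREDIT_USD * five_hour_refills  # credit refilled every five hours
print(low, high)  # 800 28800
```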


I mean it does cost you API rates, you just get a certain amount of credit by having a subscription.


I’ve done a lot of exploration here.

Visual explainers certainly come in handy. You need to have both the extra time and cost of tokens accounted for when you ask for these. As well as maintainability - you lose things like vcs/diffs over plans that are much more readable as markdown.

The more interactivity on the surface, the better. Esp if they create feedback loops with the agent. Simple annotations to start.

Markdown still goes a long way. I’ve effectively been enriching plan rendering in Plannotator. Now supporting GFM, custom SVGs, etc.

https://github.com/backnotprop/plannotator

For fun, I also enabled annotations on top of these types of SVGs, see here https://x.com/plannotator/status/2052851731998941380?s=46

