Hacker News | hathawsh's comments

Your HN account is too new for me to be sure whether you're being sarcastic or not. Perhaps you know, or perhaps you don't, that all code is machine-translated, even assembly language. None of it is perfect, but it's not garbage. Today's AI merely provides a new level. It's a weird, non-deterministic level, but hiring an employee to write code for you is similarly non-deterministic.

Right, and that's why Mel was a true programmer!

Seriously though, that's an overly-pedantic definition of a compiler. Broadly speaking, languages compile in a direction of decreasing abstraction. Crossing from one high-level abstraction to another is just asking for trouble, especially in this case where the target language makes very specific performance promises as long as certain abstractions are maintained.


These are also the markers of human journalists who write daily. Journalism is the reason AI acquired these habits. Gemini says this article is probably not generated by AI, particularly because it has original quotes.

https://gemini.google.com/share/ba48849a15a9


Personally I wouldn't cite Gemini for this because I have no idea if it has any kind of track record of accurately distinguishing human from AI writing.

That said, Pangram agrees and its track record is pretty good.


> particularly because it has original quotes.

I'm not saying the quotes are fake, that would be horrific. I'm saying the rest of the article appears to have had minimal human intervention.


At some point, however distasteful to the naturalists, do we accept that writing with AI is still writing? There will be an arms race the way there was moving from banner ads -> whatever hellscape we have today ...


It's the same as copying and pasting the Wikipedia article and calling that your article. We all can generate our own slop if we want. If all you are peddling is slop, you are peddling nothing I can't get myself.


Then why did you point to the em-dash in the quote as evidence of AI authorship?


LLMs did not invent clickbaity headlines. Kinda odd that people think they did.


Isn't that interesting? The job of exploring a theory or model to such an extent that it can be expressed in computer code always seems to fall on the shoulders of a software developer. Other people can write specifications and requirements all day long, but until a software developer has tackled the problem, the theory probably hasn't been explored well enough yet to express clearly in computer code. It feels like software developers are scientists who study their customers' knowledge domains.


> It feels like software developers are scientists who study their customers' knowledge domains.

I agree so much with this. It's why I feel so stifled when an e.g. product manager tries to insulate and isolate me from the people who I'm trying to serve -- you (or a collective of yous) need to have access to both expertise in the domain you're serving, and expertise in the method of service, in order to develop an appropriate and satisfactory solution. Unnecessary games of telephone make it much harder for anyone to build an internal theory of the domain, which is absolutely essential for applying your engineering skills appropriately.


> so stifled when an e.g. product manager

Another facet of this is my annoyance at other developers when they are persistently incurious about the domain. (Thankfully, this has not been too common.)

I don't just mean when there are tight deadlines, or there's a customer-from-heck who insists they always know best, but as their default mode of operation. I imagine it's like a gardener who cares only about the catalogue of tools, and just wants the bare-minimum knowledge to deal with any particular set of green thingies in the dirt.


This is why at my current place we are not supposed to do any dev without an SME on the call. We do the development and share the screen and get immediate feedback as we are working in real time! It's great.


This might be an indicator that the PM isn't doing their job; the PM should be able to answer your questions regarding what the business wants (= the people you're trying to serve). Developers, by the nature of interacting with the domain, do become experts in the domain, but really it should be up to the PM what the domain should be doing business-wise.


If that is what a PM needs then there aren't enough good PMs to warrant a PM role for most products, so just make software engineers do that in most cases.

Edit: The main role of a PM is to decide which features to build, not how those features should be built or how they should work. Someone has to decide what to build, and that is the PM, but most PMs are not very good at figuring out the best way for those features to work, so it's better if the programmers can talk to users directly there. Of course a PM could do that work if they are skilled at it, but most PMs won't be.


> not [...] how they should work

So that we're on the same page, what I think should be PM responsibilities:

If I have a user story: "As a customer I want to purchase a product so that I can receive it at my address" - the PM defines this user story, as they have the insight to decide whether such a feature is needed.

PM should then define acceptance criteria: "Given customer is logged in When they view Product page Then 'Add product to basket' button should appear", "Given 'Add product to basket' button When customers click on it Then Product information modal should appear" etc - PM should know what users actually want, i.e. whether modals should appear or not; whether this feature should be available for logged-in users only or not.

How this will work shouldn't matter to PM; these are AC they've defined.

Of course the process of defining AC should involve developers (and QA), because the AC should be exhaustive enough to deliver the given feature.
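
For what it's worth, Given/When/Then criteria like the two above map fairly directly onto executable checks, which is one way developers and QA can take part in defining them. A minimal sketch in Python, assuming a hypothetical `ProductPage` model (the class, its fields, and the button/modal names are invented for illustration and do not come from any real codebase):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProductPage:
    """Hypothetical page model, invented purely for illustration."""
    logged_in: bool
    visible_buttons: List[str] = field(default_factory=list)
    open_modal: Optional[str] = None

    def view(self) -> None:
        # AC 1: the button appears only for logged-in customers.
        if self.logged_in:
            self.visible_buttons.append("Add product to basket")

    def click(self, button: str) -> None:
        # AC 2: clicking a visible button opens the product information modal.
        if button in self.visible_buttons:
            self.open_modal = "Product information"


def test_button_appears_for_logged_in_customer() -> None:
    page = ProductPage(logged_in=True)      # Given customer is logged in
    page.view()                             # When they view the Product page
    assert "Add product to basket" in page.visible_buttons  # Then it appears


def test_click_opens_modal() -> None:
    page = ProductPage(logged_in=True)
    page.view()                             # Given the button is shown
    page.click("Add product to basket")     # When the customer clicks it
    assert page.open_modal == "Product information"  # Then the modal appears
```

Whether the button should also show for logged-out users is exactly the kind of decision the AC leave to the PM; the sketch just picks one answer.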


The problem, in my experience, is that most PMs don't add anything when it comes to drawing up the acceptance criteria.

In your example of an order placement - the PM has no special knowledge of what makes a good customer order flow. Developers are usually way better at coming up with those by dint of experience and technical knowledge of the current codebase, and can make the appropriate speed/polish trade-off.

PMs act as an imperfect proxy for what the customer wants, making judgements off nothing more than their own taste. And though there are many great PMs, the taste of a PM is usually worse than that of developers and designers on average.

IMO the main business reason they exist is for organizational accountability and ownership, despite the often negative value they bring.


Agree 100%.

Even the most verbose specifications too often have glaring ambiguities that are only found during implementation (or worse, interoperability testing!)


In theory, it's the same as in practice.

In practice, it isn't.


Sorry this is just the interior trapped nonsense that engineers find themselves in. Please touch grass

Product designers have to intuit the entire world model of the customer. Product managers have to intuit the business model that bridges both. And on and on.

Why do engineers constantly have these laughably mind-blowing moments where they think they are the center of the universe?


I agree so much with the both of you, to the point it's difficult to avoid cognitive dissonance one way or the other.

Software people do what they do better than anyone else. I mean obviously! Just listening to a non-software person discuss software is embarrassing. As it should be.

There's something close to mathematics that SWEs do, and yet it's so much more useful and economically relevant than mathematics, and I believe that's the bulk of how the "center of the universe" mindset develops. But they don't care that they're outclassed by mathematicians in matters of abstract reasoning, because they're doers and builders, and they don't care that they're outclassed by people in effective but less intellectual careers, because they're decoding the fundamental invariants of the universe.

I don't know. I guess I care so much because I can feel myself infected by the same arrogance when I finally succeed in getting my silicon golems to carry out my whims. It's exhilarating.


We keep seeing things like cryptic error messages shown to end users simply because of the disconnect between the programmer and the end user.

If the programmer gets to intimately understand the user's experience, software would be easier to use. That's why I support the idea of engineers taking support calls on rotation to understand the user.

Both can be true at the same time, a product manager who retains the big picture of the business and product, and engineers who understand tiny but important details of how the product is being used.

If there were indeed perfect product managers, there would be no need for product support.


>We keep seeing things like cryptic error messages shown to end users simply because of the disconnect between the programmer and the end user.

A lot of the error messages I'd write were for me, especially those errors I never expected to see.

The typical feedback I'd get from end users is "your software doesn't work". If they can send me a screenshot of the error I'm halfway to solving the problem.


I actually agree with this. Product designers and product managers are often essential and sometimes they do up to 99% of the work of figuring out how something should work. To accomplish that, they often do things well outside the role of a software developer. On the other hand, in my experience, only someone with a software development mindset seems to be able to complete the last 1% (or 10%, or whatever) that reveals and resolves certain kinds of logic issues.


You seem to be assuming a certain org structure with very clear, specialized roles. Many teams do not have this, and engineers are already Product Engineers. It sometimes even makes sense (whenever engineers dogfood their product, startups, or if it is a product targeting other engineers) and is not just a budget/capacity issue.

Similarly, by siloing the world model in one or two heads, you disable the team dynamics from contributing to building a better solution: eg. a product manager/designer might think the right solution is an "offline mode" for a privacy need without communicating the need, the engineering might decide to build it with an eventual consistency model — sync-when-reconnected — as that might be easier in the incumbent architecture, and the whole privacy angle goes out the window. As with everything, assuming non-perfection from anyone leads to better outcomes.

Finally, many software engineers are the creative type who like solving customer problems in innovative ways, and taking that away in a very specialized org actually demotivates them. Many have worked in environments where this was not just accepted, but appreciated, and I've seen it lead to better products built _faster_.


Is that actually true, though? Even though it's not really my job, I find myself debugging certificates and keys at least once a month, and that's after automating as much as possible with certbot and cloud certificates. PKI always seems to demand attention.


In my initial comment, I meant more in terms of complexity and planning from the perspective of the people who are running the public/private key infrastructure on the other side/upstream of what you're doing as a letsencrypt end user.

Broadly similar general concept to the team responsible for the DNSSEC signing keys for an entire ccTLD.

Yeah, an X.509 PKI / root CA is a very different thing than DNSSEC, but they have a number of general logical similarities in that the chain of trust ultimately comes down to a "do not fuck this up" single point of failure.


This is an amazing resource. It was difficult to appreciate what this resource was for until I tried to create my own boards based on an ESP32. It's not really difficult to build around ESP32, it's just that I don't know what I don't know. With starting points like these, I can start with a lot more confidence. Thank you!


Does this help you build a custom PCB that you would send to a factory or like just design and simulate something you could build on your own? Or both / neither? I'm not fully understanding what this project does, could you offer insight?


This is File -> New Project... -> New Hello World Project. The New Project buttons in hardware engineering tools often don't have the trailing 3 dots.

I think most low-end projects done in KiCad are not tested beyond making sure there are no red squiggly underlines at a glance. You are your own F5 key and assembler/runtime crash reporter. Proper circuit verification through software simulation isn't needed for most digital designs unless you do your own wireless antenna, analog amps, and/or DRAM/PCIe/GbE/etc.


Your analogy is more spot on than you may know. The syntax is just a bit off ;)

"File > New Project from Template"

KiCad comes with all the usual suspects, including Arduino and the various hats. You can get pmod templates, etc. They're actually really nice.

I use the pmod template all the time because it saves time and they're convenient to plug into Arty dev boards. PCBs are so cheap and quick I'll often make a quick PCB with a template because I just want a cleaner connector system. PCBs are basically bread boards these days.

https://techexplorations.com/guides/kicad/3e/create-a-new-ki...

https://gitlab.com/kicad/libraries/kicad-templates


I like the "File -> New Project" analogy.

I guess in theory, the original question is whether this project allows a board to be sent off for construction at a company that makes and populates boards. Yes, you could do this if you wanted to. As numpad0 has said though, it's early days for these boards and if you wanted to do something commercially reliable, you will most likely run into issues with things not being completely tested on these boards yet.

These boards provide the ability to make your own boards to host the chipsets yourself, rather than relying on a third party providing the board. So what? What if you want USB-C? What if you want to make a square or a circular board? This project is a good step along the way to allowing you to make these kinds of things.

On the hobbyist and corporate side, they also provide a way to provide a modern design that can use USB-C, which is becoming very common and is better than older USB options.

As mentioned in the README.md "Available Development Boards" section, the Atmega16u2 chip was hard to come by for Hanqaqa in 2023. The Arduino guys (arduino.com ?) probably did a "lifetime buy" of these comms chips and they probably also have several shelves of fully built Arduino boards as well. Lifetime buys and keeping good stock levels mitigate the risk of difficulty building new boards... Just get one of the older working ones off the shelf and send it. However, for an organisation (even an open source board that becomes fairly popular) wanting to build their own board, not having a given comms chip is a problem. Replacing it with a commonly available one makes it much easier for people/companies wanting to build these boards in any kind of numbers.

Having the board design readily available is really useful for the reasons above. It does seem like overkill if you just want to fiddle with a board, but if you make something that becomes popular that needs any kind of hardware adjustment, having the design becomes almost essential.



Copy that!

Wonderful that there's a Free version of these designs out there. The bugs and kinks will get sorted out over time.


I'm either in a minority or a silent majority. Claude Code surpasses all my expectations. When it makes a mistake like over-editing, I explain the mistake, it fixes it, and I ask it to record what it learned in the relevant project-specific skills. It rarely makes that mistake again. When the skill file gets big, I ask Claude to clean and compact it. It does a great job.

It doesn't really make sense economically for me to write software for work anymore. I'm a teacher, architect, and infrastructure maintainer now. I hand over most development to my experienced team of Claude sessions. I review everything, but so does Claude (because Claude writes thorough tests also.) It has no problem handling a large project these days.

I don't mean for this post to be an ad for Claude. (Who knows what Anthropic will do to Claude tomorrow?) I intend for this post to be a question: what am I doing that makes Claude profoundly effective?

Also, I'm never running out of tokens anymore. I really only use the Opus model and I find it very efficient with tokens. Just last week I landed over 150 non-trivial commits, all with Claude's help, and used only 1/3 of the tokens allotted for the week. The most commits I could do before Claude was 25-30 per week.

(Gosh, it's hard to write that without coming across as an ad for Anthropic. Sorry.)


> I'm either in a minority or a silent majority. Claude Code surpasses all my expectations.

I looked at some stats yesterday and was surprised to learn Cursor AI now writes 97% of my code at work. Mostly through cloud agents (watching it work is too distracting for me)

My approach is very simple: Just Talk To It

People way overthink this stuff. It works pretty good. Sharing .md files and hyperfocusing on various orchestrations and prompt hacks of the week feels as interesting as going deep on vim shortcuts and IDE skins.

Just ask for what you want, be clear, give good feedback. That’s it


Right - I have a ton of coworkers who obsess over "skills" and different ways to run agents and whatnot but I just... spend some time to give very thorough, detailed instructions and it just Does The Thing. I rarely fight with Claude Code these days.


We probably need something like the WET principle for skills. If you need to explain the same thing to an agent more than twice, turn it into a skill (or add it to AGENTS.md, or CLAUDE.md, or to your docs folder, or your guides folder, or whatever method you use). If you haven't needed to explain it more than twice, it's probably fine. The context pollution from the skill would likely be worse than not having the skill.

Of course exceptions apply. Some basic information that will reliably be discovered is still worth adding to your AGENTS.md to cut down on token use. But after a couple obvious things you quickly get into the realm of premature optimization (unless you actually measure the effects)


Same here. For me, this means a spec doc split into features/UX, technical requirements, and language-specific requirements, iterated before the model touches code.


The trick is to "just use it", BUT every few weeks grab the logs (you do keep them, right?) and have a session with the model to find out if there are any repeated patterns.

If you find any, consider making them into skills or /commands or maybe even add them to AGENTS.md.


Which logs do you use for that?


I would assume those in ~/.claude/projects/**/*.jsonl. They contain full conversation history, including the tool calls that were made, how many tokens were consumed, etc.
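
If you want a quick look before handing the logs to a model, here is a rough Python sketch that walks that directory and tallies records. The path is the one mentioned above; everything about the record layout is an assumption. In particular, the top-level `"type"` key is a guess at the schema, and records without it are simply counted as `"unknown"`.

```python
import json
from collections import Counter
from pathlib import Path


def count_record_types(root: Path) -> Counter:
    """Count the top-level "type" value of every JSON line under root.

    The "type" key is an assumption about the log schema; records
    without it are tallied as "unknown".
    """
    counts: Counter = Counter()
    for path in root.glob("**/*.jsonl"):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or partial lines
                if isinstance(record, dict):
                    counts[str(record.get("type", "unknown"))] += 1
    return counts


if __name__ == "__main__":
    logs = Path.home() / ".claude" / "projects"
    for kind, n in count_record_types(logs).most_common():
        print(f"{n:8d}  {kind}")
```

From there you would adapt the counting to whatever fields the files actually contain, e.g. grouping by tool name or by repeated user instructions.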


Claude has a built-in /insights feature for this, but you can replicate it with any other tool that keeps the session logs on disk.


I agree it works nicely for me. From my experience it’s not realistic to expect a one-shot each time. But asking it to build chunks and entering a review cycle with nudging works well. Once I changed my mindset from "it didn’t one-shot it, so it’s crap" and took it as an iterative tool that builds pieces that I assemble, it’s been working nicely without external frameworks or anything. Plan, review, iterate, split, build, review, iterate.


You're wasting a ton of tokens doing that though. Right now you don't realize it because they're being heavily subsidized, but you will understand the point of having good orchestration and memory files when you have to pay the real cost of your use.


Cost cannot go up, only down with time (with occasional short-term fluctuations). Competition, including open-weight models and consumer hardware (i.e. the upcoming M5 Ultra), keeps moving the ceiling of what you can charge down.


If the cost is subsidized by another cash source (e.g. VC money), then when the source stops, prices can definitely go up.


Company pays for company’s tokens, so company’s problem, not mine. I am happy to skill up and avoid overusing tokens for my personal sub, but if it’s getting results then I couldn’t care less how much my employer has to pay for it. They’re begging me to use it in the first place anyway.


> You're wasting a ton of tokens doing that though.

My time is worth more than tokens. I’m thinking of maybe creating some .md files to save me time in code review. If I do it right, it’s going to cost more in tokens because the robots will do more.


My experience as well on non-trivial stuff for personal projects, just talk... It makes mistakes, but considering the code I see in professional settings, I'd rather deal with an agent than third parties.


I love the IDE skins analogy. Very true.


Everyone knows that a red UI skin goes faster


How do you collect these stats?

Is it by characters human typed vs AI generated, or by commit or something?


> How do you collect these stats?

Cursor dashboard. I know they're incentivized to over-estimate but feels directionally accurate when I look at recent PRs.


Are you mostly using the Composer model?


> Are you mostly using the Composer model?

Don’t really think about it. I think when I talk to it through Slack, Cursor uses Codex; in my IDE it looks like it’s whatever the highest Claude is. In GitHub comments, who even knows.


It's interesting how variable people's experiences seem to be.

Personally, I tend to get crap quality code out of Claude. Very branchy. Very un-DRY. Consistently fails to understand the conventions of my codebase (e.g. keeps hallucinating that my arena allocator zero initializes memory - it does not). And sometimes after a context compaction it goes haywire and starts creating new regressions everywhere. And while you can prompt to fix these things, it can take an entire afternoon of whack-a-mole prompting to fix the fallout of one bad initial run. I've also tried dumping lessons into a project specific skill file, which sometimes helps, but also sometimes hurts - the skill file can turn into a footgun if it gets out of sync with an evolving codebase.

In terms of limits, I usually find myself hitting the rate limit after two or three requests. On bad days, only one. This has made Claude borderline unusable over the past couple weeks, so I've started hand coding again and using Claude as a code search and debugging tool rather than a code generator.


> In terms of limits, I usually find myself hitting the rate limit after two or three requests.

I'd absolutely love to see exactly what you're doing (...well, maybe in a world where I had unlimited time or could clone myself...) because as tight as the usage limits are I absolutely cannot fathom hitting them THAT early.

What are the requests like, and have you noticed what Claude is doing during them? Is it reading an entire massive codebase or files that are thousands of lines long? Or are you loaded up with many MCPs or have an ever-growing CLAUDE.md?


I'm writing a compiler. When I have Claude write a new feature, I have it validate the change against a test suite of ~200 tiny programs.

I have a shell script that automates this. If all tests pass, the shell script prints "200/200 passing" with very little token spend. If only 190/200 pass, the shell script reports the names of every test that failed, and now Claude does a process of

1) run the compiler binary -> 2) get assembly output and inspect for obvious errors -> 3) assemble -> 4) verify that the assembler did not report errors -> 5) run test binary, connect with gdb, and find the issue -> 6) edit the compiler source -> 7) recompile the compiler -> 8) back to 1

multiplied by 10 for the 10 failing tests. This eats up tokens very quickly. I realize that not every use case is going to look like this. But if I didn't have Claude verify against the test suite, then I'd be getting regressions left and right, and then what's the point?

The whole codebase (tests included) is less than 15k lines, so I don't think that's the issue. No MCPs. CLAUDE.md about 1.5k lines.
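
To illustrate the terse-reporting idea from the shell script described above, here is a small sketch in Python rather than shell. It is not the actual script: `summarize` and `run_one` are hypothetical names, and `run_one` stands in for whatever compiles and runs a single test program and reports pass/fail.

```python
from typing import Callable, Iterable


def summarize(tests: Iterable[str], run_one: Callable[[str], bool]) -> str:
    """Return a one-line summary, followed by the names of failing tests."""
    names = list(tests)
    failed = [name for name in names if not run_one(name)]
    passed = len(names) - len(failed)
    lines = [f"{passed}/{len(names)} passing"]
    # Only failing names are reported, so a green run costs almost no output.
    lines.extend(f"FAIL {name}" for name in failed)
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in results instead of really compiling and running test programs.
    fake_results = {"add": True, "loop": False, "call": True}
    print(summarize(fake_results, fake_results.__getitem__))
```

The point of the design is that output size scales with the number of failures rather than the number of tests, which is what keeps the all-green case cheap in tokens.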


> Very branchy. Very un-DRY.

I've found this can be vastly reduced with AGENTS.md instructions, at least with codex/gpt-5.4.


What sorts of instructions?


Usually I just put something like "Prefer DRY code". I like to keep my AGENTS.md DRY too :)


also add "no hallucinations" and "make it works this time pretty please" while also say Claude will go to jail if does not do it right should work all the time (so like 60%)


There are of course limits to what prompting can do, but it does steer the models.

In TFA they found that prompting mitigates over-editing up to about 10 percentage points.


Similar to the observation (by simonw) that they respond reasonably to "TDD: Red => Green"

I've used that ever since. Works most of the time, but other stuff is often failing and I've learned to become distrustful of an agent very quickly. One mistake where I point it out and the agent corrects itself is fine if it keeps working well after. A second mistake when it's trying to fix the first one or an inability to understand or a claim that it fixed it but it didn't is instant termination (after dumping context for the next agent).


When I see people talking about Claude Code becoming "unusable" for them recently, I believe them, but I don't understand. It's a deeply flawed and buggy piece of software but it's very effective. One of the strangest things about AI to me is that everyone seems to have a radically different experience.


> everyone seems to have a radically different experience

What people have is radically different expectations.

I noticed engineers will review Claude's output and go "holy crap that's junior-level code". Coders will just commit because looking at the code is a waste of time. Move fast, break things, disrupt, drown yourself into tech debt: the investors won't care anyways.

And no, telling the agent to "be less shit" doesn't work. I have to painstakingly point out every single shit architectural decision so Claude can even see and fix it. "Git gud" didn't work for people and doesn't work for LLMs.

It's not that the code isn't DRY, it's just DRY at the wrong points of abstraction, which is even worse than not being DRY. I manage to find better patterns in each and every single task I tell Claude or Copilot to autonomously work on, dropping tons of code in the process (DRY or not). You can't prompt Claude out of making these wrong decisions (at best from very basic mistakes) since they are too granular to even extract a rule.

This is what separates a senior from a junior.

If you think Claude writes good code either you're very lucky, I'm very bad at prompting, or your standards are too low.

Don't get me wrong. I love Claude Code, but it's just a tool in my belt, not an autonomous engineer. Seeing all these "Claude wrote 97% of my code" makes me shudder at the amount of crap I will have to maintain 5 years down the line.


You have to tell it both what and how. That way it's decidedly less shit. Still needs tons of passes just keeping things somewhat coherent, but it mostly works.


>One of the strangest things about AI to me is that everyone seems to have a radically different experience.

I've thought about this and I think the reason is as follows: we hold code written by ourselves to a much higher standard than code written by somebody else. If you think of AI code as your own code, then it probably won't seem very acceptable because it lacks the beauty (partly subjective as all beauty tends to be) that we put into our own code. If you think of it as a coworker's code, then it's usually alright i.e. you wouldn't be wildly impressed with that coworker but it would also not be bad enough to raise a stink.

It follows from this that it also depends on how you regard the codebase that you're working on. Do you think of it as a personal masterpiece or is it some mishmash camel by committee as the codebases at work tend to be?


I use it through the desktop app, which has a lot of features I appreciate. Today it was implementing a feature. It came across a semi-related bug that wasn’t a stopper but should really be fixed before go live. Instead of tackling it itself or mentioning it at the final summary (where it becomes easy to miss), it triggered a modal inside the Claude app with a description of the issue and two choices: fix in another session or fix in current session. Really good way to preserve context integrity and save tokens!


How do you get CC to connect to your dev container? I have the CC app but it’s kinda useless as I’m not going to have it barebacking my system, so I’m left with the CLI and VS Code extension.


I just run CC in a VM. It gets full control over the VM. The VM doesn't have access to my internal networks. I share the code repos it works on over virtiofs so it has access to the repos but doesn't have access to my github keys for pushing and pulling.

This means it can do anything in the VM, install dependencies, etc... So far, it managed to bork the VM once (unbootable), I could have spent a bit of time figuring out what happened but I had a script to rebuild the VM so didn't bother. To be entirely fair to claude, the VM runs arch linux which is definitely easier to break than other distros.


You have to try Codex. My friend's been trying to convert me for months and he was right all along: with Codex you don't need GSD or whatever prompting metaframework. You rarely (I actually haven't needed to do this at all) need to ask it to retry because its implementation is bugged: it literally just works first try.

Maybe that's because the harness (not sure; haven't looked at their source code) has it baked in? Doesn't matter; the point is that it works.

Now, the one thing I heavily dislike is the UI it generates...it doesn't seem to realize that matching UI patterns with the existing codebase is quite important.


> One of the strangest things about AI to me is that everyone seems to have a radically different experience.

Because it is that uneven. Some problems it nails at first go or with very few cosmetic changes.

In others it decides on a solution, hallucinates parts that do not exist, like adding API calls or config options that do not exist, and gets the basics wrong.

Similarly, if you do something that's a somewhat common pattern, it usually nails it. If you do something that subtly differs in a certain way from a common pattern, it will just do the common pattern and you get something wrong.


My workflow is to just use LLMs for small context work. Anything that involves multiple files it truly doesn't do better than what I'd expect from a competent dev.

It's bitten me several times at work, and I'd rather not waste any more of my limited time doing the re-prompt -> modify code manually cycle. I'm capable of doing this myself.

It's great for the simple tasks tho, and most feature work is simple tasks IMO. They were only "costly" in the sense that it previously took a while to read the code, find appropriate changes, create tests for appropriate changes, etc. LLMs reduce that cycle of work, but that type of work in general isn't the majority of my time at my job.

I've worked at feature factories before, it's hell. I can't imagine how much more hell it has become since the introduction of these tools.

Feature factories treat devs as literal assembly line machines, output is the only thing that matters not quality. Having it mass induced because of these tools is just so shitty to workers.

I fully expect a backlash in the upcoming years.

---

My only Q to the OP of this thread is what kind of teacher they are, because teaching people anything about software while admitting that you no longer write code because it's not profitable (big LOL at caring about money over people) is just beyond pathetic.


I think on HN at least, people enamoured by Claude are the vocal majority.

The view of Claude on HN is extremely positive and nearly every thread will have a highly positive comment "that is not an ad".

I think people are seeing others just irked by the constant stream of what feels like ads and reading it as Claude being somehow disliked.


Same. It's surprisingly good as a labour saving device. It produces code that I would accept without reservations from a coworker. I still read every line and make tweaks, but they're the same tweaks I would ask for in a code review.

I don't measure my productivity, but I see it in the sort of tasks I tackle after years of waiting. It's especially good at tedious tasks like turning 100 markdown files into 5 json files and updating the code that reads them, for example.


Are you writing code that gets reviewed by other people? Were code reviews hard in the past? Do your coworkers care about "code quality"? (I put that in scare quotes because it means different things to different people.)

Are you working more on operational stuff or on "long-running product" stuff?

My personal headcanon: this tooling works well when built on simple patterns, and can handle complex work. This tooling has also been not great at coming up with new patterns, and if left unsupervised will totally make up new patterns that are going to go south very quickly. With that lens, I find myself just rewriting what Claude gives me in a good number of cases.

I sometimes race the robot and beat the robot at doing a change. I am "cheating" I guess cuz I know what I want already in many cases and it has to find things first but... I think the futzing fraction[0] is underestimated for some people.

And like in the "perils of laziness lost"[1] essay... I think that sometimes the machine trying too hard just offends my sensibilities. Why are you doing 3 things instead of just doing the one thing!

One might say "but it fixes it after it's corrected"... but I already go through this annoying "no don't do A,B, C just do A, yes just that it's fine" flow when working with coworkers, and it's annoying there too!

"Claude writes thorough tests" is also its own micro-mess here, because while guided test creation works very well for me, giving it any leeway in creativity leads to so many "test that foo + bar == bar + foo" tests. Applying skepticism to utility of tests is important, because it's part of the feedback loop. And I'm finding lots of the test to be mainly useful as a way to get all the imports I need in.

If we have all these machines doing this work for us, in theory average code quality should be able to go up. After all we're more capable! I think a lot of people have been using it in a "well most of the time it hits near the average" way, but depending on how you work there you might drag down your average.

[0]: https://blog.glyph.im/2025/08/futzing-fraction.html
[1]: https://bcantrill.dtrace.org/2026/04/12/the-peril-of-lazines...


You hinted at an aspect I probably haven't considered enough: The code I'm working on already has many well-established, clean patterns and nearly all of Claude's work builds on those patterns. I would probably have a very different experience otherwise.


I legit think this is the biggest danger with velocity-focused usage of these tools. Good patterns are easy to use and (importantly!) work! So the 32nd usage of a good pattern will likely be smooth.

The first (and maybe even second) usage of a gnarly, badly thought out pattern might work fine. But you're only a couple steps away from if statement soup. And in the world where your agent's life is built around "getting the tests to pass", you can quickly find it doing _very_ gnarly things to "fix" issues.


I’ve seen ai coding agents spin out and create 1_000 line changesets that I have to stop before they are 10_000. And then I look at the problem and change one line instead.


This is it right here. Claude loves to follow existing patterns, good or bad. Once you have a solid foundation, it really starts to shine.

I think you're likely in the silent majority. LLMs do some stupid things, but when they work it's amazing and it far outweighs the negatives IMHO, and they're getting better by leaps and bounds.

I respect some of the complaints against them (plagiarism, censorship, gatekeeping, truth/bias, data center arms race, crawler behavior, etc.), but I think LLMs are a leap forward for mankind (hopefully). A Young Lady's Illustrated Primer for everyone. An entirely new computing interface.


We noticed this and spent a week or two going through and cleaning up tests, UI components, comments, and file layout to be a lot more consistent throughout the codebase. Codebase was not all AI written code - just many humans being messy and inconsistent over time as they onboard/offboard from the project.

Much like giving a codebase to a newbie developer, whatever patterns exist will proliferate and the lack of good patterns means that patterns will just be made up in an ad-hoc and messy way.


You haven't answered the question though. Is your code peer reviewed? Is it part of a client-facing product? No offense, I like what you are doing, but I wouldn't risk delegating this much workload in my day job, even though there is a big push towards AI.


> My personal headcanon: this tooling works well when built on simple patterns, and can handle complex work. This tooling has also been not great at coming up with new patterns, and if left unsupervised will totally make up new patterns that are going to go south very quickly. With that lens, I find myself just rewriting what Claude gives me in a good number of cases.

I've been doing a greenfield project with Claude recently. The initial prototype worked but was very ugly (repeated duplicate boilerplate code, a few methods doing the same exact thing, poor isolation between classes)... I was very much tempted to rewrite it on my own. This time, I decided to try and get it to refactor to reach the target architecture and fix those code quality issues. It's possible, but it's very much like pulling teeth... I use plan mode, we have multiple rounds of review on a plan (that started based on me explaining what I expect), then it implements 95% of it but doesn't realize that some parts of it were not implemented... It reminds me of my experience mentoring a junior employee, except that Claude Code is more eager (jumping into implementation before understanding the problem), much faster at doing things, and dumber.

That said, I've seen codebases created by humans that were as bad as or worse than what Claude produced when doing a prototype.


I feel the same way. It doesn't make sense economically or even in good faith for me to use company-paid time writing code for line-of-business apps anymore, and I'm 28 years into this kind of work.


I used Claude to help me with a function once and it added a memory leak, it wouldn’t have been noticeable to most people but I saw. I still write my own code and find LLMs frustrating because they almost get it right and it’s just more efficient for me to just write the code correctly instead of having an LLM write something that’s almost correct and me fixing it after the fact.

I can’t wait for all the future vibe coded projects to be exploited by the black hats waiting in the shadows for things to reach a critical state. I don’t believe in anthropic because they love to lie.


> I intend for this post to be a question: what am I doing that makes Claude profoundly effective?

I'm fascinated by this question.

I think the first two sections of this article point towards an answer: https://aphyr.com/posts/412-the-future-of-everything-is-lies...

I've personally had radically different experiences working on different projects, different features within the same project, etc.


How much does it cost though?

This is the problem.

I think there is a huge gap between people on salaries getting effectively more responsibility by being given spend that they otherwise would not have had and people hustling on projects on their own.

Yes it is 100% what I use but I am never happy with usage. It burns up my sub fast and there is little feeling of control. Experiments like using lower tier models are hard to understand in reality. Graphify might work or it might not. I have no idea.


I think a lot of us have implemented our own ad hoc self-improvement checks into our agentic workflows. My observations are the same as yours.


I am genuinely interested to know some details:

1. Is the product/software you develop novel? As in, does it do something useful and unique? Or is it a product that already exists in many varieties and yours is just "one of ..."?

2. What if one day LLMs get regulated, become terrible, or raise prices above your budget? Do you have plans for that?


1. Fairly - I definitely don't see any training material about the stuff I do on the internet:D it's really far from your avg front-end app. And of course you can't let any of those make decisions automatically. Remember the IBM quote, "a computer can not be held accountable therefore a computer must not make any management decisions"... Even on completely greenfield and groundbreaking projects there's lots of throwaway code, scaffolding and so on. You contribute the value-add, you use the flanker to speed up the boring and grey parts.

2. Regulation? I'm sceptical that the cat can be put back into the bag. It's already out there. The more realistic problem is the business model part - openweight/local provides a counterpoint to that.


I'm in a similar situation

1. Even really novel projects have large chunks of glue code and boring infrastructure that the novel bits depend on. Claude means I spend 10% of my time on the boring stuff and 90% of my time on stuff I previously only had 10% of my day to work on. In my experience the software picked up our idioms fast and, for context, we have a skill file explaining code standards.

2. Codex and Gemini are comparable when paired with a good harness (pi.dev). If things ever get really bad, I'll drop 8k on a dedicated agent coding server and run it locally. I tried it recently with my current system and it was subpar, but I was running a drastically simpler model.


To people stating these high commit numbers: What is your average changeset size? I have found that having an agent do large changes (a few hundred lines or more) results in a lot of friction for me, and it feels like at some point I leave the happy path and, instead of moving quickly, get dragged down.


The article has a benchmark and Opus has the best score in two categories and the second-best in another (there are only three categories). Opus is probably the best choice when it comes to producing readable code right now. GPT (for example) lags way behind.


Anecdotally it’s the exact opposite for me: gpt 5.4 is leagues ahead of opus for the kind of backend work I do. Opus keeps making stupid mistakes while overengineering the irrelevant parts. However when I have to work on the backoffice ui, I still pick opus.


The silent majority of GenAI praise reaches the top of the thread again.

Edit: The lurkers and the commenters must be a pretty different set of people I suppose.


Is your claude.md, skills or other settings that you have honed public?


Sorry, no, and they're highly project specific anyway. I just started with the "/init" skill a few weeks ago and gradually improved it from there.


Which subscription tier are you using?


I'm on the $200/month max plan.


Makes sense, maybe it is worth it...


Wait till you try codex so you don’t have to keep saying ‘don’t be lazy’


I wonder what the PGP signing concept does to thwart people who want to profit and don't care about the public good. It seems like anyone who attends a signing party can sell their key to the highest bidder, leading to bots and spammers all over again.


In the flat trust model we currently use most places, it's on each person to block each spammer, bot, etc. The cost of creating a new bot account is low so it's cheap to make them come back.

On a web of trust, if you have a negative interaction with a bot, you revoke trust in one of the humans in the chain of trust that caused you to come in contact with that bot. You've now effectively blocked all bots they've ever made or ever will make... At least until they recycle their identity and come to another key signing party.

Once you have the web in place though, a series of "this key belongs to a human" attestations, then you can layer metadata on top of it like "this human is a skilled biologist" or "this human is a security expert". So if you use those attestations to determine what content you're exposed to, then a malicious human doesn't merely need to show up at a key signing party to bootstrap a new identity, they also have to rebuild their reputation to a point where you or somebody you trust becomes interested in their content again.

Nothing can be done to prevent bad people from burning their identities for profit, but we can collectively make it not economical to do so by practicing some trust hygiene.

Key signing establishes a graph upon which more effective trust management becomes possible. It on its own is likely insufficient.
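
As a rough illustration of the revocation idea above, here is a minimal Python sketch. Everything in it is hypothetical: `TrustGraph`, `vouch`, `revoke`, and the identity names are made up for the example, no real signatures are verified, and actual PGP tooling does not work this way. It only shows why revoking one human in the chain cuts off every identity that was reachable only through them.

```python
from collections import deque


class TrustGraph:
    """Toy web of trust: edges are "signer vouches for subject" attestations."""

    def __init__(self, me: str) -> None:
        self.me = me
        self.vouches: dict[str, set[str]] = {}  # signer -> identities they vouched for
        self.revoked: set[str] = set()          # identities I no longer trust

    def vouch(self, signer: str, subject: str) -> None:
        self.vouches.setdefault(signer, set()).add(subject)

    def revoke(self, identity: str) -> None:
        self.revoked.add(identity)

    def trusted(self) -> set[str]:
        """Everyone reachable from me without passing through a revoked identity."""
        seen = {self.me}
        queue = deque([self.me])
        while queue:
            current = queue.popleft()
            for subject in self.vouches.get(current, ()):
                if subject not in seen and subject not in self.revoked:
                    seen.add(subject)
                    queue.append(subject)
        return seen - {self.me}


if __name__ == "__main__":
    graph = TrustGraph("me")
    graph.vouch("me", "alice")
    graph.vouch("alice", "mallory")   # mallory showed up at a signing party
    graph.vouch("mallory", "bot1")    # ...and then vouched for their bots
    graph.vouch("mallory", "bot2")
    graph.revoke("mallory")           # one revocation cuts off both bots
    print(sorted(graph.trusted()))
```

In this toy model a bot that is also vouched for by some other still-trusted identity would remain reachable, which matches the point that key signing alone is likely insufficient.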


You can never prevent things like this, but you can make it expensive enough to effectively solve the problem for almost all use cases.


Also add a PR reviewer bot. Give it authority to reject the PR, but no authority to merge it. Let the AIs fight until the implementation AI and the reviewer AI come to an agreement. Also limit the number of rounds they're permitted to engage in, to avoid wasting resources. I haven't done this myself, but my naive brain thinks it's probably a good idea.
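
A minimal sketch of what that bounded loop could look like, in Python. This is untested against any real agent setup: `negotiate`, `Review`, and the two toy agents are hypothetical stand-ins, and a real version would call a model and a code host instead. The reviewer can only approve or reject; nothing here merges anything.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def negotiate(
    implement: Callable[[str], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,
) -> tuple[str, bool, int]:
    """Alternate implementer and reviewer until approval or the round cap.

    Returns (last_patch, approved, rounds_used). Merging is left to a human.
    """
    feedback = ""
    patch = ""
    for round_no in range(1, max_rounds + 1):
        patch = implement(feedback)      # implementer revises using the last feedback
        verdict = review(patch)          # reviewer may reject, but never merges
        if verdict.approved:
            return patch, True, round_no
        feedback = verdict.feedback
    return patch, False, max_rounds      # cap reached: stop spending resources


# Toy stand-ins for illustration only; real agents would call a model.
def toy_implementer(feedback: str) -> str:
    return "patch with tests" if "tests" in feedback else "patch"


def toy_reviewer(patch: str) -> Review:
    if "tests" in patch:
        return Review(approved=True)
    return Review(approved=False, feedback="please add tests")


if __name__ == "__main__":
    print(negotiate(toy_implementer, toy_reviewer))
```

When the cap is hit without approval, the function hands back the last patch with `approved=False`, so a human decides what happens next.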


> I haven't done this myself, but my naive brain thinks it's probably a good idea.

Many a disaster started this way


Yep, we're on the same wavelength.


I believe the GP post is saying that if we react to the new AI-enabled environment by arbitrarily strengthening IP controls for IP owners, the greatest beneficiaries will almost certainly be lawyer-laden corporations, not communities, artists, or open source projects. That seems like a reasonable argument.

It seems like the answer is to adjust IP owner rights very carefully, if that's possible. It sounds very hard, though.


The article makes the same point; the quote was taken out of context.

The point the author was making was that the intent of GPL is to shift the balance of power from wealthy corporations to the commons, and that the spirit is to make contributing to the commons an activity where you feel safe in knowing that your contributions won't be exploited.

The corporations today have the resources to purchase AI compute to produce AI-laundered work, which wouldn't be possible without the commons the AI got its training data from, and give nothing back to the commons.

This state of things disincentivizes contributing to the FOSS ecosystem, as your work will be taken advantage of while the commons gets nothing.

Share-alike clause of the GPL was the price that was set for benefitting from the commons.

Using LLMs trained on GPL code to "reimplement" it creates a legal (but not a moral!) workaround to circumvent GPL and avoid paying the price for participation.

This means that the current iteration of GPL isn't doing its intended job.

GPL had to grow and evolve. The Internet services using GPL code to provide access to software without, technically, distributing it was a similar legal (but not moral) workaround which was addressed with an update in GPL.

The author argues that we have reached another such point. They don't argue what exactly needs to be updated, or how.

They bring up a suggestion to make copyrightable the input to the LLM which is sufficient to create a piece of software, because in the current legal landscape, creating the prompt is deemed equivalent to creating the output.

You can't have your cake and eat it too.

A vibe-coded API implementation created by an LLM trained on open source, GPL licensed code can only be considered one of two things:

— Derivative work, and therefore, subject to the requirement to be shared under the GPL license (something the legal system disagrees with)

— An original work of the person who entered the prompt into the LLM, which is a transformative fair use of the training set (the current position of the legal system).

In the latter case, the input to the LLM (which must include a reference to the API) is effectively deemed to be equivalent to the output.

The vibe-coded app, the reasoning goes, isn't a photocopy of the training data, but a rendition of the prompt (even though the transformativeness came entirely from the machine and not the "author").

Personally, I don't see a difference between making a photocopy by scanning and printing, and by "reimplementing" API by vibe coding. A photocopy looks different under a microscope too, and is clearly distinguishable from the original. It can be made better by turning the contrast up, and by shuffling the colors around. It can be printed on glossy paper.

But the courts see it differently.

Consequently, the legal system currently decided that writing the prompt is where all the originality and creative value is.

Consequently, de facto, the API is the only part of an open source program that can be protected by copyright.

The author argues that perhaps it should be — to start a conversation.

As for who the beneficiaries are from a change like that — that, too, is not clear-cut.

The entities that benefit the most from LLM use are the corporations which can afford the compute.

It isn't that cheap.

What has changed since the first days of GPL is precisely this: the cost of implementing an API has gone down asymmetrically.

The importance of having an open-source compiler was that it put corporations and contributors to the commons on equal footing when it came to implementation.

It would take an engineer the same amount of time to implement an API whether they do it for their employer or themselves. And whether they write a piece of code for work or for an open-source project, the expenses are the same.

Without an open compiler, that's not possible. The engineer having access to the compiler at work would have an infinite advantage over an engineer who doesn't have it at home.

The LLM-driven AI today takes the same spot. It's become the tool that software engineers can and do use to produce work.

And the LLMs are neither open nor cheap. Both creating them as well as using them at scale is a privilege that only wealthy corporations can afford.

So we're back to the days before the GNU C compiler toolchain was written: the tools aren't free, and the corporations have effectively unlimited access to them compared to enthusiasts.

Consequently, locking down the implementation of public APIs will asymmetrically hurt the corporations more than it does the commons.

This asymmetry is at the core of GPL: being forced to share something for free doesn't at all hurt the developer who's doing it willingly in the first place.

Finally, looking back at the old days ignores the reality. Back in the day, the proprietary software established the APIs, and the commons grew by reimplementing them to produce viable substitutes.

The commons did not even have its own APIs worth talking about in the early 1990s. But the commons grew way, way past that point since then.

And the value of the open source software is currently not in the fact that you can hot-swap UNIX components with open source equivalents, but in the entire interoperable ecosystem existing.

The APIs of open source programs are where the design of this enormous ecosystem is encoded.

We can talk about possible negative outcomes from pricing it.

Meanwhile, the already happening outcome is that a large corporation like Microsoft can throw a billion dollars of compute on "creating" MSLinux and refabricating the entire FOSS ecosystem under a proprietary license, enacting the Embrace, Extend, Extinguish strategy they never quite abandoned.

It simply didn't make sense for a large corporation to do that earlier, because it's very hard to compete with free labor of open source contributors on cost. It would not be a justifiable expenditure.

What GPL had accomplished in the past was ensuring that Embracing the commons led to Extending it without Extinguishing, by a Midas touch clause. Once you embrace open source, you are it.

The author of the article asks us to think about how GPL needs to be modified so that today, embracing and extending open-source solutions wouldn't lead to commons being extinguished.

Which is exactly what happened in the case of the formerly-GPL library in question.


A lot of code is "useless" only in the sense that no one wants to buy it and it will never find its way into an end user product. On the other hand, that same code might have enormous value for education, research, planning, exploration, simulation, testing, and so on. Being able to generate reams of "useless" code is a highly desirable future.


Obviously "useful" doesn't just involve making money. Code that will be used for education and all of these things is clearly not useless.

But let's be honest with ourselves, the sort of useless code the GP meant will never ever be used for any of that. The code will never leave their personal storage. In that sense it's about as valuable for society at large as the combined exabytes of GenAI smut that people have been filling their drives with by running their 4090s 24/7.

