> I don't know what it is, but trying to coax my goddamn tooling into doing what I want is not why I got into this field.
I can understand that, but as long as the tooling is still faster than doing it manually, that's the world we live in. Slower ways to 'craft' software are a hobby, not a profession.
(I'm glad I'm in it for building stuff, not for coding - I love the productivity gains).
Generally, my stance is that I add more value by doing whatever ridiculous thing people ask me to change than by wasting my time arguing about it. There are some obvious exceptions, like when the suggestions don't work or make the codebase significantly worse. But otherwise, I do whatever people suggest, to save my time, save their time, and deliver faster. And often, once you're done with their initial suggestions, people just approve.
This doesn't help all the time. There are people who keep finding new things they want you to change a week after they first reviewed the code. I try to avoid including them in code reviews. The alternative is to talk to your manager about setting some rules, like giving reviewers only a day or two to review new code. It's easy to argue for that, because those late comments really hinder productivity.
Agreed with the points in that article, but IMHO the No. 1 issue is that agents only see a fraction of the code repository. They don't know whether there is a helper function they could use, so they re-implement it. When contributing to UIs, they can't check the whole UI to identify common design patterns, so they reinvent them.
The most important task for the human using the agent is to provide the right context. "Look at this file for helper functions", "do it like that implementation", "read this doc to understand how to do it"... you can get very far with agents when you provide them with the right context.
(BTW, another issue is that they have problems navigating the directory structure in a large monorepo. When the agent needs to run commands like 'npm test' in a sub-directory, it almost never gets it right the first time.)
This is what I keep running into. Earlier this week I did a code review of a big batch of new code, written using Cursor to implement a feature from scratch, and I'd say maybe 200 of those lines were really necessary.
But, y'know what? I approved it. Because hunting down the existing functions it should have used in our utility library would have taken me all day. 5 years ago I would have taken the time because a PR like that would have been submitted by a new team member who didn't know the codebase well, and helping to onboard new team members is an important part of the job. But when it's a staff engineer using Cursor to fill our codebase with bloat because that's how management decided we should work, there's no point. The LLM won't learn anything and will just do the same thing over again next week, and the staff engineer already knows better but is being paid to pretend they don't.
> because that's how management decided we should work, there's no point
If you are personally invested, there would be a point. At least if you plan to maintain that code for a few more years.
Let's say you have a common CSS file where you define .warning {color: red}. If you want the LLM to put out a warning and you just tell it to make the text red, without pointing out that the .warning class exists, it will likely create a new CSS definition for that element (or even inline the style; the latest Claude Code has a tendency to do that). That's fine and will make management happy for now.
But if later management decides that it wants all warning messages to be pink, it may be quite a challenge to catch every place without missing one.
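Concretely, as a hypothetical React/TypeScript sketch (the component names are invented; only the .warning class comes from the example above):

  // styles.css (shared):  .warning { color: red; }

  // What you wanted: every warning goes through the shared class,
  // so "make all warnings pink" is later a one-line CSS change.
  function AddressWarning({ message }: { message: string }) {
    return <p className="warning">{message}</p>;
  }

  // What the LLM tends to produce when it never saw styles.css:
  // an inline style that silently forks the design.
  function CheckoutWarning({ message }: { message: string }) {
    return <p style={{ color: "red" }}>{message}</p>;
  }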
There really wouldn't be; it would just be spitting into the wind. What am I going to do, convince every member of my team to ignore a direct instruction from the people who sign our paychecks?
I really, really hate code review now. My colleagues will have their LLMs generate thousands of lines of boilerplate with every pattern and abstraction under the sun. A lazy programmer used to do the bare minimum and write too little code. That made review easy: error handling here, duplicate code there, more descriptive naming here, and so on. Now a lazy programmer generates a crapload of code cribbed from "best practice" tutorials, much of it unnecessary and irrelevant to the actual task at hand.
> When the agent needs to run commands like 'npm test' in a sub-directory, it almost never gets it right the first time
I was running into this constantly on one project with a repo split between a Vite/React front end and a .NET backend (with a well-documented structure). It would sometimes go into panic mode after some npm command failed repeatedly, doing all sorts of pointless troubleshooting over and over, sometimes veering into destructive attempts to rebuild whatever it thought was missing or broken.
I kept trying to rewrite the section in CLAUDE.md to instruct it to always check the current directory first and verify it was in the correct $CLIENT or $SERVER directory. But it would still forget at random, which was aggravating.
I ended up creating some aliases like “run-dev server restart” and “run-dev client npm install” for common operations on both server and client that worked from any directory. Then I added the base dotnet/npm/etc. commands to the deny list, which forced its thinking to go “Hmm, it looks like I’m not allowed to run npm, so I’ll review the project instructions. I see, I can use the ‘run-dev’ helper to do $NPM_COMMAND…”
It’s been working pretty reliably now, but I definitely wasted a lot of time and aggravation getting to that solution.
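In case it's useful to anyone: the heart of it is just a wrapper that finds the repo root and picks the right subdirectory itself. A rough Node/TypeScript sketch ('run-dev', client/ and server/ are from my setup; everything else is illustrative):

  #!/usr/bin/env node
  // run-dev.ts: run a command from the correct subdirectory, wherever invoked.
  import { spawnSync } from "node:child_process";
  import { existsSync } from "node:fs";
  import * as path from "node:path";

  // Walk upward until we find the repo root (marked here by a .git directory).
  function findRepoRoot(start: string): string {
    let dir = start;
    while (!existsSync(path.join(dir, ".git"))) {
      const parent = path.dirname(dir);
      if (parent === dir) throw new Error("repo root not found");
      dir = parent;
    }
    return dir;
  }

  const [target, ...cmd] = process.argv.slice(2); // e.g. "client npm install"
  if ((target !== "client" && target !== "server") || cmd.length === 0) {
    console.error("usage: run-dev <client|server> <command...>");
    process.exit(1);
  }

  // Always execute from the right place, no matter the agent's cwd.
  const cwd = path.join(findRepoRoot(process.cwd()), target);
  const result = spawnSync(cmd[0], cmd.slice(1), { cwd, stdio: "inherit" });
  process.exit(result.status ?? 1);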
Large context models don't do a great job of consistently attending to the entire context, so it might not work out as well in practice as continuing to improve the context engineering parts of coding agents would.
I'd bet that most of the improvement in Copilot-style tools over the past year comes from rapid progress in context engineering techniques, and that the contribution of the LLMs themselves is more modest. LLMs' native ability to independently "reason" about a large slushpile of tokens just hasn't improved enough over that same period to account for how much better the LLM coding tools have become. It's hard to see or confirm that, though, because the only direct comparison you can make is changing your LLM selection in the current version of the tool. Plugging GPT-5 into the original 2021 version of Copilot isn't an experiment most of us are able to try.
Claude can use tools to do that, and some of the different code-indexer MCPs work, but that depends on the LLM doing the coding making the right searches to find the code. If you are in a project where your helper functions or shared libs are scattered everywhere, it's a lot harder.
Just like with humans, it definitely works better if you follow good naming conventions and file patterns. And even then, I tend to make sure to include the important files in the context or clue the LLM in during the prompt.
It also depends on what language you use. A LOT. During the day I use LLMs with dotnet, and it’s pretty rough compared to when I’m using Rails on my side projects. Dotnet requires a lot more prompting and hand-holding, partly due to its complexity, but also due to how much more verbose it is.
Well, sure, but from what I know, humans are way better at following 'implicit' instructions than LLMs. A human programmer can 'infer' most of the important basic rules from looking at the existing code, whereas all this agents.md/claude.md/whatever stuff seems necessary to even get basic performance in this regard.
Also, the agents.md website seems to mostly list README.md-style 'how do I run this' instructions in its examples, not stylistic guidelines.
Furthermore, it would be nice if the agents added these rules themselves. With a human, you tell them "this is wrong, do it that way" and they remember it. (Although this functionality seems to be in the works?)
These days, AI can do much more than "Cranking out boilerplate and scaffolding, Automating repetitive routines". That was last year. With the right instructions, Claude Sonnet 4 can easily write over 99% of most business applications. You need to be specific in your instructions, though: "implement this table, add these fields, look at this and this implementation for reference, don't forget to do this, and consider that". Mention examples, or name algorithms and design patterns it should use. It still doesn't always do what you want on the first attempt, and you need to correct it (which is why I prefer Claude Code over Copilot; it makes that easier). But AI can write pretty much all the code for a developer who knows what the code should look like. And that's the point: junior developers typically don't know this, so they won't be able to get good results.
Most of the time, the only reason for typing code manually these days is that typing instructions for the LLM is sometimes more work than doing the change yourself.
> With the right instructions, Claude Sonnet 4 can easily write over 99% of most business applications. You need to be specific in your instructions, though.
By your own statement, then, this is not an "easy" task.
Software development has never been "hard" when you're given specific instructions.
Sometimes that happens :) The key is to recognize these situations and not go down that rabbit hole. But sometimes it lets me do something in 20 minutes that used to take a whole day.
Right, and where, if I may ask, are all those business applications that write themselves? Because all I see is a clown party, massive wasted resources and disruption to society because of your lies.
I guess it turned out that coding is not the only limiting factor. Internal processes, QA, product management, and coordination between teams become significant bottlenecks.
Also, they don’t help much with debugging. It’s worth a try, and I have been surprised a couple of times, but it’s mostly still manual.
BTW, I never said they write themselves. My point was rather that you need a lot of knowledge, need to know exactly what you want out of them, and have to supervise them and provide detailed instructions. But then they can help you create a lot more working code in a shorter time.
At that point, it seems to me that they become a distraction, a filter between you and the software you're building. Surely it must be easier to tell the computer directly what to do than to route the entire design through that mess?
I wouldn't call it a filter, unless you use it for trivial tasks ("check that the input argument is not null, throw an exception otherwise"). Sometimes it is useful for trivial tasks though, because it may do things a human dev would be too lazy to do. It also tends to be faster at finding things (as in "change the color of the submit button on the address form to #f00").
But the real value is implementing known patterns that are customized for your application. Adding a field or a new type to a CRUD application, implementing an algorithm, rendering some type of object in a UI...
I was in Paris this year, for the first time in 10 years, and frankly, I didn't notice less traffic. All the streets are full. There are certainly more electric cars and fewer old ones, but the streets are now crowded with Ubers and taxi cabs. I didn't notice fewer cars.
A lot of streets have been closed to traffic (school streets in particular), and many lanes have been converted into bike lanes. It was harder to get around the city as a pedestrian/cyclist a decade ago.
It also really depends on where you went in the city. Were you just hanging around the Île de la Cité and the Marais?
I go between NY and Paris, and it's insane that New York basically has zero pedestrian streets while Paris has so many. I used to live in SF and Chicago, and it's the same there: no pedestrian streets. Why?
It depends on the time of day. I have noticed that certain posts of mine are getting downvotes earlier in the day (when there are more Europeans) and upvotes later (when the Americans are in the majority). Other posts are getting upvotes early and downvotes later.
And it's the main reason why I rarely write comments on HN. Everything is so political. If people don't agree with you, they downvote you. Your karma only reflects the popularity of your views. If you write something controversial, it's likely to lower your karma, especially if one side can downvote it before the other side sees it.
Seemingly, we have also forgotten the pain brought by session cookies. Applications relying on session cookies typically broke when users opened several tabs and switched between them, used the browser 'back' button, or accessed the system from two devices simultaneously. It was difficult and required a lot of discipline to write a good application with session cookies, at least when you used them for more state than just authentication data.
I, for one, am very happy that we have passed the age of session cookies. That doesn't mean everything is perfect now, but applications generally work better than they did before the JS+API pattern.
I think you're conflating something here. Whether you send a session ID in a cookie or a JWT makes no difference for the app's general behavior, even when you use multiple tabs or multiple devices.
But I remember a time when especially bank websites added an additional token (like a super strict CSRF token) to their app, which tracked the current page of a user and if you browsed the same website from another tab, this other tab didn't have the proper token and the whole thing just returned "invalid user action" or something like that.
However, this has nothing to do with session cookies.
Typically, in the WebLogic days, session cookies were used to hold a server-side session containing the app state. If you just hold auth data in the session, this is not a problem. But if you hold state like form data in the session, it becomes a huge source of errors. Virtually all non-trivial web-based applications had these issues 20 years ago (before "Ajax"). J2EE servers like WebLogic even supported stateful EJBs, which brought server-side state to a new (insane) level.
While you could theoretically use JWTs for the same purpose, they are typically only used for authentication. And back then JWT wasn’t a thing.
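To make the multi-tab failure mode concrete with modern tooling: a toy Express sketch, where express-session stands in for the WebLogic-era server-side session (route and field names are invented):

  // Why form state in a server-side session breaks with multiple tabs.
  import express from "express";
  import session from "express-session";

  declare module "express-session" {
    interface SessionData { draft?: { product: string } }
  }

  const app = express();
  app.use(express.urlencoded({ extended: false }));
  app.use(session({ secret: "demo", resave: false, saveUninitialized: true }));

  // Step 1 of a wizard stores the draft in the (per-user, not per-tab) session...
  app.post("/wizard/step1", (req, res) => {
    req.session.draft = { product: req.body.product };
    res.redirect("/wizard/step2");
  });

  // ...so two open tabs share ONE draft: whichever tab posted step 1 last
  // silently overwrites the other tab's state before step 2 is submitted.
  app.post("/wizard/step2", (req, res) => {
    res.send(`ordering ${req.session.draft?.product ?? "???"}`);
  });

  app.listen(3000);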
> Whether you send a session ID in a cookie or a JWT makes no difference for the app's general behavior
It does make a difference. The cookie is sent by the browser to the server; the JWT is sent in the Authorization header by JavaScript code executed in the browser.
Using an opaque JWT token wrapped in a cookie is OK. Using a JWT token in the Authorization: header is not OK.
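For the record, a sketch of the two transports being contrasted (the endpoint name and storage choice are made up):

  // (a) JWT wrapped in an httpOnly cookie: the browser attaches it by itself,
  // and page scripts can never read the token.
  fetch("/api/profile", { credentials: "include" });

  // (b) JWT in the Authorization header: the script must hold the token
  // (e.g. in localStorage), so any injected script can exfiltrate it.
  const token = localStorage.getItem("jwt");
  fetch("/api/profile", {
    headers: { Authorization: `Bearer ${token}` },
  });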
* size (>=32", preferably ultrawide, currently using 38" UW)
* resolution (I am fine with >=1440p)
* USB-C or Thunderbolt with sufficient power to charge notebooks, so you only need to connect one cable to your notebook
* at least 4 USB ports (so keyboard, mouse, camera, and speakerphone can be connected to the monitor, and it acts as a switch when using more than one notebook)
* viewing angle / display type (the image should look the same from any angle - though that shouldn't be a problem in the price range of monitors that fulfill the previous points)