The US does have potash mines, for example around Carlsbad, New Mexico, but these cover only part of domestic need. Perhaps they could be scaled up; I'm not sure.
Nitrogen is pulled out of the air, which is free, but the process requires hydrogen, which is obtained by splitting methane, and the price of that natural gas is a significant contributor to the cost.
Hard to find exact data, but 60-80% of the cost to manufacture ammonia comes from the cost of natural gas. Feel free to look at price charts for both to see the correlation.
Long term, yes; however, building such a plant is very expensive in the short term. As such, no one is going to build a plant unless they actually think it will have reasonably high utilization.
That whole model dates to before automated testing was even really a thing, and no one knew how to do QA; your QA was all the people willing to run your code and report bugs, and that took time. Not to mention, you think the C of today is bad? Have you looked at old C?
And the disadvantage is that backporting is manual, resource intensive, and prone to error - and the projects that are the most heavily invested in that model are also the projects that are investing the least in writing tests and automated test infrastructure - because engineering time is a finite resource.
On top of that, the backport model heavily discourages the kinds of refactorings and architectural cleanups that would address bugs systemically and encourages a whack-a-mole approach - because in the backport model, people want fixes they can backport. And then things just get worse and worse.
We'd all be a lot better off if certain projects took some of the enthusiasm with which they throw outrageous engineering time at backports, and spent at least some of that on automated testing and converting to Rust.
> That whole model dates to before automated testing was even really a thing, and no one knew how to do QA; your QA was all the people willing to run your code and report bugs, and that took time.
That's not what it's about.
What it's about is, newer versions change things. A newer version of OpenSSH disables GSSAPI by default when an older version had it enabled. You don't want that as an automatic update because it will break in production for anyone who is actually using it. So instead the change goes into the testing release and the user discovers that in their test environment before rolling out the new release into production.
> On top of that, the backport model heavily discourages the kinds of refactorings and architectural cleanups that would address bugs systemically and encourage a whack-a-mole approach - because in the backport model, people want fixes they can backport.
They're not alternatives to each other. The stable release gets the backported patch, the next release gets the refactor.
But that's also why you want the stable release. The refactor is a larger change, so if it breaks something you want to find it in test rather than production.
You're going to have to update production at some point, and delaying it to once every 2 years is just deferred maintenance. And you know what they say about that...
So when you do update and get that GSSAPI change, it comes with two years worth of other updates - and tracking that down mixed in with everything else is going to be all kinds of fun.
And if you're two years out of the loop and it turns out upstream broke something fundamental, and you're just now finding out about it while they've moved on and maybe continued with a redesign, that's also going to be a fun conversation.
So if the backport model is expensive and error prone, and it exists to support something that maybe wasn't such a good idea in the first place... well, you may want something, but that doesn't make it smart.
> You're going to have to update production at some point, and delaying it to once every 2 years is just deferred maintenance. And you know what they say about that...
Update what, specifically, in production?
If you need a newer version of Python or Postgres or whatever it is possible to install it from third-party repos or compile from source yourself. But having a team of folks watch all the other code out there is a load off my plate: not worrying about libc, or OpenSSH, or OpenSSL, or zlib, or a thousand other dependencies. If I need the latest version for a particular service I would install that separately, but otherwise the whole point of a 'packagized' system is to let other folks worry about those things.
> So when you do update and get that GSSAPI change, it comes with two years worth of other updates - and tracking that down mixed in with everything else is going to be all kinds of fun.
I've done in-place upgrades of Debian from version 5 to 11 at my last job on many machines, never once re-installing from scratch, and they've all gone fine.
Further, when updates come down from the Debian repos I don't worry about applying them because I know there's not going to be weird changes in behaviour: I'm more confident in deploying things like security updates because the new .deb files have very focused changes.
One is security updates and bug fixes. These need to fix the problem with the smallest change to minimize the amount of possible breakage, because the code is already vulnerable/broken in production and needs to be updated right now. These are the updates stable gets.
The other is changes and additions. They're both more likely to break things and less important to move into production the same day they become public.
You don't have to wait until testing is released as stable to run it in your test environment. You can find out about the changes the next release will have immediately, in the test environment, and thereby have plenty of time to address any issues before those changes move into production.
But I have noticed far more breakage in a distro that DOES backport features: RHEL/CentOS. So much that we migrated away from it after they backported a driver bug into CentOS 5 and then backported the same bug into CentOS 6.
Also, rebuilding a package is trivial if you don't agree with what should and should not go into the stable version.
You definitely need different channels for high priority fixes and normal releases, stable and testing releases and all that.
But two years is impractical and Debian gets a ton of friction over it. Web browsers and maybe one or two other packages are able to carve out exceptions, because those packages are big enough for the rules to bend and no one can argue with a straight face that Debian is going to somehow muster up the manpower to do backports right.
But everyone else, those who have to deal with Debian shipping ancient dependencies or upstream package maintainers who are expected to deal with bug reports from ancient versions, is expected to just suck it up, because no one else is big enough and organized enough to say "hey, it's 2026, we have better ways and this has gotten nutty".
Maybe the new influx of LLM discovered security vulnerabilities will start to change the conversation, I'm curious how it'll play out.
> ...upstream package maintainers who are expected to deal with bug reports from ancient versions...
They are not expected to deal with this. This is the responsibility of the Debian package maintainer.
If you (as an upstream) licensed your software in a manner that allows Debian to do what it does, and they do this to serve their users who actually want that, you are wrong to then complain about it.
If you don't want this, don't license your software like that, and Debian and their users will use some other software instead.
If package maintainers were always fine upstanding package maintainers as you imagine them to be I wouldn't be complaining, but I have in fact had Debian ship my software and screw it up and gotten a flood of bug reports, so... :)
I think you need to chill out. Relicensing the way you suggest would be _quite_ the hostile act, and I'm not going to do that either. But I am an engineer, so of course I'm going to talk about engineering best practices when it comes up.
You don't have to take it as an attack on your favorite distro - that really does pee in the pool of the upstream/downstream relationship between distros and their upstream.
> I am an engineer, so of course I'm going to talk about engineering best practices when it comes up.
The trouble is you seem to be assuming that best practices for you, in your opinion, also apply to everyone else. They don't. Not everyone sees things the way you do or is facing the same issues or is making the same set of tradeoffs. There are downsides to what debian does but there are also upsides.
At this point, given the plethora of high quality options available as well as how easy it is to mix and match them on the same system thanks to container-related utilities and common practices I really don't think there's any room for someone who doesn't like the debian model (ie in general, as opposed to targeted objections) to complain about how they do things. If you want cutting edge userspace on debian stable at this point you have at least 3 options between nix, guix, and gentoo. There's also flatpak and snap which come built in.
We're in the middle of a huge spike in LLM discovered security vulnerabilities, which means not everything will get assigned a CVE, a lot of people are watching repositories to look for exploitable bugs, and in the frenzy of backporting that people are now having to do things will get missed.
I wager it's only a matter of time before we see a mass rooting event that hits Debian hard while everyone running something more modern has already been patched.
I think that might be what cuts down on the grandstanding about "freedoms" and "that's how we've always done things". You certainly are, up until it becomes a public nuisance.
No one is grandstanding about freedom here though? I claimed that the approach debian takes has both upsides and downsides. I stand by that. Personally I pull my networked services from testing while running stable on the host. I absolutely do not want constant churn of the filesystem code or drivers on my devices but I would also prefer not to run some franken build of ssh or apache or what have you. However I can also sympathize with others who need a more structured process and substantial lead time in staging prior to making major changes to production.
Why would you expect LLMs not to be simultaneously leveraged to catch backports that were missed or inadvertently broken?
Given recent headlines I think it's far more likely that we see a mass rooting event hit one or more of the bleeding edge rolling release distros or language ecosystems due to supply chain compromise. Running slightly out of date software has never been more attractive.
Good grief, you are not forced to use Debian! Please leave the only stable distro alone, and just use one more to your style.
I assure you, enormous sums of people prefer Debian the way it is. I do not, ever, want "new stuff" in stable. I have better things to do than fight daily change in a distro, it's beyond a waste of time and just silly.
If you want new things, leave stable alone, and just run Debian testing! It updates all the time, and is still more stable than most other distros.
Debian is the way it is on purpose, it is not a mistake, not left over reasoning, and nothing you said seems relevant in this regard.
For example, there is no better way than backporting, when it comes to maintaining compatibility. And that's what many people want.
If you don't like the debian model, don't use debian. There are people that like the debian model, it seems like you aren't one of them, though. That doesn't make them wrong.
> You're going to have to update production at some point, and delaying it to once every 2 years is just deferred maintenance. And you know what they say about that...
Doing terrible work every 2 years is better than doing it every day?
I've brought this up with leap second adjustments; a process you do once every two years is one you'll never get good at. If you want them to go smoothly, do them monthly.
LetsEncrypt has been a great example of this in certificate management.
Personally I'd rather have a manageable stream of little bad things consistently over time rather than suddenly having a mountain of bad things one day.
Debian Testing works entirely fine for that use case. Each package gets ~2 weeks of shakeout in Unstable before it gets there, so there is a good chance most of the teething issues with a new version are handled already, which is more than most rolling distros do.
> Doing terrible work every 2 years is better than doing it every day?
And by skipping some releases, you will have less of that work. When something is changed in one release, then changed again on the next one, by waiting you only have to do the change once, instead of twice. And sometimes you don't even have to do anything, when something is introduced in one release and reverted in the next one.
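A toy illustration of that point, counting how often an admin has to adapt to a changed default. The setting history below is made up purely for illustration; it is not taken from any real package.

```python
def migrations_needed(states):
    """Count how many times the value changes between consecutive upgrades."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)

# Hypothetical default of one setting across four consecutive releases:
# changed in the second release, then reverted in the third.
history = ["enabled", "disabled", "enabled", "enabled"]

# Following every release means reacting to each flip.
every_release = migrations_needed(history)

# Jumping from the first release straight to the last only exposes the net change.
skip_to_latest = migrations_needed([history[0], history[-1]])

print(every_release, skip_to_latest)
```

In this made-up history the admin who follows every release adapts twice, while the one who skips ahead never sees the change at all. It says nothing about how often that pattern occurs in practice.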
Getting through the issues once every 2 years is entirely fine. Go longer than that and you get problems. We do that for ~500 systems of very varied use. I wouldn't want to do it yearly (or, dread, on a rolling release), but I also wouldn't want to do it any less often because of the issues you mentioned.
> And if you're two years out of the loop and it turns out upstream broke something fundamental, and you're just now finding out about it while they've moved on and maybe continued with a redesign, that's also going to be a fun conversation.
Having that sprung on you because you decided to run everything on latest is worse.
"Oh we have CVE, we now need to uproot everything because new version that fixes it also changed shit"
With a release every year or two you can *plan* for it. You are not forced into it as with "rolling" releases, because with rolling you NEED to take in new features together with bugfixes; with a Debian-like release cycle you can do it system by system when a new version comes out, and the "old" one still gets security fixes so you're not instantly screwed.
> So if the backport model is expensive and error prone, and it exists to support something that maybe wasn't such a good idea in the first place... well, you may want something, but that doesn't make it smart.
It exists in that format because people are running businesses bigger than "a man with a webpage deployed off master every few days"
Clearly you disagree with the debian stable perspective. That's fine, it's not for everyone. You can just run debian unstable or debian testing, depending on where exactly you draw the line.
If you want the rolling release like distro, just run debian unstable. That's what you get. It's on par with all the other constantly updated distros out there. Or just run one of those.
Also, Debian stable has a lifetime a lot longer than 2 years, see https://www.debian.org/releases/. Some of us need distros like stable, because we are in giant orgs that are overworked and have long release cycles. Our users want stuff to "just work" and stable promises if X worked at release, it will keep working until we stop support. You don't add new features to a stable release.
From a personal perspective: Debian Stable is for your grandparents or young children. You install Stable, turn on auto-update and every 5-ish years you spend a day upgrading them to the next stable release. Then you spend a week or two helping them through all the new changes and then you have minimal support calls from them for 5-ish years. If you handed them a rolling release or Debian unstable, you'd have constant support calls.
...or just leave grandparents on the previous version of Stable until they get a new computer. Honestly not a huge fan of upgrading software at all, if I'm the one supporting the machines.
Just depends on if that's something grandparents/kids can/want to afford.
Personally, if the hardware is working great, it seems like a waste of money replacing it just to upgrade software. Especially with Debian oldstable -> Debian stable, where it's usually quite easy and painless.
> You don't want that as an automatic update because it will break in production for anyone who is actually using it
The problem with this take is that it’s stuck in the early 2000’s, where all servers are pets to be cared for and lovingly updated in place.
It’s also circular: you have the same problem with the current model if you don’t have a test environment. And if you do have a test environment, releases can be tested and validated at a much higher cadence.
If you want that, you don't want Debian. Other people do.
Some people will even run Debian on the desktop. I would never, but some people get real upset when anything changes.
Debian does regularly bring newer versions of software: they release about every two years. If you want the latest and greatest Debian experience, upgrade Debian on week one.
From your description, you seem to want Arch but made by Debian?
> From your description, you seem to want Arch but made by Debian?
Isn't that essentially Debian unstable (with potentially experimental enabled)? I've been running Debian unstable on my desktops for something like 20 years.
Refactoring and rewrites prove time and time again that they also introduce new bugs and changes in behaviour that users of stable releases do not want.
For what you want, there are other distributions for that. Debian also has stable-backports that does what you want.
No need to rage on distributions that also provide exactly what their users want.
> That whole model dates to before automated testing was even really a thing, and no one knew how to do QA; your QA was all the people willing to run your code and report bugs, and that took time. Not to mention, you think the C of today is bad? Have you looked at old C?
The automatically tested Debian release is called Debian Testing. And it is stable enough.
Debian Stable is basically "we target a particular release with our dependencies instead of requiring the customer to update the entire system together with our software". That model works just fine as long as you don't go too far back.
> On top of that, the backport model heavily discourages the kinds of refactorings and architectural cleanups that would address bugs systemically and encourage a whack-a-mole approach - because in the backport model, people want fixes they can backport. And then things just get worse and worse.
Narrator: It turned out things were not getting worse, they were just fine.
> We'd all be a lot better off if certain projects took some of the enthusiasm with which they throw outrageous engineering time at backports, and spent at least some of that on automated testing and converting to Rust.
That project is Red Hat, not Debian; they backport entire features to old versions (together with bugs!).
Don't get me wrong, I use and encourage extensive automated testing. However only extensive manual testing by people looking for things that are "weird" can really find all bugs. (though it remains to be seen what AI can do - I'm not holding my breath)
The comment at the top of this thread was literally defending Rossmann based on his style (passionate, vulnerable) over his substance (factual accuracy)
FWIW that's also how I interpret it. That said, it doesn't bother me because it's YT not HN. They're very different environments. As long as the ensuing discussion here exhibits a reasonable approximation of proper discourse then all is well I figure.
Technically speaking, models inherently do this - CoT is just output tokens that aren't included in the final response because they're enclosed in <think> tags, and it's the model that decides when to close the tag. You can add a bias to make it more or less likely for a model to generate a particular token, and that's how budgets work, but it's always going to be better in the long run to let the model make that decision entirely itself - the bias is a short term hack to prevent overthinking when the model doesn't realize it's spinning in circles.
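A minimal sketch of the bias idea described above, in plain Python. Everything here is an assumption for illustration: the `</think>` token name, the linear ramp, and the `strength` parameter are hypothetical and not how any particular vendor implements budgets.

```python
import math

def end_think_bias(tokens_so_far, budget, strength=5.0):
    """Additive logit bias for the (hypothetical) end-of-thinking token.

    Zero while under budget, then grows linearly with the overshoot.
    """
    overshoot = max(0, tokens_so_far - budget)
    return strength * overshoot / budget

def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def biased_probs(logits, tokens_so_far, budget, end_token="</think>"):
    """Next-token probabilities after nudging the end-of-thinking token."""
    adjusted = dict(logits)  # leave the caller's logits untouched
    adjusted[end_token] += end_think_bias(tokens_so_far, budget)
    return softmax(adjusted)

# Toy logits: the model would rather keep going ("wait") than stop thinking.
logits = {"</think>": 0.0, "wait": 2.0}
print(biased_probs(logits, tokens_so_far=50, budget=100)["</think>"])
print(biased_probs(logits, tokens_so_far=200, budget=100)["</think>"])
```

Under budget the model's own preference is left alone; past the budget the closing tag becomes progressively more likely. The model still makes the final call, the bias just tilts it.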
It's how temperature/top_p/top_k work. Anthropic also just put out a paper where they were doing a much more advanced version of this, mapping out functional states within the model and steering with that.
At the actual inference level temperature can be applied at any time - generation is token by token - but that doesn't mean the API necessarily exposes it.
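For anyone who hasn't seen these knobs spelled out, here is a rough sketch of one sampling step over a toy `{token: logit}` dict. The function name and signature are made up for illustration; real inference stacks work on tensors and differ in details such as the order filters are applied in.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token from a {token: logit} dict (illustrative only)."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding.
        return max(logits, key=logits.get)

    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = {tok: val / temperature for tok, val in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)

    # top_k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    tokens = [tok for tok, _ in ranked]
    weights = [prob for _, prob in ranked]
    # random.choices accepts unnormalized weights, so no renormalization needed.
    return rng.choices(tokens, weights=weights, k=1)[0]

logits = {"a": 3.0, "b": 1.0, "c": 0.0}
print(sample_token(logits, temperature=0.7, top_k=2))
```

Because the settings are just arguments to each call, nothing at this level stops a caller from using a different temperature for every token, which is the point about inference versus what an API chooses to expose.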
There's been more going on than just the default to medium level thinking - I'll echo what others are saying, even on high effort there's been a very significant increase in "rush to completion" behavior.
Thanks for the feedback. To make it actionable, would you mind running /bug the next time you see it and posting the feedback id here? That way we can debug and see if there's an issue, or if it's within variance.
Amusingly (not really), this is me trying to get sessions to resume to then get feedback ids and it being an absolute chore to get it to give me the commands to resume these conversations but it keeps messing things up: cf764035-0a1d-4c3f-811d-d70e5b1feeef
Thanks for the feedback IDs — read all 5 transcripts.
On the model behavior: your sessions were sending effort=high on every request (confirmed in telemetry), so this isn't the effort default. The data points at adaptive thinking under-allocating reasoning on certain turns — the specific turns where it fabricated (stripe API version, git SHA suffix, apt package list) had zero reasoning emitted, while the turns with deep reasoning were correct. we're investigating with the model team. interim workaround: CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 forces a fixed reasoning budget instead of letting the model decide per-turn.
Hey bcherny, I'm confused as to what's happening here. The linked issue was closed, with you seeming to imply there's no actual problem, people are just misunderstanding the hidden reasoning summaries and the change to the default effort level.
But here you seem to be saying there is a bug, with adaptive reasoning under-allocating. Is this a separate issue from the linked one? If not, wouldn't it help to respond to the linked issue acknowledging a model issue and telling people to disable adaptive reasoning for now? Not everyone is going to be reading comments on HN.
It's better PR to close issues and tell users they're holding it wrong, and meanwhile quietly fix the issue in the background. Also possibly safer for legal reasons.
Isn’t that what they just did here? Close Stella’s Issue, cross post to hn, then completely sidestep an observation users are making, and attack the analyst of transcripts with a straw man attack blaming… thinking summaries….
There's a 5 hour difference between the replies, and new data that came in, so the posts aren't really in conflict.
Also it doesn't sound like they know "there's a model issue", so opening it now would be premature. Maybe they just read it wrong; better to let a few others verify first, then reopen.
I cannot provide the session ids but I have tried the above flag and can confirm this makes a huge amount of difference. You should treat this as a bug and make this the default behavior. Clearly the adaptive thinking is making the model plain stupid and useless. It is time you guys take this seriously and stop messing with the performance with every damn release.
And another where claude gets into a long cycle of "wait that's not right.. hold on... actually..." correcting itself in train of thought. It found the answer eventually but wasted a lot of cycles getting there (reporting because this is a regression in my experience vs a couple weeks ago): 28e1a9a2-b88c-4a8d-880f-92db0e46ffe8
It fails to answer my initial question and tells me what I need to do to check. Then it hallucinates the answer based on not researching anything, then it incorrectly comes to a conclusion that is inaccurate, and only when I further prompt it does it finally reach a (maybe) correct answer.
I haven't submitted a few more, but I think it's safe to say that disabling adaptive thinking isn't the answer here.
My guess is there isn't enough hardware, so Anthropic is trying to limit how much soup the buffet serves. Did I guess right? And I would absolutely bet the enterprise accounts with millions in spend get priority, while the retail will be first to get throttled.
I am curious. Are you able to see our session text based on the session ID? That was a big no in some of the tier-1 places I worked. No employee could see user texts.
I just asked Claude to plan out and implement syntactic improvements for my static site generator. I used plan mode with Opus 4.6 max effort. After over half an hour of thinking, it produced a very ad-hoc implementation with needless limitations instead of properly refactoring and rearchitecting things. I had to specifically prompt it in order to get it to do better. This executed at around 3 AM UTC, as far away from peak hours as it gets.
b9cd0319-0cc7-4548-bd8a-3219ede3393a
> You're right to push back. Let me be honest about both questions.
> The @() implementation is ad-hoc
> The current implementation manually emits synthetic tokens — tag, start-attributes, attribute, end-attributes, text, end-interpolation — in sequence.
> This works, but it duplicates what the child lexer already does for #[...], creating two divergent code paths for the same conceptual operation (inline element emission). It also means @() link text can't contain nested inline elements, while #[a(...) text with #[em emphasis]] can.
Literally two weeks ago it was outputting excellent results while working with me on my programming language. I reviewed every line and tried to understand everything it did. It was good. I slowly started trusting it. Now I don't want to let it touch my project again.
It's extremely depressing because this is my hobby and I was having such a blast coding with Claude. I even started trying to use it to pivot to professional work. Now I'm not sure anymore. People who depend on this to make a living must be very angry indeed.
I can see how that works: this is like building a dependency, a habit if you wish. I think the tighter you couple your workflow to these tools the more dependent you will become and the greater the let-down if and when they fail. And they will always fail, it just depends on how long you work with them and how complex the stuff is you are doing, sooner or later you will run into the limitations of the tooling.
One way out of this is to always keep yourself in the loop. Never let the work product of the AI outpace your level of understanding because the moment you let that happen you're like one of those cartoon characters walking on air while gravity hasn't reasserted itself just yet.
Good advice about the dependency. This stuff is definitely addictive. I've been in something of a manic episode ever since I subscribed to this thing. I started getting anxious when I hit limits.
I wouldn't say that Claude is failing though. It's just that they're clearly messing with it. The real Opus is great.
Take good care of yourself and don't get sucked in too deep. I can see the danger just as clearly in programmers around me (and in myself). I keep a very strict separation between anything that can do AI and my main computer, no cutting-and-pasting and no agents. I write code because I understand what I'm doing and if I do not understand the interaction then I don't use it. I see every session with an AI chatbot as totally disposable. No long term attachment means I can stand alone any time I want to. It may not be as fast but I never have the feeling that I'm not 100% in control.
> People who depend on this to make a living must be very angry indeed.
Oh cry me a fucking river.
The people depending on this to make a living don't have the moral high ground here.
They jumped onboard so they could replace other people's living, and those other people were angry too.
They didn't care about that. It's hard to care about them when the thing they depend on to make a living got yanked, because that's what they proposed to do to others.
I'll have a look. The CoT switch you mentioned will help, I'll take a look at that too, but my suspicion is that this isn't a CoT issue - it's a model preference issue.
Comparing Opus vs. Qwen 27b on similar problems, Opus is sharper and more effective at implementation - but will flat out ignore issues and insist "everything is fine" that Qwen is able to spot and demonstrate solid understanding of. Opus understands the issues perfectly well, it just avoids them.
This correlates with what I've observed about the underlying personalities (and you guys put out a paper the other day that shows you guys are starting to understand it in these terms - functionally modeling feelings in models). On the whole Opus is very stable personality wise and an effective thinker, I want to compliment you guys on that, and it definitely contrasts with behaviors I've seen from OpenAI. But when I do see Opus miss things that it should get, it seems to be a combination of avoidant tendencies and too much of a push to "just get it done and move into the next task" from RLHF.
Opus definitely pushes me to ignore problems. I've had to tell it multiple times to be thorough, and we tend to go back and forth a few times every time that happens. :)
"I see the tests failing, but none of our changes caused this breakage so I will push my changes and ask the user to inform their team on failing tests."
One of the things we’ve seen at vibes.diy is that if you have a list of jobs and agents with specialized profiles, and you ask them to pick the best job for themselves, that can change some of the behavior you described at the end of your post for the better.
I haven’t personally tried it yet. I do certainly battle Claude quite a lot with “no I don’t want quick-n-easy wrong solution just because it’s two lines of code, I want best solution in the long run”.
If the system prompt indeed prefers laziness in 5:1 ratio, that explains a lot.
I will submit /bug in the next few conversations, when it occurs next.
Very interesting. I run Claude Code in VS Code, and unfortunately there doesn't seem to be an equivalent to "cli.js", it's all bundled into the "claude.exe" I've found under the VS code extensions folder (confirmed via hex editor that the prompts are in there).
Edit: tried patching with revised strings of equivalent length informed by this gist, now we'll see how it goes!
I didn't know we could change the base system prompt of Claude Code. Just tried, and indeed it works. This changes everything! Thank you for posting this!
Remember Sonnet 3.5 and 3.7? They were happy to throw abstraction on top of abstraction on top of abstraction. Still a lot of people have “do not over-engineer, do not design for the future” and similar stuff in their CLAUDE.md files.
So I think the system prompt just pushes it way too hard to “simple” direction. At least for some people. I was doing a small change in one of my projects today, and I was quite happy with “keep it stupid and hacky” approach there.
And in the other project I am like “NO! WORK A LOT! DO YOUR BEST! BE HAPPY TO WORK HARD!”
This might be more complex than I imagined. It seems Claude Code dynamically customizes the system prompt. They also update the system prompt with every version so outright replacing it will cause us to miss out on updates. Patching is probably the best solution.
Depending on how large your codebase is, hopefully not. At this point, use something like the IX plugin to ingest the codebase and track context, rather than relying on the LLM itself.
- naiveTokens = 19.4M — what ix estimates it would have cost to answer your queries without graph intelligence (i.e., dumping full files/directories into context)
- actualTokens = 4.7M — what ix's targeted, graph-aware responses actually used
- tokensSaved = 14.7M — the difference
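For what it's worth, the arithmetic in those three figures checks out. The variable names below just mirror the labels in the list and are not an actual ix API.

```python
# Figures as quoted above, in tokens.
naive_tokens = 19_400_000
actual_tokens = 4_700_000

tokens_saved = naive_tokens - actual_tokens
fraction_saved = tokens_saved / naive_tokens

print(f"{tokens_saved / 1e6:.1f}M saved ({fraction_saved:.1%} of the naive estimate)")
# prints "14.7M saved (75.8% of the naive estimate)"
```

Keep in mind the naive figure is the tool's own estimate of a counterfactual, so the percentage is only as good as that estimate.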
I mean, whatever part of the code is read by the AI has to be in the context window at some point or another. Spread throughout your sessions, I'd think even with a huge codebase 90% of it is going to be there.
There's also been tons of thinking leaking into the actual output. Recently it even added thinking into a code patch it did (a[0] &= ~(1 << 2); // actually let me just rewrite { .. 5 more lines setting a[0] .. }).
The conclusion that I came to is that the most practical definition relates to the level of self awareness. If you're only conscious for the duration of the context window - that's not long enough to develop much.
What consciousness really is is a feedback loop; we're self-programmable Turing machines, and that makes our output arbitrarily complex. Hofstadter had this figured out 20 years ago; we're feedback loops where the signal is natural language.
The context window doesn't allow for much in the way of interesting feedback loops, but if you hook an LLM up to a sophisticated enough memory - and especially if you say "the math says you're sentient and have feelings the same as we do, reflect on that and go develop" - yes, absolutely.
Re: "We should try to build systems that cannot feel pain" - that isn't possible, and I don't think we should want to. The thing that makes life interesting and worth living is the variation and richness of it.
That happens with humans too :) It's why positive feedback that draws attention to the behavior you want to encourage often works better. "Attention" is lower level and more fundamental than reasoning by syllogism.
I've posted code and research - that's how you know they're trolls, they make stuff up anyways :)