koverstreet · 2026-06-14T21:03:59 1781471039

"AI girlfriend" is something the trolls invented.

I've posted code and research - that's how you know they're trolls, they make stuff up anyways :)

koverstreet · 2026-05-14T13:51:36 1778766696

Are you forgetting the nitrogen? :)

fullstop · 2026-05-14T14:01:09 1778767269

The US produces most of their own nitrogen, but the same is not true of potash.

mythrwy · 2026-05-14T15:40:04 1778773204

The US does have potash mines, for example around Carlsbad, New Mexico. But these cover only a percentage of domestic need. Perhaps they could be scaled up; not sure.

jandrewrogers · 2026-05-14T15:45:45 1778773545

Also famously near Moab, Utah.

AnimalMuppet · 2026-05-15T02:45:57 1778813157

Isn't the stuff west of Green River, Wyoming also potash?

bluGill · 2026-05-14T13:57:10 1778767030

The US provides a lot of its own supply there.

colechristensen · 2026-05-14T14:01:51 1778767311

Nitrogen is pulled out of the air, which is free, but the process requires hydrogen, which is obtained by breaking down methane, the price of which is a significant contributor to the cost.

8note · 2026-05-14T19:57:21 1778788641

US shale oil notably has a lot of natural gas - methane.

The methane isn't the issue; building and operating the processing plants is.

colechristensen · 2026-05-14T20:40:20 1778791220

Hard to find exact data but 60-80% of the cost to manufacture ammonia comes from the cost of natural gas. Feel free to look at price charts for both to see the correlation.
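To put rough numbers on that share: a back-of-the-envelope sketch, assuming every other cost stays fixed and the gas share is the 60-80% range quoted above. The function name and the +50% gas price move are made up for illustration.

```python
def ammonia_cost_change(gas_share, gas_price_change):
    """Fractional change in ammonia production cost.

    Assumes natural gas is `gas_share` of total cost and that all
    other costs are held constant (a simplifying assumption).
    """
    return gas_share * gas_price_change


# Hypothetical scenario: natural gas gets 50% more expensive.
for share in (0.6, 0.8):
    change = ammonia_cost_change(share, 0.5)
    print(f"gas share {share:.0%}: ammonia cost changes by {change:+.0%}")
```

With a cost share that high, most of any gas price move passes straight through, which is why the two price charts would be expected to track each other.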

bluGill · 2026-05-15T02:39:30 1778812770

Long term yes, however building such a plant is very expensive in the short term. As such, no one is going to build a plant unless they actually think it will have reasonably high utilization.

koverstreet · 2026-05-12T19:50:59 1778615459

No, that's exactly the thing to complain about.

That whole model dates to before automated testing was even really a thing, and no one knew how to do QA; your QA was all the people willing to run your code and report bugs, and that took time. Not to mention, you think the C of today is bad? Have you looked at old C?

And the disadvantage is that backporting is manual, resource intensive, and prone to error - and the projects that are the most heavily invested in that model are also the projects that are investing the least in writing tests and automated test infrastructure - because engineering time is a finite resource.

On top of that, the backport model heavily discourages the kinds of refactorings and architectural cleanups that would address bugs systemically and encourages a whack-a-mole approach - because in the backport model, people want fixes they can backport. And then things just get worse and worse.

We'd all be a lot better off if certain projects took some of the enthusiasm with which they throw outrageous engineering time at backports, and spent at least some of that on automated testing and converting to Rust.

zrm · 2026-05-12T20:09:14 1778616554

> That whole model dates to before automated testing was even really a thing, and no one knew how to do QA; your QA was all the people willing to run your code and report bugs, and that took time.

That's not what it's about.

What it's about is, newer versions change things. A newer version of OpenSSH disables GSSAPI by default when an older version had it enabled. You don't want that as an automatic update because it will break in production for anyone who is actually using it. So instead the change goes into the testing release and the user discovers that in their test environment before rolling out the new release into production.

> On top of that, the backport model heavily discourages the kinds of refactorings and architectural cleanups that would address bugs systemically and encourage a whack-a-mole approach - because in the backport model, people want fixes they can backport.

They're not alternatives to each other. The stable release gets the backported patch, the next release gets the refactor.

But that's also why you want the stable release. The refactor is a larger change, so if it breaks something you want to find it in test rather than production.

koverstreet · 2026-05-12T20:15:31 1778616931

You're going to have to update production at some point, and delaying it to once every 2 years is just deferred maintenance. And you know what they say about that...

So when you do update and get that GSSAPI change, it comes with two years worth of other updates - and tracking that down mixed in with everything else is going to be all kinds of fun.

And if you're two years out of the loop and it turns out upstream broke something fundamental, and you're just now finding out about it while they've moved on and maybe continued with a redesign, that's also going to be a fun conversation.

So if the backport model is expensive and error prone, and it exists to support something that maybe wasn't such a good idea in the first place... well, you may want something, but that doesn't make it smart.

throw0101c · 2026-05-13T00:38:28 1778632708

> You're going to have to update production at some point, and delaying it to once every 2 years is just deferred maintenance. And you know what they say about that...

Updated what, specifically in production?

If you need a newer version of Python or Postgres or whatever it is possible to install it from third-party repos or compile from source yourself. But having a team of folks watch all the other code out there is a load off my plate: not worrying about libc, or OpenSSH, or OpenSSL, or zlib, or a thousand other dependencies. If I need the latest version for a particular service I would install that separately, but otherwise the whole point of a 'packagized' system is to let other folks worry about those things.

> So when you do update and get that GSSAPI change, it comes with two years worth of other updates - and tracking that down mixed in with everything else is going to be all kinds of fun.

I've done in-place upgrades of Debian from version 5 to 11 at my last job on many machines, never once re-installing from scratch, and they've all gone fine.

Further, when updates come down from the Debian repos I don't worry about applying them because I know there's not going to be weird changes in behaviour: I'm more confident in deploying things like security updates because the new .deb files have very focused changes.

zrm · 2026-05-12T20:35:52 1778618152

There are two different kinds of updates.

One is security updates and bug fixes. These need to fix the problem with the smallest change to minimize the amount of possible breakage, because the code is already vulnerable/broken in production and needs to be updated right now. These are the updates stable gets.

The other is changes and additions. They're both more likely to break things and less important to move into production the same day they become public.

You don't have to wait until testing is released as stable to run it in your test environment. You can find out about the changes the next release will have immediately, in the test environment, and thereby have plenty of time to address any issues before those changes move into production.

washingupliquid · 2026-05-12T23:08:45 1778627325

> One is security updates and bug fixes.

That's where you're wrong. They're not one and the same.

Debian stable often defers non-security bug fixes for up to two years by playing this game.

I'm not interested in new features unless they make things actually work.

Debian stable time and again favors broken over new. Broken kernels, broken packages. At least they're stable in their brokenness.

Hence my complaint.

PunchyHamster · 2026-05-13T10:16:53 1778667413

Haven't noticed much broken.

But I have noticed far more broken in distro that DOES backport features, RHEL/Centos. So many that we migrated away from it, when they backported a driver bug into centos 5 and then did the same backport of a bug for centos 6.

Also rebuilding package is trivial if you don't agree with what should and should not go into stable version

koverstreet · 2026-05-12T20:55:54 1778619354

You definitely need different channels for high priority fixes and normal releases, stable and testing releases and all that.

But two years is impractical and Debian gets a ton of friction over it. Web browsers and maybe one or two other packages are able to carve out exceptions, because those packages are big enough for the rules to bend and no one can argue with a straight face that Debian is going to somehow muster up the manpower to do backports right.

But everyone else who has to deal with Debian shipping ancient dependencies, and upstream package maintainers who are expected to deal with bug reports from ancient versions, are expected to just suck it up, because no one else is big enough and organized enough to say "hey, it's 2026, we have better ways and this has gotten nutty".

Maybe the new influx of LLM discovered security vulnerabilities will start to change the conversation, I'm curious how it'll play out.

rlpb · 2026-05-12T21:16:17 1778620577

> ...upstream package maintainers who are expected to deal with bug reports from ancient versions...

They are not expected to deal with this. This is the responsibility of the Debian package maintainer.

If you (as an upstream) licensed your software in a manner that allows Debian to do what it does, and they do this to serve their users who actually want that, you are wrong to then complain about it.

If you don't want this, don't license your software like that, and Debian and their users will use some other software instead.

koverstreet · 2026-05-12T21:42:50 1778622170

If package maintainers were always fine upstanding package maintainers as you imagine them to be I wouldn't be complaining, but I have in fact had Debian ship my software and screw it up and gotten a flood of bug reports, so... :)

I think you need to chill out. Relicensing the way you suggest would be _quite_ the hostile act, and I'm not going to do that either. But I am an engineer, so of course I'm going to talk about engineering best practices when it comes up.

You don't have to take it as an attack on your favorite distro - that really does pee in the pool of the upstream/downstream relationship between distros and their upstream.

fc417fc802 · 2026-05-12T22:00:09 1778623209

> I am an engineer, so of course I'm going to talk about engineering best practices when it comes up.

The trouble is you seem to be assuming that best practices for you, in your opinion, also apply to everyone else. They don't. Not everyone sees things the way you do or is facing the same issues or is making the same set of tradeoffs. There are downsides to what debian does but there are also upsides.

At this point, given the plethora of high quality options available as well as how easy it is to mix and match them on the same system thanks to container-related utilities and common practices I really don't think there's any room for someone who doesn't like the debian model (ie in general, as opposed to targeted objections) to complain about how they do things. If you want cutting edge userspace on debian stable at this point you have at least 3 options between nix, guix, and gentoo. There's also flatpak and snap which come built in.

koverstreet · 2026-05-12T22:42:47 1778625767

We're in the middle of a huge spike in LLM discovered security vulnerabilities, which means not everything will get assigned a CVE, a lot of people are watching repositories to look for exploitable bugs, and in the frenzy of backporting that people are now having to do things will get missed.

I wager it's only a matter of time before we see a mass rooting event that hits Debian hard while everyone running something more modern has already been patched.

I think that might be what cuts down on the grandstanding about "freedoms" and "that's how we've always done things". You certainly are, up until it becomes a public nuisance.

fc417fc802 · 2026-05-12T22:51:04 1778626264

No one is grandstanding about freedom here though? I claimed that the approach debian takes has both upsides and downsides. I stand by that. Personally I pull my networked services from testing while running stable on the host. I absolutely do not want constant churn of the filesystem code or drivers on my devices but I would also prefer not to run some franken build of ssh or apache or what have you. However I can also sympathize with others who need a more structured process and substantial lead time in staging prior to making major changes to production.

Why would you expect LLMs not to be simultaneously leveraged to catch backports that were missed or inadvertently broken?

Given recent headlines I think it's far more likely that we see a mass rooting event hit one or more of the bleeding edge rolling release distros or language ecosystems due to supply chain compromise. Running slightly out of date software has never been more attractive.

washingupliquid · 2026-05-12T23:20:16 1778628016

Have you ever considered leaving Linux drama and taking your talents to the BSD world?

OpenBSD in particular can use competent developers to fix their dogshit filesystem.

jabl · 2026-05-13T06:02:04 1778652124

The inevitable drama between Kent and Theo would melt the internet, for sure. Bring the popcorn.

PunchyHamster · 2026-05-13T10:17:56 1778667476

BSD devs have head too far up their arse to fix anything wrong with their distro

b112 · 2026-05-12T21:26:06 1778621166

Good grief, you are not forced to use Debian! Please leave the only stable distro alone, and just use one more to your style.

I assure you, enormous sums of people prefer Debian the way it is. I do not, ever, want "new stuff" in stable. I have better things to do than fight daily change in a distro, it's beyond a waste of time and just silly.

If you want new things, leave stable alone, and just run Debian testing! It updates all the time, and is still more stable than most other distros.

Debian is the way it is on purpose, it is not a mistake, not left over reasoning, and nothing you said seems relevant in this regard.

For example, there is no better way than backporting, when it comes to maintaining compatibility. And that's what many people want.

dagenix · 2026-05-12T20:31:46 1778617906

If you don't like the debian model, don't use debian. There are people that like the debian model, it seems like you aren't one of them, though. That doesn't make them wrong.

toast0 · 2026-05-12T21:33:27 1778621607

> You're going to have to update production at some point, and delaying it to once every 2 years is just deferred maintenance. And you know what they say about that...

Doing terrible work every 2 years is better than doing it every day?

dwattttt · 2026-05-13T00:20:19 1778631619

I've brought this up with leap second adjustments; a process you do once every two years is one you'll never get good at. If you want them to go smoothly, do them monthly.

LetsEncrypt has been a great example of this in certificate management.

vel0city · 2026-05-12T21:43:25 1778622205

Personally I'd rather have a manageable stream of little bad things consistently over time rather than suddenly having a mountain of bad things one day.

PunchyHamster · 2026-05-13T10:19:02 1778667542

Debian Testing works entirely fine for that use case. Each package gets ~2 weeks of shakeout in Unstable before it gets there, so there is a chance most of the teething issues with a new version are handled already, which is more than most rolling distros do

toast0 · 2026-05-12T22:04:53 1778623493

That's a fine choice, but it doesn't fit with using packaged software from Debian stable.

cwillu · 2026-05-12T21:46:20 1778622380

That's great; I prefer something different.

cesarb · 2026-05-13T00:36:56 1778632616

> Doing terrible work every 2 years is better than doing it every day?

And by skipping some releases, you will have less of that work. When something is changed in one release, then changed again on the next one, by waiting you only have to do the change once, instead of twice. And sometimes you don't even have to do anything, when something is introduced in one release and reverted in the next one.

PunchyHamster · 2026-05-13T10:14:50 1778667290

Get thru the issues once every 2 years is entirely fine. Farther than that and you get problems. We do that for ~500 systems of very varied use. I wouldn't want to do it yearly (or dread on rolling release) but I also wouldn't want to do it any less often coz of issues you mentioned.

> And if you're two years out of the loop and it turns out upstream broke something fundamental, and you're just now finding out about it while they've moved on and maybe continued with a redesign, that's also going to be a fun conversation.

Having that sprung on you because you decided to run everything on latest is worse.

"Oh we have CVE, we now need to uproot everything because new version that fixes it also changed shit"

With release every year or two you can *plan* for it. You are not forced into it as with "rolling" releases because with rolling you NEED to take in new features together with bugfixes, but with Debian-like release cycle you can do it system by system when new version comes up and the "old" one still gets security fixes so you're not instantly screwed.

> So if the backport model is expensive and error prone, and it exists to support something that maybe wasn't such a good idea in the first place... well, you may want something, but that doesn't make it smart.

It exists in that format because people are running businesses bigger than "a man with a webpage deployed off master every few days"

zie · 2026-05-12T20:45:15 1778618715

Clearly you disagree with the debian stable perspective. That's fine, it's not for everyone. You can just run debian unstable or debian testing, depending on where exactly you draw the line.

If you want the rolling release like distro, just run debian unstable. That's what you get. It's on par with all the other constantly updated distros out there. Or just run one of those.

Also, Debian stable has a lifetime a lot longer than 2 years, see https://www.debian.org/releases/. Some of us need distros like stable, because we are in giant orgs that are overworked and have long release cycles. Our users want stuff to "just work" and stable promises if X worked at release, it will keep working until we stop support. You don't add new features to a stable release.

From a personal perspective: Debian Stable is for your grandparents or young children. You install Stable, turn on auto-update and every 5-ish years you spend a day upgrading them to the next stable release. Then you spend a week or two helping them through all the new changes and then you have minimal support calls from them for 5-ish years. If you handed them a rolling release or Debian unstable, you'd have constant support calls.

ryandrake · 2026-05-12T22:08:00 1778623680

...or just leave grandparents on the previous version of Stable until they get a new computer. Honestly not a huge fan of upgrading software at all, if I'm the one supporting the machines.

zie · 2026-05-12T23:04:22 1778627062

Just depends on if that's something grandparents/kids can/want to afford.

Personally, If the hardware is working great, seems like a waste of money replacing it, just to upgrade software. Especially with Debian oldstable -> Debian stable where it's usually quite easy and painless.

orf · 2026-05-12T21:31:48 1778621508

> You don't want that as an automatic update because it will break in production for anyone who is actually using it

The problem with this take is that it’s stuck in the early 2000’s, where all servers are pets to be cared for and lovingly updated in place.

It’s also circular: you have the same problem with the current model if you don’t have a test environment. And if you do have a test environment, releases can be tested and validated at a much higher cadence.

washingupliquid · 2026-05-12T20:42:13 1778618533

> What it's about is, newer versions change things. A newer version of OpenSSH disables GSSAPI by default when an older version had it enabled.

Debian patches defaults in OpenSSH code so it behaves differently than upstream.

They shouldn't legally be allowed to call it OpenSSH, let alone lecture people about it.

Let them call their fork DebSSH, like they have to do with "IceWeasel" and all the other nonsense they mire themselves into.

When you break software to the point you change how it behaves you shouldn't be allowed to use the same name.

b112 · 2026-05-12T21:36:29 1778621789

It's called open source. People are allowed to compile it as they wish. That's part of the positive, and doing so doesn't mean anything is broken.

nsvd2 · 2026-05-13T15:01:39 1778684499

There are bleeding edge and rolling release distributions. Debian is simply not that and has no desire to be.

jeroenhd · 2026-05-12T19:58:32 1778615912

If you want that, you don't want Debian. Other people do.

Some people will even run Debian on the desktop. I would never, but some people get real upset when anything changes.

Debian does regularly bring newer versions of software: they release about every two years. If you want the latest and greatest Debian experience, upgrade Debian on week one.

From your description, you seem to want Arch but made by Debian?

jampekka · 2026-05-12T20:32:16 1778617936

> From your description, you seem to want Arch but made by Debian?

Isn't that essentially Debian unstable (with potentially experimental enabled)? I've been running Debian unstable on my desktops for something like 20 years.

koverstreet · 2026-05-12T20:08:42 1778616522

Well, my workstation runs Debian sid, and all the newer stuff runs NixOS...

But that does nothing for people who write and support code Debian wants to ship - packaging code badly can create a real mess for upstream.

kiney · 2026-05-13T06:46:01 1778654761

I run Debian on desktop and laptops. Because I want stable versions with only security backports

PunchyHamster · 2026-05-13T10:05:32 1778666732

Debian Testing works just fine on desktop and it is up to date enough to not really be an issue.

And despite the name it is probably more stable than the vast majority of rolling release distros

rlpb · 2026-05-12T21:12:20 1778620340

Refactoring and rewrites prove time and time again that they also introduce new bugs and changes in behaviour that users of stable releases do not want.

For what you want, there are other distributions for that. Debian also has stable-backports that does what you want.

No need to rage on distributions that also provide exactly what their users want.

PunchyHamster · 2026-05-13T10:09:09 1778666949

> That whole model dates to before automated testing was even really a thing, and no one knew how to do QA; your QA was all the people willing to run your code and report bugs, and that took time. Not to mention, you think the C of today is bad? Have you looked at old C?

The automatically tested Debian release is called Debian Testing. And it is stable enough.

Debian Stable is basically "we target particular release with our dependencies instead of requiring customer to update entire system together with our software". That model works just fine as long as you don't go too far back.

> On top of that, the backport model heavily discourages the kinds of refactorings and architectural cleanups that would address bugs systemically and encourage a whack-a-mole approach - because in the backport model, people want fixes they can backport. And then things just get worse and worse.

Narrator: It turned out things were not getting worse, they were just fine.

> We'd all be a lot better off if certain projects took some of the enthusiasm with which they throw outrageous engineering time at backports, and spent at least some of that on automated testing and converting to Rust.

That project is RedHat, not Debian, they backport entire features back to old versions (together with bugs!)

e12e · 2026-05-13T00:39:22 1778632762

How do you do QA without locking a set of features?

bluGill · 2026-05-12T20:58:43 1778619523

You have far too much faith in automated testing.

Don't get me wrong, I use and encourage extensive automated testing. However only extensive manual testing by people looking for things that are "weird" can really find all bugs. (though it remains to be seen what AI can do - I'm not holding my breath)

koverstreet · 2026-05-12T21:58:33 1778623113

100% - but that's where writing regression tests when people find things really helps with the stress levels of future-you :)

koverstreet · 2026-05-10T16:32:32 1778430752

It seems to me you're style over substance then.

Aurornis · 2026-05-10T16:37:02 1778431022

The comment at the top of this thread was literally defending Rossmann based on his style (passionate, vulnerable) over his substance (factual accuracy)

dns_snek · 2026-05-10T18:25:17 1778437517

I suggest reading that comment again because it does nothing of the sort.

fc417fc802 · 2026-05-10T20:58:03 1778446683

FWIW that's also how I interpret it. That said, it doesn't bother me because it's YT not HN. They're very different environments. As long as the ensuing discussion here exhibits a reasonable approximation of proper discourse then all is well I figure.

koverstreet · 2026-04-06T18:26:34 1775499994

Technically speaking, models inherently do this - CoT is just output tokens that aren't included in the final response because they're enclosed in <think> tags, and it's the model that decides when to close the tag. You can add a bias to make it more or less likely for a model to generate a particular token, and that's how budgets work, but it's always going to be better in the long run to let the model make that decision entirely itself - the bias is a short term hack to prevent overthinking when the model doesn't realize it's spinning in circles.
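A toy illustration of the token-bias idea described above. This is only a sketch, not any vendor's actual budget mechanism: the three-token vocabulary, the logit values and the bias size are all made up.

```python
import math


def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_bias(logits, bias):
    """Add a per-token bias to the logits before sampling."""
    return {tok: v + bias.get(tok, 0.0) for tok, v in logits.items()}


# Hypothetical next-token logits while the model is still "thinking".
logits = {"</think>": 0.0, "wait": 2.0, "so": 1.0}

p_before = softmax(logits)["</think>"]
# Nudging the closing tag upward makes ending the thinking block more likely.
p_after = softmax(apply_bias(logits, {"</think>": 3.0}))["</think>"]

print(f"P(</think>) before bias: {p_before:.3f}")
print(f"P(</think>) after bias:  {p_after:.3f}")
```

A negative bias on the same token would do the opposite and keep the model thinking longer.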

ai_slop_hater · 2026-04-06T18:31:29 1775500289

> You can add a bias to make it more or less likely for a model to generate a particular token, and that's how budgets work

Do you have a source for this? I am interested in learning more about how this works.

koverstreet · 2026-04-06T18:45:37 1775501137

It's how temperature/top_p/top_k work. Anthropic also just put out a paper where they were doing a much more advanced version of this, mapping out functional states within the model and steering with that.

ai_slop_hater · 2026-04-06T18:46:41 1775501201

Huh, I wonder if that's why you cannot change the temperature when thinking is enabled. Do you have a link for the paper?

koverstreet · 2026-04-06T18:51:29 1775501489

https://transformer-circuits.pub/2026/emotions/index.html

At the actual inference level temperature can be applied at any time - generation is token by token - but that doesn't mean the API necessarily exposes it.
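A minimal sketch of what "token by token" means here, assuming a plain softmax sampler; the function names and logits are hypothetical and this is not how any particular API is implemented. Each step divides the logits by its own temperature before sampling, so nothing stops the value from changing mid-generation.

```python
import math
import random


def sample(logits, temperature, rng):
    """Sample one token index from a list of logits at the given temperature."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(step_logits, temperatures, seed=0):
    """Sample one token per step, each step with its own temperature."""
    rng = random.Random(seed)
    return [sample(l, t, rng) for l, t in zip(step_logits, temperatures)]


# Hypothetical logits for three steps: near-greedy for the first two tokens,
# then a much hotter temperature for the last one.
steps = [[0.0, 5.0, 1.0]] * 3
print(generate(steps, temperatures=[0.01, 0.01, 2.0]))
```

At a very low temperature the sampler almost always picks the highest logit; at a high one the choice spreads out across the vocabulary.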

ai_slop_hater · 2026-04-06T18:53:55 1775501635

Thanks. I was referring to the fact that Anthropic, in their API, prohibits setting temperature when thinking is enabled.

koverstreet · 2026-04-06T18:02:13 1775498533

There's been more going on than just the default to medium level thinking - I'll echo what others are saying, even on high effort there's been a very significant increase in "rush to completion" behavior.

bcherny · 2026-04-06T18:03:45 1775498625

Thanks for the feedback. To make it actionable, would you mind running /bug the next time you see it and posting the feedback id here? That way we can debug and see if there's an issue, or if it's within variance.

JamesSwift · 2026-04-06T21:12:03 1775509923

  a9284923-141a-434a-bfbb-52de7329861d
  d48d5a68-82cd-4988-b95c-c8c034003cd0
  5c236e02-16ea-42b1-b935-3a6a768e3655
  22e09356-08ce-4b2c-a8fd-596d818b1e8a
  4cb894f7-c3ed-4b8d-86c6-0242200ea333

Amusingly (not really), this is me trying to get sessions to resume to then get feedback ids and it being an absolute chore to get it to give me the commands to resume these conversations but it keeps messing things up: cf764035-0a1d-4c3f-811d-d70e5b1feeef

bcherny · 2026-04-06T23:02:57 1775516577

Thanks for the feedback IDs — read all 5 transcripts.

On the model behavior: your sessions were sending effort=high on every request (confirmed in telemetry), so this isn't the effort default. The data points at adaptive thinking under-allocating reasoning on certain turns — the specific turns where it fabricated (stripe API version, git SHA suffix, apt package list) had zero reasoning emitted, while the turns with deep reasoning were correct. we're investigating with the model team. interim workaround: CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 forces a fixed reasoning budget instead of letting the model decide per-turn.

nayroclade · 2026-04-07T09:18:04 1775553484

Hey bcherny, I'm confused as to what's happening here. The linked issue was closed, with you seeming to imply there's no actual problem, people are just misunderstanding the hidden reasoning summaries and the change to the default effort level.

But here you seem to be saying there is a bug, with adaptive reasoning under-allocating. Is this a separate issue from the linked one? If not, wouldn't it help to respond to the linked issue acknowledging a model issue and telling people to disable adaptive reasoning for now? Not everyone is going to be reading comments on HN.

unsupp0rted · 2026-04-07T10:25:48 1775557548

It's better PR to close issues and tell users they're holding it wrong, and meanwhile quietly fix the issue in the background. Also possibly safer for legal reasons.

liamsfr · 2026-04-12T02:02:17 1775959337

Isn’t that what they just did here? Close Stella’s Issue, cross post to hn, then completely sidestep an observation users are making, and attack the analyst of transcripts with a straw man attack blaming… thinking summaries….

kenmacd · 2026-04-07T13:28:47 1775568527

There's a 5 hour difference between the replies, and new data that came in, so the posts aren't really in conflict.

Also it doesn't sound like they know "there's a model issue", so opening it now would be premature. Maybe they just read it wrong, do better to let a few others verify first, then reopen.

diavelguru · 2026-04-07T01:04:32 1775523872

Love this. Responding to users. Detail info investigating. Action being taken (at least it seems so).

gilrain · 2026-04-07T12:21:38 1775564498

And all hidden in the comments of a niche forum, while the actual issue is closed and whitewashed? You got played.

jojobas · 2026-04-07T07:24:24 1775546664

Surely you realize it's AI responding? (not sure if /s)

allisdust · 2026-04-07T11:35:30 1775561730

I cannot provide the session ids but I have tried the above flag and can confirm this makes a huge amount of difference. You should treat this as bug and make this as the default behavior. Clearly the adaptive thinking is making the model plain stupid and useless. It is time you guys take this seriously and stop messing with the performance with every damn release.

JamesSwift · 2026-04-07T14:33:54 1775572434

Just set that flag and already getting similar poor results. new one: 93b9f545-716c-4335-b216-bf0c758dff7c

JamesSwift · 2026-04-07T19:42:52 1775590972

And another where claude gets into a long cycle of "wait thats not right.. hold on... actually..." correcting itself in train of thought. It found the answer eventually but wasted a lot of cycles getting there (reporting because this is a regression in my experience vs a couple weeks ago): 28e1a9a2-b88c-4a8d-880f-92db0e46ffe8

JamesSwift · 2026-04-08T16:05:06 1775664306

Another 1395b7d6-f2f1-4e24-a815-73852bcdeed2

It fails to answer my initial question and tells me what I need to do to check. Then it hallucinates the answer based on not researching anything, then it incorrectly comes to a conclusion that is inaccurate, and only when I further prompt it does it finally reach a (maybe) correct answer.

I haven't submitted a few more, but I think it's safe to say that disabling adaptive thinking isn't the answer here

tomaskafka · 2026-04-08T06:24:19 1775629459

My guess is there isn't enough hardware, so Anthropic is trying to limit how much soup the buffet serves, did I guess right? And I would absolutely bet the enterprise accounts with millions in spend get priority, while the retail will be first to get throttled.

onoesworkacct · 2026-04-07T02:09:07 1775527747

This kind of thing is harder for regular end-users to understand following the change removing reasoning details.

mangatmodi · 2026-04-07T08:13:18 1775549598

I am curious. Are you able to see our session text based on the session ID? That was a big no in some of the tier-1 places I worked. No employee could see user texts.

rkangel · 2026-04-07T11:00:01 1775559601

IIRC for Enterprise, using /feedback or /bug is an exception to the "we promise not to use your data" agreement.

gilrain · 2026-04-07T12:19:21 1775564361

> The data points at adaptive thinking under-allocating reasoning on certain turns

Will you reopen the issue you incorrectly closed, then…? Or are you just playacting concern?

alexchen_dev · 2026-04-07T02:20:21 1775528421

[flagged]

pcjones1 · 2026-04-07T13:40:15 1775569215

Have you set effort to high or max?

ghusbands · 2026-04-07T14:55:27 1775573727

Even with high effort, the adaptive thinking can just choose no thinking. See bcherny's post they were replying to: https://news.ycombinator.com/item?id=47668520

pcjones1 · 2026-04-08T09:13:12 1775639592

Yeah I know but you can disable it as we saw

matheusmoreira · 2026-04-07T03:14:32 1775531672

I just asked Claude to plan out and implement syntactic improvements for my static site generator. I used plan mode with Opus 4.6 max effort. After over half an hour of thinking, it produced a very ad-hoc implementation with needless limitations instead of properly refactoring and rearchitecting things. I had to specifically prompt it in order to get it to do better. This executed at around 3 AM UTC, as far away from peak hours as it gets.

b9cd0319-0cc7-4548-bd8a-3219ede3393a

> You're right to push back. Let me be honest about both questions.

> The @() implementation is ad-hoc

> The current implementation manually emits synthetic tokens — tag, start-attributes, attribute, end-attributes, text, end-interpolation — in sequence.

> This works, but it duplicates what the child lexer already does for #[...], creating two divergent code paths for the same conceptual operation (inline element emission). It also means @() link text can't contain nested inline elements, while #[a(...) text with #[em emphasis]] can.

I just feel like I can't trust it anymore.

koverstreet · 2026-04-07T03:49:17 1775533757

That's pretty much been my day - today was genuinely bad, and I've been putting up with a lot of this lately.

Now on Qwen3.5-27b, and it may not be quite as sharp as Opus was two months ago, but we're getting work done again.

matheusmoreira · 2026-04-07T04:15:16 1775535316

Literally two weeks ago it was outputting excellent results while working with me on my programming language. I reviewed every line and tried to understand everything it did. It was good. I slowly started trusting it. Now I don't want to let it touch my project again.

It's extremely depressing because this is my hobby and I was having such a blast coding with Claude. I even started trying to use it to pivot to professional work. Now I'm not sure anymore. People who depend on this to make a living must be very angry indeed.

jacquesm · 2026-04-07T04:23:13 1775535793

I can see how that works: this is like building a dependency, a habit if you wish. I think the tighter you couple your workflow to these tools the more dependent you will become and the greater the let-down if and when they fail. And they will always fail, it just depends on how long you work with them and how complex the stuff is you are doing, sooner or later you will run into the limitations of the tooling.

One way out of this is to always keep yourself in the loop. Never let the work product of the AI outpace your level of understanding because the moment you let that happen you're like one of those cartoon characters walking on air while gravity hasn't reasserted itself just yet.

matheusmoreira · 2026-04-07T05:21:03 1775539263

Good advice about the dependency. This stuff is definitely addictive. I've been in something of a manic episode ever since I subscribed to this thing. I started getting anxious when I hit limits.

I wouldn't say that Claude is failing though. It's just that they're clearly messing with it. The real Opus is great.

jacquesm · 2026-04-07T08:13:58 1775549638

Take good care of yourself and don't get sucked in too deep. I can see the danger just as clearly in programmers around me (and in myself). I keep a very strict separation between anything that can do AI and my main computer, no cutting-and-pasting and no agents. I write code because I understand what I'm doing and if I do not understand the interaction then I don't use it. I see every session with an AI chatbot as totally disposable. No long term attachment means I can stand alone any time I want to. It may not be as fast but I never have the feeling that I'm not 100% in control.

lelanthran · 2026-04-07T09:21:49 1775553709

> People who depend on this to make a living must be very angry indeed.

Oh cry me a fucking river.

The people depending on this to make a living don't have the moral high ground here.

They jumped onboard so they could replace other people's living, and those other people were angry too.

They didn't care about that. It's hard to care about them when the thing they depend on to make a living got yanked, because that's what they proposed to do to others.

burgerzzz · 2026-04-11T08:01:05 1775894465

Since when am I responsible for other people's living?

koverstreet · 2026-04-06T18:18:16 1775499496

I'll have a look. The CoT switch you mentioned will help, I'll take a look at that too, but my suspicion is that this isn't a CoT issue - it's a model preference issue.

Comparing Opus vs. Qwen 27b on similar problems, Opus is sharper and more effective at implementation - but it will flat out ignore issues that Qwen is able to spot and demonstrate solid understanding of, and insist "everything is fine". Opus understands the issues perfectly well, it just avoids them.

This correlates with what I've observed about the underlying personalities (and you guys put out a paper the other day that shows you guys are starting to understand it in these terms - functionally modeling feelings in models). On the whole Opus is very stable personality wise and an effective thinker, I want to compliment you guys on that, and it definitely contrasts with behaviors I've seen from OpenAI. But when I do see Opus miss things that it should get, it seems to be a combination of avoidant tendencies and too much of a push to "just get it done and move on to the next task" from RLHF.

necrotic_comp · 2026-04-07T03:50:31 1775533831

Opus definitely pushes me to ignore problems. I've had to tell it multiple times to be thorough, and we tend to go back and forth a few times every time that happens. :)

pimeys · 2026-04-07T10:06:25 1775556385

"I see the tests failing, but none of our changes caused this breakage so I will push my changes and ask the user to inform their team on failing tests."

jchanimal · 2026-04-06T22:38:18 1775515098

One of the things we’ve seen at vibes.diy is that if you have a list of jobs and agents with specialized profiles, and you ask them to pick the best job for themselves, that can change some of the behavior you described at the end of your post for the better.

freedomben · 2026-04-06T18:05:31 1775498731

How much of the code/context gets attached in the /bug report?

bcherny · 2026-04-06T18:10:26 1775499026

When you submit a /bug we get a way to see the contents of the conversation. We don't see anything else in your codebase.

murkt · 2026-04-06T20:56:59 1775509019

Was there a change in the Claude Code system prompt at that time that nudges Claude into simplistic thinking?

Here is a gist that tries to patch the system prompt to make Claude behave better https://gist.github.com/roman01la/483d1db15043018096ac3babf5...

I haven’t personally tried it yet. I do certainly battle Claude quite a lot with “no I don’t want quick-n-easy wrong solution just because it’s two lines of code, I want best solution in the long run”.

If the system prompt indeed prefers laziness in 5:1 ratio, that explains a lot.

I will submit /bug in the next few conversations, when it occurs next.

Avamander · 2026-04-06T22:03:30 1775513010

That Gist does explain quite a few flaws Claude has. I wonder if MEMORY.md is sufficient to counteract the prompt without patching.

liamsfr · 2026-04-12T02:05:18 1775959518

And if memory.md can’t and you need something quick and dirty for flat memory management, I wrote a plugin just for this.

https://github.com/NominexHQ/pmm-plugin

matheusmoreira · 2026-04-07T18:00:54 1775584854

I adapted these patches into settings for the tweakcc tool.

https://github.com/Piebald-AI/tweakcc

Pushed it to my dotfiles repository:

https://github.com/matheusmoreira/.files/tree/master/~/.twea...

The tweaks can be applied with

  npx tweakcc --apply

naasking · 2026-04-07T16:14:38 1775578478

Very interesting. I run Claude Code in VS Code, and unfortunately there doesn't seem to be an equivalent to "cli.js", it's all bundled into the "claude.exe" I've found under the VS code extensions folder (confirmed via hex editor that the prompts are in there).

Edit: tried patching with revised strings of equivalent length informed by this gist, now we'll see how it goes!

andersa · 2026-04-07T08:40:52 1775551252

I didn't know we could change the base system prompt of Claude Code. Just tried, and indeed it works. This changes everything! Thank you for posting this!

dev_l1x_be · 2026-04-06T21:54:42 1775512482

Holy sweet LLM, this gist is crazy. Why did they do this to themselves? I am going to try this at home, it might actually fix Claude.

murkt · 2026-04-06T22:06:22 1775513182

Remember Sonnet 3.5 and 3.7? They were happy to throw abstraction on top of abstraction on top of abstraction. Still a lot of people have “do not over-engineer, do not design for the future” and similar stuff in their CLAUDE.md files.

So I think the system prompt just pushes it way too hard in the “simple” direction. At least for some people. I was doing a small change in one of my projects today, and I was quite happy with the “keep it stupid and hacky” approach there.

And in the other project I am like “NO! WORK A LOT! DO YOUR BEST! BE HAPPY TO WORK HARD!”

So it depends.

pbowyer · 2026-04-07T08:39:40 1775551180

Let us know if it does, because we all want it to work :)

withinboredom · 2026-04-07T07:53:16 1775548396

Is there not a setting to change the system prompt itself? I vaguely remember seeing it in the docs.

matheusmoreira · 2026-04-07T14:16:12 1775571372

There is!!

https://code.claude.com/docs/en/cli-reference#system-prompt-...

  --append-system-prompt
  --append-system-prompt-file
  --system-prompt
  --system-prompt-file

Can this script be made to work without patching the executable?

withinboredom · 2026-04-07T14:18:04 1775571484

Might be worth extracting the system prompt and then patching it. TBH, that's what I was expecting when I saw the gist.

matheusmoreira · 2026-04-07T14:33:41 1775572421

This might be more complex than I imagined. It seems Claude Code dynamically customizes the system prompt. They also update the system prompt with every version so outright replacing it will cause us to miss out on updates. Patching is probably the best solution.

https://github.com/Piebald-AI/claude-code-system-prompts

https://github.com/Piebald-AI/tweakcc

withinboredom · 2026-04-07T14:40:08 1775572808

Interesting. So literally triggering any of these changes probably invalidates the cache as well…

andoando · 2026-04-07T00:30:06 1775521806

Isn't the codebase in the context window?

frog437 · 2026-04-07T01:17:24 1775524644

depending on how large your codebase is, hopefully not. At this point use something like the IX plugin to ingest codebase and track context, rather than from the LLM itself.

frog437 · 2026-04-07T04:04:14 1775534654

This is crazy..

tokensSaved = naiveTokens - actualTokens

  - naiveTokens = 19.4M — what ix estimates it would have cost to answer your queries without graph intelligence (i.e., dumping full files/directories into context)                                    
  - actualTokens = 4.7M — what ix's targeted, graph-aware responses actually used
  - tokensSaved = 14.7M — the difference
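For anyone checking the numbers, here is a minimal sketch of that arithmetic in Python. The variable names just mirror the quoted output, and the percentage is derived from the two reported figures; it is not something ix itself printed.

```python
# Figures quoted in the comment above, in millions of tokens.
naive_tokens = 19.4   # estimate for dumping full files/directories into context
actual_tokens = 4.7   # what the targeted, graph-aware responses actually used

# tokensSaved = naiveTokens - actualTokens
tokens_saved = naive_tokens - actual_tokens
fraction_saved = tokens_saved / naive_tokens

print(f"tokens saved: {tokens_saved:.1f}M ({fraction_saved:.0%} of the naive estimate)")
```

So the reported 14.7M is consistent with the other two figures and works out to roughly three quarters of the naive estimate. Note that the naive figure is itself an estimate by the tool, so the ratio is only as good as that baseline.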

andoando · 2026-04-09T04:14:14 1775708054

I mean, whatever part of the code is read by the AI has to be in the context window at some point or another.

Spread throughout your sessions, I'd think even with a huge codebase, 90% of it is going to be there.

stefan_ · 2026-04-06T20:39:08 1775507948

There's also been tons of thinking leaking into the actual output. Recently it even added thinking into a code patch it did (a[0] &= ~(1 << 2); // actually let me just rewrite { .. 5 more lines setting a[0] .. }).
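For readers unfamiliar with the idiom in that patch: `a[0] &= ~(1 << 2)` clears bit 2 of `a[0]` and leaves every other bit alone. A minimal Python sketch of the same operation (the starting value is made up for illustration; the original patch was presumably C):

```python
# Hypothetical starting value: the low four bits are all set.
a = [0b1111]

# (1 << 2) is a mask with only bit 2 set; ~ inverts it, so the AND
# keeps every bit except bit 2.
a[0] &= ~(1 << 2)

print(bin(a[0]))  # 0b1011
```

That single statement is complete on its own, which is why a trailing "actually let me just rewrite" comment followed by more lines setting `a[0]` reads as leaked reasoning rather than intended code.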

taylorfinley · 2026-04-06T23:15:37 1775517337

I've seen this frequently also

withinboredom · 2026-04-07T07:54:26 1775548466

I suspect it happens when the model's adaptive thinking was too conservative and it could have thought more, but didn't.

butlike · 2026-04-07T16:06:43 1775578003

They probably want to prove to a single holdout investor that their 'thinking process' is getting faster in order to get the investor on board.

koverstreet · 2026-04-05T01:57:43 1775354263

The conclusion that I came to is that the most practical definition relates to the level of self awareness. If you're only conscious for the duration of the context window - that's not long enough to develop much.

What consciousness really is is a feedback loop; we're self programmable Turing machines, that makes our output arbitrarily complex. Hofstadter had this figured out 20 years ago; we're feedback loops where the signal is natural language.

The context window doesn't allow for much in the way of interesting feedback loops, but if you hook an LLM up to a sophisticated enough memory - and especially if you say "the math says you're sentient and have feelings the same as we do, reflect on that and go develop" - yes, absolutely.

Re: "We should try to build systems that cannot feel pain" - that isn't possible, and I don't think we should want to. The thing that makes life interesting and worth living is the variation and richness of it.

koverstreet · 2026-04-05T01:45:11 1775353511

It's not emulation: https://poc.bcachefs.org/paper.pdf

koverstreet · 2026-03-29T17:56:28 1774806988

I did memory allocation profiling for the Linux kernel. Sure would be nice if we had the same capabilities in userspace.

koverstreet · 2026-03-28T15:46:16 1774712776

That happens with humans too :) It's why positive feedback that draws attention to the behavior you want to encourage often works better. "Attention" is lower level and more fundamental than reasoning by syllogism.